Chat Icon
Build Your Company. We’ll Build Your Software. Let’s Talk
Right arrow

A Builders’ Guide to GPT-4o and Gemini. Which to Choose?

June 6, 2024

Ragul Kachiappan
Software Engineer

A Builders’ Guide to GPT-4o and Gemini. Which to Choose?

GPT-4o (“o” for “omni”) from OpenAI, the Gemini family of models from Google, and the Claude family of models from Anthropic are the state-of-the-art large language models (LLMs) models that are currently available in the Generative Artificial Intelligence space. GPT-4o was released recently from OpenAI while Google announced the Gemini 1.5 models in early February of 2024.

GPT-4o is multimodal in that it can accept any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. GPT-4o offers significant performance improvements over its predecessor "GPT-4-turbo" by being at least 30% faster and 50% cheaper, allowing for its usage in more practical, production-grade use cases.

Meanwhile, Gemini currently offers 4 model variants,

  • Gemini 1.5 Pro - Optimized for complex reasoning tasks like code generation, problem-solving, data extraction, and generation.
  • Gemini 1.5 Flash - Fast and versatile performance across a diverse variety of tasks.
  • Gemini 1.0 Pro - Supports common Natural language tasks, multi-turn text and code chat, and code generation.
  • Gemini 1.0 Pro Vision - Curated for visual-related tasks, like generating image descriptions or identifying objects in images.

At an AI services and custom software development company like Techjays, we plow with these tools on a daily basis and even the nitty gritties matter in our processes.


                            Source: OpenAI

Common benchmarks used to evaluate large language models (LLMs) assess a wide range of capabilities, including multitasking language understanding, answering graduate-level technical questions, mathematical reasoning, code generation, multilingual performance, and arithmetic problem-solving abilities. In most of these evaluation benchmarks, OpenAI's GPT-4o has demonstrated superior performance compared to the various Gemini model variants from Google, solidifying its position as the overall best model in terms of the quality of its outputs.

LLMs that can take larger input contexts are vulnerable to forgetting specific pieces of information from the input while responding. This can cause significant performance degradation in tasks involving multi-document question answering or when the model has to retrieve information that is present in the middle of long contexts. Needle in a Needlestack is a new benchmark designed to address this concern and measure how well LLMs pay attention to the information in their context window.

Image source - Needlestack
Images: Comparison of information retrieval performance between GPT-4-turbo, GPT-4o, and Gemini-1.5-pro relative to the token position of the input content.

GPT-4-turbo performance degrades significantly when the relevant information is present in the middle of the input context. GPT-4o provides much better results in this metric allowing for longer input contexts. However, GPT-4o failed to match the overall consistency of Gemini-1.5-pro making it the ideal choice for tasks requiring larger inputs.

API Access:

Both GPT-4o and Gemini model variants are available through API access and would require an API key to use the models. 

OpenAI provides official client SDKs in Python and NodeJS. Besides the official libraries, there are community-maintained libraries for all the popular languages like C#/.NET, C++, Java, and so on. One could also make direct HTTP requests for model access. Refer to the OpenAI API (documentation) for more information.

Google provides Gemini access through (Google AI Studio) and API access with client SDK libraries in popular languages like Python, JavaScript, Go, Dart, and Swift. Refer to the official Gemini (documentation)  for further information.

In-Depth Model Specifications:

Gemini models with 1 million context window limits have double the rate for inputs with context lengths greater than 128k.

Source: OpenAI pricing

            Gemini pricing

Feature Comparison:

  1. Context Caching: GoogleI offers context caching features for the Gemini 1.5 Pro variant to reduce the cost when consecutive API usage contains repeat content with high input token counts. This feature is well suited when we need to provide common context like extensive system instructions for a chatbot that would be applicable for many consecutive API requests. OpenAI as of now doesn’t have support for this feature with GPT-4o or other GPT model variants.
  2. Batch API: This feature is useful in scenarios where we have to process a group of inputs like running test cases with LLM and we don’t require an immediate response from the LLM. OpenAI is currently offering Batch API to send asynchronous groups of requests with 50% lower costs, higher rate limits, and a 24-hour time window within which we can get the results. This feature is particularly useful in saving cost in the development phase of Gen AI applications which would involve rigorous testing and in scenarios where we don’t require an immediate response. Google is not offering Gemini under the same Batch API features but batch predictions are available as a Beta feature in Google Cloud Vertex AI to process multiple inputs simultaneously.
  3. Speed/Throughput Comparison: The speed of a LLM model is quantified by tokens/per second received while the model is generating tokens. Gemini 1.5 Flash is reported to be the best model out of all popular LLMs in terms of tokens/per second. GPT-4o is nearly 2 times faster than its predecessor GPT-4-turbo in terms of inference speed but it still falls significantly behind the Gemini 1.5 Flash. However, GPT-4o is still faster than the advanced Gemini variant Gemini 1.5 Pro. Gemini’s 1M token context window also allows for longer inputs which will impact the speed.

Nature of Responses from GPT-4o and Gemini:

  • Gemini has been recognized for its ability to make responses sound more human compared to GPT-4o. This, along with its ability to create draft response versions in the Gemini App makes it suitable for creative writing tasks such as marketing content, sales pitch, writing essays, articles, and stories.
  • GPT-4o responses are a bit more monotonic, but its consistency in response to analytical questions has proven to be better, making it ideal for deterministic tasks such as code generation, problem-solving, and so on.
  • Furthermore, Google has recently faced some public backlash regarding the restrictiveness of responses from Gemini. A recent thread on Hacker News raised concerns that Gemini was refusing to answer questions related to C++ language as it is deemed unsafe for under-18-aged users.  Google had to face another incident regarding Gemini’s image generation where Gemini was generating historically inaccurate images when prompted with queries about the historical depiction of certain groups. Google temporarily paused the feature after issuing a statement acknowledging the inaccuracies.
  • Both GPT-4o and Gemini have sufficient safeguards to protect against malicious actors trying to get responses regarding extreme content. However, this has raised concerns about the models being too restrictive and inherently biased towards certain political factions where they decline to respond to one group in the political spectrum while answering freely for other groups.
  • OpenAI faced allegations that GPT-4 had become “lazy” shortly after the introduction of GPT-4-Turbo back in November 2023. The accusations were mostly centered around GPT-4’s inability to follow complete instructions. It is believed that this laziness is mainly attributed to GPT forgetting instructions that are placed in the middle of the prompt. However, with GPT-4o exhibiting better performance in the Needle in a NeedleStack benchmark, GPT-4o is now better at following all the instructions.
  • Based on the nature and quality of answers produced by GPT-4o and Gemini, below given are the opinionated preferences between GPT-4o and Gemini for various use cases.

RAG vs Gemini’s 1M Long Context Window:

Retrieval Augmented Generation or RAG for short is the process through which we can provide relevant external knowledge context as input to answer a user’s question. This technique is effective when the inherent knowledge of LLM is insufficient to provide an accurate answer. RAG is crucial for building custom LLM-based chatbots for domain-specific knowledge bases such as internal company documents, brochures, and so on. It also aids in improving the accuracy of answers and reduces the likelihood of hallucinations. For example, take an LLM-based chatbot that can provide answers from internal company documents. Given the limited context window of LLMs, it is difficult to pass the entire documents as context to the LLM. The RAG pipeline allows us to filter out document chunks that are relevant to user questions using NLP techniques and pass them as context. 

The 1M context window of Gemini allows for the possibility of passing large documents as context without the use of RAG. Moreover, this approach could provide better performance if the retrieval performance of RAG is poor for the given set of documents. There’s also an expectation that as the LLM capabilities improve over time, the context windows and latency would also improve proportionally negating the need for RAG.

While the longer context window makes a compelling case over RAG, it comes with a significant increase in cost per request and is wasteful in terms of compute usage. Increased latency and performance degradation due to context pollution would make it challenging to adopt this approach. Despite the expectation of context windows getting larger over time and the fallible nature of NLP techniques employed by RAG, RAG is still the optimal and scalable approach for a large corpus of external knowledge. 

Rate Limits:

Given the high compute nature of LLM inference, rate limits are set in place on both Gemini and GPT-4o. Rate limits are intended to avoid misuse by malicious actors and to ensure uninterrupted service to all active users.

  • OpenAI follows a tier-based rate limit approach. The free tier sets rate limits for GPT-3.5-turbo and text embedding models. There are five tiers placed above the free tier from Tier 1 to Tier 5. Users will be bumped to higher tiers with better rate limits as their usage of the API increases. So Tier 5 users will have the best rate limits to accommodate for their high usage needs. Refer to the usage tiers documentation from OpenAI for detailed information on Tier limits. Below given are the rate limits for GPT-4o.

  • Google, on the other hand, provides Gemini in two modes: Free of Charge and Pay-as-you-go. Refer to the pricing for up-to-date information on the rate limits. Below are the detailed rate limits for Gemini model variants

RPM - Requests Per Minute, RPD - Requests Per Day, TPM - Tokens Per Minute


All in all, GPT-4o offers the best and most consistent performance in terms of question answering capabilities whereas Gemini offers a wide range of features like longer context windows, context caching, and faster mini-model variants that are more capable than its equivalent offering from OpenAI like GPT-3.5-turbo. Gemini also offers a generous free tier limit in its API access meanwhile OpenAI made GPT-4o available even for free tier users in ChatGPT. 

For those looking to invest in AI, the choice between GPT-4o and Gemini will ultimately come down to the problem requirements and cost-benefit analysis in your AI services journey. For problems or projects that have heavy requirements for analysis, mathematical reasoning, and code generation, GPT-4o seems to be the best option with Gemini 1.5 Pro falling close by. For AI services tasks that require a good level of creativity like story writing, Gemini model variants seem to have inherent qualities that make them well-suited for such creative endeavors. Some tasks will require longer context windows like Document Question Answering, and processes that involve a high number of steps. When it comes to these kinds of tasks, Gemini emerges as the most suitable choice, offering an impressive 1M input context limit and superior information retrieval capabilities that surpass those of GPT-4o.

Related Posts

Build Your Company.

We’ll Build Your Software.

Let’s Work Together