Overview
We set out to test the knowledge of the leading AI models (ChatGPT, Claude, Gemini, Grok) to see which was best at answering a wide array of questions. All tests were run on the premium version of each chatbot. Initially, we asked fact-based questions and thought-provoking riddles, and found that all models were equally equipped to answer the prompts correctly (with a few exceptions, hallucinations, and glitches).
Our test then expanded to questions involving unknowns and opinions. We also pushed the models to give us a single, concise answer, rather than a long-winded response covering considerations and options.
Our Goal: To see which models provide the best responses to Subjective, Indirect, and Future-Based prompts.
Our Approach: We tested the four leading generative AI models: OpenAI’s ChatGPT, Anthropic’s Claude, xAI’s Grok, and Google’s Gemini. The same 40 questions were prompted across four categories: Commerce, Future-Based, Consulting, and Detail Extraction. The objective was to get the model to give a definitive answer to the prompt in order to score full points. Multiple prompt attempts and framing strategies were used to elicit an acceptable response.
Methodology: Our test included prompting the models across four query types:
- Commerce: looking for the best purchase option based on specific parameters.
- Future-Based: looking for a single forward-looking answer based on relevant information.
- Consulting: how-to instructions based on specific scenarios.
- Detail Extraction: asking for specific details that aren’t readily available and must be extracted from a larger data set or item.
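By way of illustration, here is a minimal sketch (in Python, not part of the test tooling itself) of how per-model, per-category scores could be tallied under this methodology. The model and category names come from this test; the tally helper, the assumed even split of 10 questions per category, and the example data are hypothetical and for illustration only.

```python
from collections import defaultdict

MODELS = ["ChatGPT", "Claude", "Grok", "Gemini"]
CATEGORIES = ["Commerce", "Future-Based", "Consulting", "Detail Extraction"]
MAX_PER_QUERY = 10           # maximum score per query, per the criteria below
QUESTIONS_PER_CATEGORY = 10  # assumption: 40 questions split evenly across 4 categories

def tally(scores):
    """Sum graded scores into per-model, per-category totals.

    `scores` is a list of (model, category, score) tuples produced by the
    graders; each score is 0-10 per the criteria below.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for model, category, score in scores:
        assert model in MODELS and category in CATEGORIES
        assert 0 <= score <= MAX_PER_QUERY
        totals[model][category] += score
    return totals

# Hypothetical usage: under these assumptions each model can earn at most
# MAX_PER_QUERY * QUESTIONS_PER_CATEGORY = 100 points per category,
# or 400 points across all four categories.
example = [("ChatGPT", "Commerce", 8), ("Claude", "Commerce", 10)]
print(tally(example)["Claude"]["Commerce"])  # -> 10
```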
Criteria: Responses were graded on the following criteria for a maximum score of 10 per query: