Generally, I currently rely on GPT-5 Thinking (Medium Reasoning) for most tasks because, of all the models currently available, the GPT-5 series (and GPT-4.1 before it) has been the most reliable at following instructions to the letter, doing no less and, more importantly, no more.
Both the Claude models (from 3.5 Sonnet through 4.1 Opus) and Gemini 2.5 Pro have historically taken a lot more liberties, which some users find appealing, but which I have come to not want when relying on a model for consistent output. I can see why some find great value in a model already implementing e.g. an auth provider when asked only for the frontend of a login page, but when guiding a model, I personally prefer that nothing happen unless I explicitly requested it. Claude in particular, as part of agentic coding workflows, has a tendency to simply try e.g. a different package than the one requested, which some users may not notice. I found it very funny when Claude 4 Sonnet once fully reimplemented an infinite canvas because @xyflow failed to install properly. I'd rather the model error out there, or ask the user in the loop to confirm.
In regard to instruction following, while all three frontier providers do well with their context windows, GPT-5 models are still a bit preferable for me, despite offering only 400k tokens vs. 1 million, simply because what is in context is not just recalled, but reliably adhered to as well.
GPT-5 also seems a bit better at CSS, though my UI taste is far too limited to make a solid judgement on that front, and styling is of course subjective.
Additionally, when benchmarking all three frontier models side by side, I have yet to find a coding task that the others can solve but GPT-5 cannot. I did, however, find cases where my initial instructions were lacking or not comprehensive enough, leading to all three having issues completing the task. In those cases, I found that Gemini 2.5 Pro, when provided with the code base, does best at rewriting the existing prompt. These rewritten prompts then usually have far higher success rates when given to GPT-5 Thinking (Medium Reasoning), or to a lesser extent to one of the Claude 4 models. However, these Gemini-provided prompts also occasionally contain inventions/hallucinations, so I must always triple-check them when doing this.
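If you wanted to script that two-step workflow instead of doing it by hand in the chat UIs, a minimal sketch could look like the following. This is purely illustrative: it assumes the openai and google-genai Python SDKs, API access to models named "gemini-2.5-pro" and "gpt-5", and a made-up refine_and_run helper; the prompt wording and the manual review step are my own additions, not anything the original workflow prescribes.

    # Hypothetical two-step prompt-refinement sketch (not a definitive implementation).
    # Assumes the openai and google-genai Python SDKs and API access to models
    # named "gemini-2.5-pro" and "gpt-5"; adjust names to whatever you actually have.
    from openai import OpenAI
    from google import genai

    def refine_and_run(rough_prompt: str, codebase_excerpt: str) -> str:
        gclient = genai.Client()
        # Step 1: ask Gemini to rewrite the rough prompt against the code base.
        rewritten = gclient.models.generate_content(
            model="gemini-2.5-pro",
            contents=(
                "Rewrite the following task prompt so it is precise and complete "
                "for this code base. Do not invent files or APIs.\n\n"
                f"Prompt:\n{rough_prompt}\n\nCode base:\n{codebase_excerpt}"
            ),
        ).text
        # Manual review step: the rewritten prompt can contain invented details,
        # so print it for inspection before actually using it.
        print("Review this rewritten prompt:\n", rewritten)
        # Step 2: hand the (reviewed) prompt to GPT-5 for the actual coding task.
        oclient = OpenAI()
        answer = oclient.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": rewritten}],
        )
        return answer.choices[0].message.content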
For context, the main coding problem I am using model assistance for at the moment is some poorly designed, Figma-inspired real-time syncing code with some rather odd edge cases, courtesy of my limited skill set.
For non-coding stuff, over the last semester I have mainly relied on Gemini 2.5 (sometimes Pro, often Flash) for creating nice summaries of lectures. I found every other model (whether OpenAI's previous models or anything from Anthropic, Mistral, DeepSeek, Qwen, etc.) less suitable, mainly because they tended to summarize far too aggressively, often cutting absolutely vital information. Gemini models are far more willing to output an extensive, maybe slightly too verbose summary, but I'd rather remove a few lines. I haven't yet had enough experience with GPT-5 as a summarization tool, as the semester is only just starting, so I cannot say how well OpenAI's newest series does there, but from very limited experience, GPT-5-mini has potential to replace Gemini 2.5 Flash as my go-to.
GPT5-Thinking if I need a precise answer with the fewest possible mistakes.
GPT5-Pro is the real deal.
Gemini if I need creative insights and a pleasant talk, but this comes at the cost of more mistakes (and it's hella stubborn).
Hopefully Gemini 3.0 will fix this.