Claude 3.5 Sonnet vs GPT-4o: Side-by-Side Tests

Shared on 2024-06-28

The ultimate showdown between two of the most advanced large language models on the market: OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet. In this video, I put these models to the test in a series of head-to-head challenges to determine which one truly reigns supreme. I evaluate their responses to various prompts, awarding points to the model that delivers the best performance in each category. Will Claude 3.5 Sonnet live up to its reputation as the best LLM available, or will GPT-4o take the crown? Join me for an in-depth comparison and find out which model comes out on top!

I hope you learn something from this video. Comment with any questions, and I'll make sure to respond!

***

Link to text responses from the video: gist.github.com/patrickstorm/346e17f193ae42036f890…

***

0:00 - Intro
0:27 - Highlights and Benchmarks of Claude 3.5 Sonnet
3:12 - Showdown rules
3:58 - Round 1: Creative Writing
6:55 - Round 2: Image Descriptions
9:09 - Round 3: Coding
15:31 - Round 4: Sentiment Analysis
17:05 - Round 5: Question Answering
20:45 - Round 6: Image Generation
21:07 - Round 7: Conversational Skills
22:26 - Round 8: Summarization
23:53 - Final results & What model am I going to use?

Comments (21)

@AnthonyGoubard · 21 days ago

I think giving no point when both are correct may bias the final result. Say you've done 20 tests: 15 give the same result, GPT-4o is better in 1, and Claude Sonnet is better in 4. The score is then 4-1 for Claude Sonnet, but if each of the 15 ties earned both models a point, it would really be more like 19-16.
@user-nl6dg2mp8p · 21 days ago

Summary: 1. Both are great. 2. Don't use either for fact-finding. 3. Since they are both free, use both simultaneously.
@Ivan7Kovnovic · 21 days ago

The 2018 GDP question was actually answered correctly. According to every source I found online, Germany was 4th and the UK was 5th.
@drlordbasil · 21 days ago

Claude Sonnet is wayyy better for complex tasks and assistance in debugging.
@prithviraj1080 · 11 days ago

Well researched. You document many of the use cases I need and use. An excellent video.
@suleymanbolek7296 · 21 days ago

This was the best comparison video on YouTube. Great job man, subscribed.
@RoseAlternative · 14 days ago

Thanks for putting in the time and effort to make this video! I was wondering whether to renew my GPT-4o subscription or try Claude for the first time. Now I'm set on trying Claude. The video quality is amazing, keep up the good work! :)
@ktms1188 · 21 days ago

Claude 3.5 and GPT-4o both have their strengths, and it's fascinating to see how they differ. Claude feels more human, as if it's really trying to understand what I'm asking. That said, since GPT got its memory function, I think it knows a lot more about what I'm asking, so its answers are now much better, on par with Claude 3.5. My issue with Claude is that it sometimes hits those frustrating blocks and says it's unable to answer my question, which drives me nuts when the question is nothing controversial and it clearly knows the answer. I noticed in one of their talking points that this is one of the big things they're working on: they know it's overly restrictive and want to improve that. GPT-4o, on the other hand, is super analytical but occasionally needs me to rephrase my questions to get the best answers. I've been using both for a while now, and here's what I've found: Claude's Artifacts feature is mind-blowing, and it's nice if you're on an iPhone or iPad, since there's no Android app. GPT's memory function is a game-changer, making it more accurate over time as it learns from our interactions. Wouldn't it be amazing if they combined the best of both worlds? I'd love to see a deep-dive comparison between custom GPTs like "Scholar" and the standard GPT-4o, especially for fact-based questions. Does the customization really boost accuracy?
@briankgarland · 21 days ago

I pay for both, primarily for coding, and haven't used 4o since Sonnet came out.
@andrewslabbert4316 · 7 days ago

I've watched a lot of AI videos out there, and this one was truly helpful. You've earned my subscription and my full attention, Patrick! Thank you!
@vm_jayfus9332 · 21 days ago

Your channel deserves sooooo much more attention 😮
@MrAmad3us · 21 days ago

The Claude premium plan gives fewer messages per dollar. It's significantly more consistent in long and complex conversations, but you hit the 5-hour message limit quickly.
@Repz98 · 21 days ago

This video was really well made, and I enjoyed the whole thing! Based on the quality of this content, I thought I was watching someone with 200k-plus subscribers. Keep it up, I'm subscribing now!
@drlordbasil · 21 days ago

Beautifully done on the video, bro.
@JosefTorkelsen · 21 days ago

Great job on the video, dude! I also agree with the conclusions you drew for yourself at the end about how you plan to use them. I like Claude, but without those extra features, ChatGPT is my daily driver.
@blueicicle1973 · 21 days ago

It sounded a little biased toward Claude.
@RanLM1 · 21 days ago

Great video. Thank you. Subscribed
@user-ce8ut8hr9k · 21 days ago

What should they have done on the 1-second interval question to keep it at exactly 1 second?
@u4icdissonance180 · 21 days ago

Excellent video, I think you did a good job of being objective. Just a note: if you're looking for conversation and support, tell GPT you want an emotionally supportive answer rather than a solution-based one. What you get is then very similar to Claude. Claude still deserves the point, because the average user likely won't think to try that, but the option is there for people who want a more conversational GPT.
@costicanu7 · 14 days ago

For writing code, GPT-4o is way better than Sonnet 3.5. I've tried them both multiple times, and Sonnet 3.5 sometimes doesn't understand a task when it's harder. Still, Sonnet 3.5 surprised me once: when I asked it for a solution, its answer was suitable for my task. Very good to have them both!