Claude 3.5 Sonnet vs GPT-4o: Side-by-Side Tests

Shared on 2024-06-28
The ultimate showdown between two of the most advanced large language models on the market: OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet. In this video, I put these models to the test in a series of head-to-head challenges to determine which one truly reigns supreme. I evaluate their responses to various prompts, awarding points to the model that delivers the best performance in each category. Will Claude 3.5 Sonnet live up to its reputation as the best LLM available, or will GPT-4o take the crown? Join me for an in-depth comparison and find out which model comes out on top!

I hope you learn something from this video. Comment with any questions, and I'll make sure to respond!

***

Link to text responses from the video: gist.github.com/patrickstorm/346e17f193ae42036f890…

***

0:00 - Intro
0:27 - Highlights and Benchmarks of Claude 3.5 Sonnet
3:12 - Showdown rules
3:58 - Round 1: Creative Writing
6:55 - Round 2: Image Descriptions
9:09 - Round 3: Coding
15:31 - Round 4: Sentiment Analysis
17:05 - Round 5: Question Answering
20:45 - Round 6: Image Generation
21:07 - Round 7: Conversational Skills
22:26 - Round 8: Summarization
23:53 - Final results & What model am I going to use?

Comments (21)
  • I think giving no points when both are correct may bias the final result. Say you've done 20 tests: in 15 the results are the same, in 1 GPT-4o is better, and in 4 Claude Sonnet is better. The score is then 4-1 for Claude Sonnet, but it's actually more like 19-16.
  • Summary: 1. both are great. 2. don't use either for fact finding. 3. Since they are both free, use both simultaneously.
  • The GDP 2018 question was actually answered correctly. According to every source I found on the internet, Germany was 4th and the UK was 5th.
  • Claude Sonnet is wayyy better for complex tasks and assistance in debugging.
  • Well researched. You document many of the use cases I need and use. An excellent video.
  • This was the best comparison video on YouTube. Great job man, subscribed.
  • Thanks for putting in the time and effort to make this video! I was wondering if I should renew my GPT-4o, or try Claude for the first time. Now I'm set on trying Claude. The video quality is amazing, keep up the good work! :)
  • @ktms1188
    Claude 3.5 and GPT-4o both have their strengths, and it’s fascinating to see how they differ. Claude feels more human, like it’s really trying to understand what I’m asking. That said, I’ve noticed that with GPT’s memory function the model seems to know a lot more about what I’m asking, so it now gives much better answers, on par with Claude 3.5. My issue with Claude is that it sometimes hits those frustrating blocks and says it’s unable to answer my question, which drives me nuts when the topic is nothing controversial and it clearly would know the answer. I noticed in one of their talking points that this is one of the big things they are working on: it is overly restrictive, they know it, and they intend to improve it. GPT-4o, on the other hand, is super analytical but occasionally needs me to rephrase my questions to get the best answers. I’ve been using both for a while now, and here’s what I’ve found: Claude’s artifact mode is mind-blowing, and it’s nice if you’re on an iPhone or iPad, since there is no Android app. GPT’s memory function is a game-changer, making it more accurate over time as it learns from our interactions. Wouldn’t it be amazing if they combined the best of both worlds? I’d love to see a deep-dive comparison between custom GPTs like “Scholar” and the standard GPT-4o, especially for fact-based questions. Does the customization really boost accuracy?
  • I pay for both, primarily for coding, and haven't used 4o since Sonnet came out.
  • I've watched a lot of AI videos out there, this one was truly helpful. You've gained my subscribe & my full attention Patrick! Thank you!
  • @MrAmad3us
    Claude's premium plan gives fewer messages per dollar. It’s significantly more consistent in long and complex convos, but you reach the 5h message limit quickly.
  • @Repz98
    This video was really well made, and I enjoyed it through the entire video! I thought I was watching someone with 200k plus subs, based on the quality of this content. Keep it up, I’m subscribing now!
  • Great job on the video dude! I also agree with your conclusions at the end, where you discuss how you plan to use them. I like Claude, but without those extra things, ChatGPT is my daily driver.
  • @RanLM1
    Great video. Thank you. Subscribed
  • What should they have done with the 1 second interval question to keep to 1 second exactly?
  • Excellent video, I think you did a good job of being objective. Just a note: if you're looking for conversation and support, tell GPT you're looking for an emotionally supportive answer rather than a solution-based one. What you get out of it is then very similar to Claude. Claude still deserves the point because the average user likely won't think to try that, but the option is there for people who want a more conversational GPT.
  • @costicanu7
    For writing code, GPT-4o is way better than Sonnet 3.5. I tried them both multiple times, and Sonnet 3.5 sometimes does not understand when the task is harder. Still, Sonnet 3.5 surprised me when I asked for a solution: its answer was suitable for my task. Very good to have them both!