Anthropic's Claude 3.5 Sonnet is a Major Advancement in Frontier Models 🎯
Sorry OpenAI, Claude 3.5 > GPT-4o. 👀 Apple Intelligence will be based on Anthropic, not OpenAI, by late 2026.
Hey Everyone,
I wasn’t planning to write on Friday. But something happened. Every now and again in AI news you come across something significant. I believe Anthropic has delivered just that this week.
Claude 3.5 Sonnet is really good, surprisingly good.
It’s time we take Anthropic seriously. It’s just so much better than GPT-4o. Anthropic benchmarked Claude 3.5 against OpenAI's newest AI model, GPT-4o, which powers ChatGPT. The results show Anthropic's AI model achieving slightly better results in four of the six benchmarks, which focused on reasoning, coding and math skills.
Download the Claude 3 app on iOS.
Via Allie Miller:
Anthropic’s “Sonnet” now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost.
Frontier intelligence at 2x the speed
Key details:
Beats GPT-4o on several benchmarks according to Anthropic
Outperforms previous Anthropic models on several AI benchmarks
The new model can analyze text and images and generate text
It offers twice the speed of the previous Claude 3 Opus model
Claude 3.5 Sonnet has a context window of 200,000 tokens (vs 128K for GPT-4o)
Anthropic introduced Artifacts, a new workspace for editing AI-generated content
The model is available now through Anthropic's web client, iOS app, and API
An even better version, Claude 3.5 Opus, will be released soon with features such as web search
The Best Frontier LLM Right Now
Functionally, Claude 3.5 Sonnet sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval).
Also it’s more personable: it shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone.
It’s also just so much smarter than Opus. For example, in an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, while Claude 3 Opus solved 38%.
Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities.
State-of-the-art vision
Tool use
Define a toolset for Claude and specify your request in natural language. Claude will then select the appropriate tool to fulfill the task and, when appropriate, execute the corresponding action:
Extract structured data from unstructured text: Pull names, dates, and amounts from invoices to reduce manual data entry.
Convert natural language requests into structured API calls: Enable teams to self-serve common actions (e.g., "cancel subscription") with simple commands.
Answer questions by searching databases or using web APIs: Provide instant, accurate responses to customer inquiries in support chatbots.
Automate simple tasks through software APIs: Save time and minimize errors in data entry or file management.
Orchestrate multiple fast Claude subagents for granular tasks: Automatically find the optimal meeting time based on attendee availability.
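To make the tool use idea a bit more concrete, here is a minimal sketch in Python of the application side of that loop. It assumes tools are described with a name, a description, and a JSON Schema for their inputs. The two tools, the handler functions, and the `run_tool_call` helper are hypothetical names made up for illustration, and the `request` dict is a hand-written stand-in for what the model would send back. No real API call is made here.

```python
from typing import Any, Callable, Dict, List

# Tool definitions: a name, a description, and a JSON Schema for the inputs.
# Both tools are hypothetical and exist only for this sketch.
TOOLS: List[Dict[str, Any]] = [
    {
        "name": "cancel_subscription",
        "description": "Cancel the subscription for a given customer.",
        "input_schema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "find_meeting_time",
        "description": "Find the earliest hour at which every attendee is free.",
        "input_schema": {
            "type": "object",
            "properties": {
                "availability": {
                    "type": "object",
                    "description": "Maps attendee name to a list of free hours (0-23).",
                }
            },
            "required": ["availability"],
        },
    },
]


def cancel_subscription(customer_id: str) -> Dict[str, Any]:
    # Stand-in for a real billing API call.
    return {"customer_id": customer_id, "status": "cancelled"}


def find_meeting_time(availability: Dict[str, List[int]]) -> Dict[str, Any]:
    # Intersect every attendee's free hours and pick the earliest one.
    if not availability:
        return {"hour": None}
    common = set.intersection(*(set(hours) for hours in availability.values()))
    return {"hour": min(common) if common else None}


HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "cancel_subscription": cancel_subscription,
    "find_meeting_time": find_meeting_time,
}


def run_tool_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a tool request against TOOLS and run the matching handler."""
    schemas = {tool["name"]: tool["input_schema"] for tool in TOOLS}
    name = call["name"]
    if name not in HANDLERS or name not in schemas:
        raise ValueError(f"unknown tool: {name}")
    missing = [key for key in schemas[name]["required"] if key not in call["input"]]
    if missing:
        raise ValueError(f"missing required input(s): {missing}")
    return HANDLERS[name](**call["input"])


# Hand-written stand-in for the request the model would produce for
# "cancel the subscription for customer cus_123".
request = {"name": "cancel_subscription", "input": {"customer_id": "cus_123"}}
print(run_tool_call(request))
```

In a real integration the `TOOLS` list would be sent along with the user's message, and the model, not your code, would decide which tool to request and with what inputs. Your code stays in charge of actually executing the action, which is why the sketch checks the tool name and required fields before calling anything.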
Data from Anthropic shows that Claude 3.5 Sonnet sets a new industry standard for intelligence.
“Artifacts” on Anthropic
They are also launching a preview of Artifacts on http://claude.ai
So you can ask Claude to generate docs, code, mermaid diagrams, vector graphics, or even simple games. Artifacts appear next to your chat, letting you see, iterate, and build on your creations in real-time.
Artifacts—a new way to use Claude
Artifacts is thus a more collaborative way to use Claude.ai. It’s a dynamic workspace where you can see, edit, and build upon Claude’s creations in real time, seamlessly integrating AI-generated content into your projects and workflows.
Anthropic says Artifacts is just the beginning of a broader vision for Claude.ai, and I’m liking this interface and UX very much. It will also soon expand to support team collaboration.
So in the near future, teams (and eventually entire organizations) will be able to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate. That sounds really useful for enterprise and work settings.