Anthropic's Claude 3.5 Sonnet is a Major Advancement in Frontier Models 🎯

Sorry OpenAI, Claude 3.5 > GPT-4o. 👀 Apple Intelligence will be based on Anthropic, not OpenAI, by late 2026.

Jun 21, 2024

Anthropic Unveils Claude 3.5 Sonnet: Smarter, Faster and More Personable

Hey Everyone,

I wasn’t planning to write on Friday. But something happened. Every now and again in AI news you come across something significant, and I believe Anthropic has delivered just that this week.

Claude 3.5 Sonnet is really good, surprisingly good.

It’s time we take Anthropic seriously. In my use, Claude 3.5 Sonnet is just so much better than GPT-4o. Anthropic benchmarked Claude 3.5 Sonnet against OpenAI's newest AI model, GPT-4o, which powers ChatGPT. The results show Anthropic's model achieving slightly better results in four of the six benchmarks, which focused on reasoning, coding and math skills.

Download the Claude 3 app on iOS.

Via Allie Miller:

Anthropic’s “Sonnet” now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost.

Frontier intelligence at 2x the speed

Benchmark table showing Claude 3.5 Sonnet outperforming (as indicated by green highlights) other AI models on graduate level reasoning, code, multilingual math, reasoning over text, and more evaluations. Models compared include Claude 3 Opus, GPT-4o, Gemini 1.5 Pro, and Llama-400b.

Key details:

Beats GPT-4o on several benchmarks according to Anthropic
Outperforms previous Anthropic models on several AI benchmarks
The new model can analyze text and images and generate text
It offers twice the speed of the previous Claude 3 Opus model
Claude 3.5 Sonnet has a context window of 200,000 tokens (vs 128K for GPT-4o)
Anthropic introduced Artifacts, a new workspace for editing AI-generated content
The model is available now through Anthropic's web client, iOS app, and API
An even better version, Claude 3.5 Opus, will be released soon with features such as web search

The Best Frontier LLM Right Now

Functionally, Claude 3.5 Sonnet sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval).

It’s also more personable: it shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone.

It’s also just so much smarter than Opus. For example, in an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus which solved 38%.

Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities.

Tool use in Claude

Define a toolset for Claude and specify your request in natural language. Claude will then select the appropriate tool to fulfill the task and, when appropriate, execute the corresponding action:

Extract structured data from unstructured text: Pull names, dates, and amounts from invoices to reduce manual data entry.
Convert natural language requests into structured API calls: Enable teams to self-serve common actions (e.g., "cancel subscription") with simple commands.
Answer questions by searching databases or using web APIs: Provide instant, accurate responses to customer inquiries in support chatbots.
Automate simple tasks through software APIs: Save time and minimize errors in data entry or file management.
Orchestrate multiple fast Claude subagents for granular tasks: Automatically find the optimal meeting time based on attendee availability.
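To make the "define a toolset, then let Claude pick the tool" flow concrete, here is a minimal Python sketch. It makes no network call. The tool definition uses the name / description / input_schema shape that Anthropic documents for tools, but the `cancel_subscription` tool, the `HANDLERS` table, the `run_tool_call` helper, and the simulated `tool_use` block are all hypothetical names invented for illustration, not anything from Anthropic's post.

```python
import json

# Hypothetical tool definition: name, description, and a JSON Schema for inputs.
CANCEL_SUBSCRIPTION_TOOL = {
    "name": "cancel_subscription",
    "description": "Cancel a customer's subscription by customer ID.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal customer ID"},
            "reason": {"type": "string", "description": "Optional cancellation reason"},
        },
        "required": ["customer_id"],
    },
}


def cancel_subscription(customer_id, reason="not given"):
    """Stand-in for a real billing call; it only returns a status record."""
    return {"customer_id": customer_id, "status": "cancelled", "reason": reason}


# Maps tool names to the local functions that implement them (assumed layout).
HANDLERS = {"cancel_subscription": cancel_subscription}


def run_tool_call(block):
    """Dispatch one tool_use content block to the matching local function."""
    if block.get("type") != "tool_use":
        raise ValueError("expected a tool_use block")
    handler = HANDLERS.get(block["name"])
    if handler is None:
        raise KeyError(f"no handler registered for tool {block['name']!r}")
    result = handler(**block["input"])
    # The result is sent back to the model as a tool_result tied to the call ID.
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": json.dumps(result),
    }


if __name__ == "__main__":
    # Simulated model output for the request "cancel subscription for c-42".
    simulated_block = {
        "type": "tool_use",
        "id": "toolu_demo",
        "name": "cancel_subscription",
        "input": {"customer_id": "c-42", "reason": "too expensive"},
    }
    print(run_tool_call(simulated_block))
```

In a real integration the tool definition would be passed to the API alongside the user's message, and the `tool_use` block would come from the model's response rather than being hard-coded. The dispatch table keeps the model's choice of tool separate from the code that actually performs the action.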

Data from Anthropic shows that Claude 3.5 Sonnet sets a new industry standard for intelligence.

“Artifacts” on Anthropic

They are also launching a preview of Artifacts on http://claude.ai

So you can ask Claude to generate docs, code, mermaid diagrams, vector graphics, or even simple games. Artifacts appear next to your chat, letting you see, iterate, and build on your creations in real-time.

Artifacts—a new way to use Claude

Artifacts is thus a more collaborative way to use Claude.ai. It’s a dynamic workspace where you can see, edit, and build upon Claude’s creations in real-time, seamlessly integrating AI-generated content into your projects and workflows.

Anthropic says Artifacts is just the beginning of a broader vision for Claude.ai, and I’m liking this interface and UX very much. It will also soon expand to support team collaboration.

So in the near future, teams, and eventually entire organizations, will be able to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate. That sounds really useful for enterprise and work settings.

Continue reading this post for free, courtesy of Michael Spencer.

AI Supremacy