9 Comments
TRADE CRAFTERS

Everyone is chasing intelligence, but the real bottleneck is experience, and that’s the one thing you can’t shortcut with capital.

The gap shows up the moment the system has to act instead of describe, which is where most narratives quietly break.

Kristi Pihl

Great essay. I’d add that this data capture problem is also where the relentless focus on TAM and business value has hindered the technological advancement of world models. I can think of a group of millions of people (cough cough, moms) who would gladly wear a device that captures their data while folding laundry if it meant one day not having to do that work anymore. But replacing invisible, unpaid labor will never be a priority for the investor class in our current marketplace. That means we’ll keep limiting ourselves to improving work our system has already assigned monetary value to.

Michael Spencer

Thank you, Kristi, that's a super thoughtful comment. American women continue to do chores at a ratio of roughly 2.5 to 1 compared to men. According to the Bureau of Labor Statistics (BLS) 2024 and 2025 reports, women spend about 2.7 hours per day on household activities (including cooking, cleaning, and household management). Can you imagine the time saved and productivity unleashed if affordable consumer robots that could do these tasks existed?

We can hope! Hopefully in the 2030s we can make some serious headway in this regard. Even in their 20s, women report spending roughly 1 hour and 45 minutes on chores compared to 1 hour for men.

Imagine what would be possible if there were more women in technology, AI, robotics, and frontier tech. I often think about it, and wonder how automation could transform unpaid work in the future and make childcare more affordable for a new generation of women who aren't so sure it's all worth it.

Instead, SpaceX is more likely to focus on the lunar economy and national defense implications. We can imagine a better world!

Kristi Pihl

Amen! And yes, we certainly can hope.

One of my proudest career achievements is navigating male-dominated spaces for so long without losing touch with my womanhood. It is incredibly difficult to resist the pressure to either adopt a more masculine persona to fit in or fall into the bitterness that often claims senior female executives.

As someone who is frequently the sole representative of my gender in the room, it has been an ongoing battle to maintain comfort with my femininity and motherhood. However, it is a fight that becomes increasingly worth it as I age. The ability to view patterns and incentives from an entirely different angle isn't just a gift, it’s a superpower and a massive strategic advantage.

Also, you don't have to imagine a future with more women in Tech, AI, and frontier industries; it is already happening. That progress will continue regardless of the resistance. We exist. Our daughters exist. Our challenges exist. Our potential exists. Our supportive husbands and sons exist. And none of us are going anywhere ;)

Dan Cucolea

Models scoring sub-1% on ARC-AGI-3 has got to be the most shocking benchmark result I have seen in years. It shows that language skill and actual world understanding are two completely different things.

I stopped being impressed by benchmark scores after the first five times the industry declared victory, and now I can't wait to see companies benchmaxxing for ARC-AGI-3 as well.

Michael Spencer

That's a good point. At the end of the day, it's human judgment, not benchmarks, that will be the determining factor for many companies trying to leverage AI. In the end, consumers, customers, and ordinary people have a significant say.

For all the hype around the sector, a backlash is forming.

Leon Liao

World models are indeed far harder than LLMs. And the main difficulty does not lie in the “more advanced architecture” that fundraising narratives most like to emphasize. It lies in the friction of real-world data, the high cost of deployment loops, the difficulty of generalizing across environments, and the fact that the physical world itself has a far higher-dimensional and more tightly constrained structure than the world of language.

But it would also be a mistake to swing to the opposite extreme and conclude that world models will remain purely conceptual for the next decade. The more plausible path is this: general world models will be slow to emerge, but narrow-domain world models will likely begin to take shape much earlier in high-value industries with relatively closed environments and strong feedback loops. Many commercially important settings do not require a fully general model of the world. They only require a local world model. Semiconductor manufacturing, warehouse picking, fixed-station robotics, specific surgical workflows, and drone inspection all operate in environments that are more bounded, with fewer variables and more controllable action spaces. In other words, fully general world models may still be distant, but a large number of quasi-world models or domain-specific world models may arrive much earlier than the market expects.

Second, world models are unlikely to replace LLMs in isolation. What is more likely is a gradual coupling of LLMs with vision models, action models, planning modules, simulation environments, and retrieval-memory systems. The future therefore looks less like a single new paradigm and more like a composite system. That means world models may not appear in the dramatic form that markets often imagine, as if a new king suddenly ascends the throne. More likely, they will first emerge quietly as embedded components inside robotics, autonomous driving, industrial control, and agent planning systems.

Third, I strongly agree that the companies most likely to win will not necessarily be the ones with the smartest architecture on paper. They will be the ones most willing to do the slow, expensive, and often tedious work of real-world data collection and deployment. That means partnering with warehouses, hospitals, and construction firms, actually deploying robots into physical environments, and building real feedback loops over time. In the LLM era, the leading firms were often those with superior access to compute, talent, internet-scale text corpora, distribution platforms, and capital. In the world-model era, the advantage may shift toward firms that can combine hardware access, industry-specific scenarios, long-term customer relationships, sensor networks, simulation toolchains, and real deployment capability. This is not simply a race to build “the next OpenAI.” It is much closer to a competition in AI plus industrial organization. Whoever can bind models, data, scenarios, devices, and feedback loops into one operating system will be in the strongest position to lead.

In these respects, China may in fact hold stronger structural advantages than the United States. On one side, China is ahead in major parts of the robotics and hardware ecosystem. On the other, the cost of real-world data collection in China can be only a fraction of that in the United States, in some cases closer to one-tenth. That combination matters enormously in a field where iteration speed, deployment density, and feedback acquisition may matter more than abstract model elegance.

Fourth, although world models are harder partly because real-world data does not exist in the same naturally abundant form as internet text, the deeper problem is even more fundamental than data scarcity. The real world is not merely under-sampled. Its actionable causal structure is itself profoundly hierarchical. A falling cup is not just a visual trajectory. It also involves material properties, gravity, friction, goals, constraints, task decomposition, risk assessment, and counterfactual reasoning. So the difficulty of world models is not only that they lack enough data. It is also that the world itself contains a higher-dimensional and more tightly constrained structure than language does.

Karyn Farrell

Regardless of how AI evolves with LLMs, it will always lack the "X factor" that human beings inherently possess. The human capacity for clairvoyance, telepathy, and other sixth senses is the next frontier, unless AI signals a halt to the evolution of human intelligence.

Odin's Eye

Thank you. This is the most thought-provoking piece I’ve read in quite a while. I will reread and restack it.