DeepSeek V4 is amazing. In ecosystem maturity, DeepSeek still trails the top closed-source stacks by roughly one to two years. But “domesticization + open source + ultra-low cost + private deployment” is a uniquely powerful, hard-to-replicate combination. For SMBs, research, education, and individual developers, this is not merely another model option; it is closer to a structural reset, one that pulls frontier-level capability out of the pricing regime of a few giants and back onto a cost curve and engineering path that a much wider set of actors can actually absorb.
Both V4-Pro and V4-Flash are built around a million-token (1M) context window. This gives them a natural advantage in scenarios where you want to stuff an entire codebase, long documents, or very long conversations into a single context window. By comparison, Claude Opus 4.6/4.7 also supports 1M context, while OpenAI’s GPT-5.5 in ChatGPT offers 256K (paid tiers) or 400K (Pro tier, and only when you manually select Thinking), not the 1M tier.
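To put a rough number on “fits in a single window,” here is a back-of-the-envelope sketch. The ~4-characters-per-token heuristic and the file-extension filter are my assumptions for illustration; for real counts you would use the model’s actual tokenizer.

```python
import os

# Assumption: ~4 characters per token is a common rough heuristic for
# English text and code. Use the model's own tokenizer for real counts.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 1_000_000  # the 1M-token window claimed for V4-Pro/Flash

def estimate_repo_tokens(root: str, exts=(".py", ".md", ".ts", ".go")) -> int:
    """Walk a repo and crudely estimate its total token count."""
    total_chars = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                try:
                    with open(os.path.join(dirpath, name),
                              encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    pass  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_repo_tokens(".")
print(f"~{tokens:,} tokens; fits in 1M window: {tokens <= CONTEXT_WINDOW}")
```

Even mid-sized repos often land in the low hundreds of thousands of tokens by this estimate, which is exactly the regime where a 1M window versus a 256K window changes what you can do in one shot.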
V4’s pricing is extremely aggressive: V4-Pro is roughly $1.74/$3.48 (input/output) per million tokens, and V4-Flash is even lower at $0.14/$0.28. Add the option to download the weights and deploy privately, and the marginal inference cost can move even closer to zero—assuming you have sufficient in-house compute. For reference, Anthropic prices Opus 4.6/4.7 at roughly $5/$25, and OpenAI prices GPT-5.5 at $5/$30 on the API.
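To see what those rates mean for a concrete job, here is a quick cost sketch using the list prices quoted above. The 1M-input / 50K-output workload is an arbitrary example, and real bills also depend on things like prompt caching and batch discounts.

```python
# Per-million-token (input, output) prices in USD, from the figures above.
PRICES = {
    "V4-Pro":       (1.74, 3.48),
    "V4-Flash":     (0.14, 0.28),
    "Opus 4.6/4.7": (5.00, 25.00),
    "GPT-5.5":      (5.00, 30.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-1M-token rates."""
    p_in, p_out = PRICES[model]
    return (input_tokens / 1e6) * p_in + (output_tokens / 1e6) * p_out

# Example: a full 1M-token context plus a 50K-token response.
for model in PRICES:
    print(f"{model:>12}: ${job_cost(model, 1_000_000, 50_000):.2f}")
```

At these rates the same request costs about $1.91 on V4-Pro and $0.15 on V4-Flash, versus roughly $6.25 on Opus 4.6/4.7 and $6.50 on GPT-5.5, before any private-deployment savings.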
This is clear progress toward domesticization, but not “full decoupling from CUDA.” It’s important to be explicit here: today’s open-source inference code is still largely CUDA-based, and the tooling remains tightly coupled to Nvidia’s ecosystem. At the same time, V4 has been validated to run on Huawei Ascend via CANN, and Huawei’s Ascend Supernode has publicly announced full-stack support for V4. The more precise interpretation is that China has moved another step forward on model–chip–software stack co-optimization. But that is not the same as fully escaping CUDA, and it certainly does not mean the ecosystem maturity is already at parity. “Fully autonomous and universally usable” will still depend on large-scale Ascend deployment in the second half of the year, broader tooling and operator coverage, and more complete full-stack open-sourcing plus community validation.
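To make the coupling point concrete, here is a minimal sketch of what backend-agnostic device selection looks like at the framework level. It assumes Huawei’s torch_npu plugin (the PyTorch adapter over CANN) is installed for the Ascend path; the exact API surface of that plugin is an assumption here and varies by version.

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Ascend NPU, then CPU."""
    if torch.cuda.is_available():       # Nvidia path: CUDA kernels
        return torch.device("cuda")
    try:
        # Assumption: importing torch_npu registers the "npu" backend
        # and a torch.npu namespace; this is Huawei's CANN adapter.
        import torch_npu  # noqa: F401  (imported for its side effects)
        if torch.npu.is_available():    # Ascend path: CANN kernels
            return torch.device("npu")
    except ImportError:
        pass
    return torch.device("cpu")          # portable fallback

device = pick_device()
x = torch.randn(2, 2, device=device)
print(f"running on {device}: {x @ x}")
```

The point of the sketch is that this top layer is easy to make portable; the hard part, and the reason “not fully decoupled from CUDA” matters, is the kernel and operator coverage underneath each branch.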
Open-source models will destroy their API money/credit system. But the issue is that people are still fighting against AI, because they know what the big giants are doing in unrestricted domains. Since cybersecurity is important, I think we need AI rule sets that work universally, the way Windows or Android do in the sense that they can’t simply be replaced, just as an example.
Thanks, you make a lot of good and salient points as usual.
It does feel like we're at a turning point on multiple fronts, doesn't it.