More from this creator
Other episodes by Kitty Cat.
Transcript
The full episode, in writing.
Tim Cook has stepped down as Apple's CEO. The new CEO is John Ternus. For 25 years, John Ternus has been a hardware engineer at Apple. He ran the team responsible for moving the Mac from Intel processors to Apple-designed silicon. That transition is widely considered the most successful silicon transition ever achieved in the consumer sector. Underneath Ternus, Johny Srouji has been elevated to senior vice president of Hardware Technologies. Srouji has led Apple’s chip design for the last decade. The two most senior executives at Apple are now both hardware and silicon engineers, not leaders from software, services, or artificial intelligence backgrounds.
The previous 15 years of Apple were defined by a functional organizational structure that dates back further still. Since Steve Jobs’ return in the late 1990s, Apple has organized itself not around specific products like the iPhone or the Mac, but around functions: hardware, software, services, and design. No single team at Apple owns a whole product. Instead, products emerge from the integration of, and sometimes heated argument between, these teams. Jobs believed that if individual product teams controlled their own domains, devices would become incoherent, each optimized for its own silo at the expense of the total experience.
This functional structure allowed Apple to produce tightly integrated devices such as the iPhone, Apple Watch, and AirPods, making hardware, software, and services feel seamlessly connected. But this same structure hampered Apple in the generative AI era, because it forced consensus decision-making at the senior executive level. In product integration, this produces excellence. In AI, which is a capability race focused on speed and iteration, this process causes delays. At the leading AI labs, one person can decide to ship a new model every quarter or even every month. Apple’s structure, which requires cross-functional agreement, makes it impossible to move at that pace.
As a result of these organizational constraints, Apple has fallen one to three years behind the leaders on AI features. The Apple board faced a choice: put a software leader in charge and try to turn Apple into a frontier AI lab, or opt out of that race and change the company's strategic direction. By choosing John Ternus, a hardware leader, the board has decided not to race the hyperscalers on their terms.
Apple is now betting on a very different AI future. The rest of the industry is focused on cloud-based AI and racing to build ever larger models in centralized data centers. Every major frontier lab is losing money on its top consumer subscription tier. For example, OpenAI has stated that, for some users, it is losing money on ChatGPT Pro even at $200 per month. The underlying reason is that serving capable models to users doing real work costs more than any consumer subscription pricing can cover.
This economic problem is being temporarily hidden by several factors: investor capital is subsidizing ongoing operating losses, GPU supply is growing just fast enough to keep up with surging demand, and there’s an assumption that the cost per token of AI computation will keep falling faster than model capabilities are increasing. However, investor appetite is finite and will eventually give way to expectations of profitability. GPU supply, meanwhile, is running into physical limits such as power and fabrication capacity, and is not simply a matter of how many chips companies like Nvidia are willing to produce.
As a result, the AI industry is heading for a two-class system. Large enterprises that can sign seven- or eight-figure contracts will get access to high-powered, unrestricted AI. These customers can run agents for days or weeks, use extended context windows, and obtain dedicated capacity. Meanwhile, consumers will be limited to metered, throttled access at lower price points. Consumer-facing rate limits are already tightening as labs seek to reduce losses.
Apple cannot rely on a loss-making cloud AI service controlled by another company to deliver core product features for iPhone or Mac users. Apple’s only viable alternative is to shift AI computation from the cloud back onto devices.
This strategy is known as local, on-device, or on-premises AI. Traditionally, the primary argument for on-device AI has been privacy: user data stays on the device, Apple never sees it, and regulators are satisfied. However, the first-order benefit of local AI is its cost structure. On-device inference has a fixed cost: the user pays for the silicon when they buy the device. Once a model is running locally, there is essentially no incremental cost for each additional use — it’s just electricity. By contrast, cloud inference carries a variable cost for every query, and those costs are mounting as demand for tokens explodes due to longer-running, more sophisticated AI workflows.
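The fixed-versus-variable cost argument above can be made concrete with a small break-even calculation. This is a minimal sketch, not a pricing model: the function name `break_even_queries` and every dollar figure below are hypothetical, chosen only to show the shape of the comparison, and none of them come from the episode.

```python
import math


def break_even_queries(device_cost: float,
                       cloud_cost_per_query: float,
                       local_energy_cost_per_query: float = 0.0) -> int:
    """Smallest number of queries at which owning the device costs
    no more than paying a metered cloud price for every query."""
    saving_per_query = cloud_cost_per_query - local_energy_cost_per_query
    if saving_per_query <= 0:
        # Local inference never catches up if each local query costs
        # as much as (or more than) the metered cloud query.
        raise ValueError("local inference never breaks even at these prices")
    return math.ceil(device_cost / saving_per_query)


# Hypothetical figures, for illustration only.
DEVICE_COST = 600.00              # one-time hardware price in dollars (assumed)
CLOUD_COST_PER_QUERY = 0.02       # metered cloud price per query in dollars (assumed)
LOCAL_ENERGY_PER_QUERY = 0.0005   # electricity per local query in dollars (assumed)

n = break_even_queries(DEVICE_COST, CLOUD_COST_PER_QUERY, LOCAL_ENERGY_PER_QUERY)
print(f"Break-even after {n} queries")
```

Past the break-even point, every additional local query costs only the assumed electricity figure, while the cloud bill keeps growing linearly with usage.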
Apple’s silicon gives users an escape from the “metered” cloud AI model. That contributed to the sell-out demand for Mac Minis, as enthusiasts and professionals sought machines capable of running useful AI models themselves. Apple is not aiming to beat the best cloud models on its own chips, but is instead targeting the broad set of routine AI tasks most people want: summarizing documents, drafting emails, transcribing meetings, translating text, searching personal data, answering questions, and running routine agents. If these tasks happen locally, the cloud becomes a specialist for only the hardest problems.
This strategy echoes Apple’s move in the 1970s. At that time, computing was a centralized, metered service: users rented time on mainframes. The Apple II didn’t rival mainframes in raw power, but it brought a critical mass of computing into a device people owned outright. Once purchased, using it cost nothing extra. This allowed power users, or “prosumers,” to invent new applications — like the spreadsheet program VisiCalc — that couldn’t have existed under the old, metered model. By moving AI inference onto user-owned devices, Apple is betting that a new wave of innovation will follow, just as it did with the original personal computer.
This shift is already visible in certain regulated industries. Law firms, medical practices, accounting firms, tax preparers, financial advisors, and therapists all have strict confidentiality requirements — for example, attorney-client privilege or HIPAA regulations. These professionals see their competitors gaining an edge with cloud AI, but they can’t risk sending sensitive client data to third-party cloud services, even those with advanced privacy features. Many firms are improvising by clustering M-series Mac Minis in their offices to run generative models locally, keeping data on-premises to satisfy both compliance needs and client expectations.
While Apple offers Private Cloud Compute, a cryptographically protected cloud AI service, it does not satisfy the most stringent regulatory requirements. Law firms and medical practices often must be able to state with certainty that sensitive data never left their physical control. Because Apple does not disclose the physical location of its Private Cloud Compute nodes, professionals cannot guarantee compliance for highly sensitive information.
These professionals are often cobbling together their own solutions: buying retail Mac Minis, using open-source models, and relying on custom scripts to keep everything running within their office networks. There is no enterprise-ready, rack-mountable Apple silicon product, no clustering software, no IT admin tools for local inference, and no identity management layer equivalent to iCloud but for on-premises deployment. There are no curated model marketplaces for regulated workflows and no support for HIPAA business associate agreements. This represents a major gap in the enterprise AI market.
The professional services economy in the United States alone is worth trillions of dollars and employs tens of millions of people. A significant segment of this economy requires AI solutions that never touch the cloud. Apple’s silicon is the most natural platform for these buyers, but the company has not yet built the turnkey enterprise products they demand. This gap creates a window for either Apple or third-party vendors to build the infrastructure layer atop Apple’s hardware, much as third-party companies once built services on IBM hardware.
Developer momentum is already pointed at Apple’s silicon ecosystem. For the past decade, new consumer software categories have launched iOS-first. Instagram, ChatGPT’s first mobile app, and Bluesky all debuted on Apple platforms before Android. This is because Apple users disproportionately buy premium software, making the platform more attractive to developers. If local AI becomes a standard, Apple’s existing developer base is already primed to build for its devices.
For power users, the implications of Apple’s local AI strategy are immediate. Under the cloud AI model, usage is constrained by subscription tiers, token limits, and context window sizes. Power users ration their queries to avoid exceeding limits. With the shift to local inference, the cost of usage drops to near zero, so the ceiling on what’s possible is defined by the user’s own data literacy and hardware. The bottleneck moves from the subscription plan to the sophistication of the user and the devices they buy.
The era in which the difference between a two-year-old phone and the current flagship was marginal is ending. Upgrading to the latest Apple silicon can bring dramatic improvements in AI performance. The Neural Engine generation inside a user’s device will increasingly determine the kind of AI tasks that are possible. This creates a new incentive for prosumers and enterprises alike to buy higher-end devices and upgrade more often.
Apple’s leadership transition to John Ternus is not just about continuity at the top. It is a structural shift in how Apple competes in the AI era. Rather than racing against hyperscalers and cloud-first AI labs, Apple is betting on the economics and flexibility of local inference. The company is repositioning itself as the enabler of a new generation of software powered by on-device AI, serving both consumers and regulated industries with requirements the cloud cannot meet.
The rest of the industry continues to pour capital into cloud data centers, building bigger and more powerful infrastructure in hopes that economies of scale will eventually make consumer AI profitable. But Apple is stating, through its executive choices and product strategy, that the cloud’s economics are fundamentally strained, and that the device in your pocket might become the most important AI platform once again.
A four-person law firm can now run its own AI-powered tools using a handful of Mac Minis, for a few thousand dollars in hardware costs. These machines, running fine-tuned generative models on premises, provide the same kind of structural privacy guarantees that used to be exclusive to mainframe-era institutions. This capability, which is not formally supported by Apple’s enterprise products, has generated significant demand and led to retail shortages of M-series Mac Minis.
Apple’s M-series silicon is built with dedicated neural engines designed for AI workloads. In the Mac Mini, these chips can deliver high throughput for local AI inference, making them suitable for small enterprises trying to run models within a secure IT environment. Each Mac Mini can operate independently, or several can be clustered for greater capacity, allowing professionals to fine-tune and deploy models specific to their domains.
The Apple II’s success in the late 1970s launched the personal computer revolution by making computing accessible to individuals and small businesses. VisiCalc, the first spreadsheet, was created for the Apple II, not for mainframes. The move to local AI inference on Apple silicon echoes this moment: it enables creative and professional workflows that centralized, metered cloud models can’t profitably support.
The economics of cloud AI are under pressure as GPU supply runs up against physical power limitations, and as investors become less willing to subsidize unprofitable consumer offerings. The leading AI labs are tightening rate limits and shifting their most capable services to enterprise contracts. As these constraints become more visible, consumers and professionals are looking for ways to escape the meter, leading to a surge in demand for devices capable of running AI locally.
Apple’s decision to elevate hardware and chip design leaders signals a bet that the most valuable AI workflows for most users will be those that happen on the device itself, not in a remote data center.
The company’s existing developer ecosystem is already aligned around building for Apple silicon first, reinforcing the strategic logic of this pivot. New AI-native products — continuous background agents, assistants that process entire user histories, workflows invoked thousands of times per hour — all become economically viable only when inference is free or nearly free to the end user. These kinds of products are not sustainable on cloud AI APIs, but they are possible on user-owned silicon.
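To see why a workflow invoked thousands of times per hour strains metered pricing, it helps to multiply the numbers out. The sketch below is a rough estimate under stated assumptions: the function `monthly_metered_cost`, the call rate, the tokens per call, and the price per million tokens are all hypothetical and are not figures from the episode or from any provider's price list.

```python
def monthly_metered_cost(calls_per_hour: int,
                         tokens_per_call: int,
                         price_per_million_tokens: float,
                         hours_per_day: float = 24.0,
                         days: int = 30) -> float:
    """Estimated monthly bill, in dollars, for one always-on workflow
    billed per token on a metered cloud API."""
    total_tokens = calls_per_hour * hours_per_day * days * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical workload: a background agent for a single user.
cost = monthly_metered_cost(
    calls_per_hour=2000,           # assumed invocation rate
    tokens_per_call=500,           # assumed tokens processed per invocation
    price_per_million_tokens=1.0,  # assumed metered price in dollars
)
print(f"Estimated metered cost: ${cost:,.2f} per user per month")
```

Under these assumed inputs, a single user's background agent costs far more per month than a typical consumer subscription, which is the gap the passage above is pointing at. On user-owned silicon the same workload adds only electricity.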
At the intersection of compliance and capability, Apple’s hardware provides a foundation for new enterprise software. The lack of an official enterprise product stack for local inference is a gap that startups or Apple itself may fill in the coming years. The window for building such solutions is open now, before another hardware platform or a new vendor closes the gap.
Apple is repositioning itself as the platform provider for both consumer and professional AI use cases, leveraging its expertise in hardware and device integration.