Apple's Siri Overhaul for iOS 27 Relies on Google Gemini, Google Cloud, and Nvidia Chips

Saturday, 30 May 2026 at 20:02
The Information's Aaron Tilley has reported the technical architecture behind Apple's biggest Siri overhaul in years, and it is more complex than Apple's usual privacy-first messaging suggests. Apple is using Gemini to train smaller on-device models through distillation. Complex Siri queries route to Google Cloud, which runs a licensed Gemini model. Nvidia's confidential computing keeps data encrypted during cloud processing. Apple is expected to reveal the features built on top of this infrastructure at WWDC on June 8.

Key Points

  • Apple is using Google's Gemini model to train smaller distilled models that run locally on iPhones, iPads, and Macs — a process that keeps basic AI tasks fast and private on-device
  • Complex Siri queries route to Google Cloud running a licensed Gemini model — Apple's own Private Cloud Compute infrastructure can't efficiently host the full trillion-parameter Gemini model
  • Apple recently approved Nvidia's Confidential Computing technology, which encrypts user data and AI models while they are processed inside Nvidia GPUs on Google Cloud servers
  • Apple will continue using "Private Cloud Compute" branding even though some processing runs on Google Cloud infrastructure — a framing gap that critics have noted
  • Full Gemini-powered Siri reveal is expected at WWDC June 8, with broader rollout alongside iOS 27 in September

What Distillation Actually Means

Apple isn't putting the full Gemini model on your iPhone. It can't: a trillion-parameter frontier model requires data-center-scale hardware. Instead, Apple uses Gemini's outputs to train a much smaller, more efficient model that captures much of Gemini's reasoning capability in a form that fits in on-device memory.
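The report does not describe Apple's actual training recipe. As a rough illustration of what distillation means in general, the sketch below assumes the classic soft-label formulation: a small "student" model is penalised for diverging from a large "teacher" model's output distribution. The function names, the temperature value, and the toy scores are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy scores over three candidate answers (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

print(distillation_loss(teacher, close_student))  # small loss
print(distillation_loss(teacher, far_student))    # much larger loss
```

Training repeatedly nudges the student's parameters to shrink this loss across many examples, so the student ends up imitating the teacher's behaviour at a fraction of the size.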
The result is a local model that handles everyday Siri requests — timers, text editing, basic questions — with no cloud connection required. Only complex queries that exceed the local model's capabilities get routed outward. That architecture is genuinely privacy-preserving for routine interactions, even if the cloud component complicates the narrative.
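How Siri decides that a query "exceeds the local model's capabilities" has not been reported. A minimal sketch of the routing idea, assuming a hypothetical confidence score produced by the on-device model and an arbitrary threshold, might look like this:

```python
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    local_confidence: float  # hypothetical score in [0, 1] from the on-device model

def route(request, threshold=0.7):
    """Return "on_device" when the local model is confident enough, else "cloud"."""
    if request.local_confidence >= threshold:
        return "on_device"
    return "cloud"

# Illustrative requests; the confidence values are made up.
print(route(Request("Set a timer for ten minutes", 0.95)))
print(route(Request("Plan a three-city trip around my calendar", 0.30)))
```

Under this assumption, a routine request never leaves the device, and only low-confidence requests incur the cloud round trip.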

Why Google Cloud — Not Just Apple's Own Servers

Apple's Private Cloud Compute runs on M-series Mac chips in Apple-owned data centers. Ars Technica reported that this infrastructure has struggled to host even a distilled version of Gemini efficiently. Serving frontier AI inference to Apple's user base requires GPU clusters that Apple does not yet have in sufficient numbers.
Routing through Google Cloud fills that gap — while the Nvidia Confidential Computing approval gives Apple a privacy story for those external queries. The system encrypts data and AI models while Nvidia GPUs process them, meaning Google's servers handle the request without accessing identifiable user data. Google is contractually barred from using iOS query streams to train its own models.
The privacy trade-off is real but mitigated. Confidential computing slightly slows cloud AI processing, and that speed hit is the cost Apple pays to maintain its privacy claims even when using external infrastructure.

WWDC June 8 — What to Expect

Bloomberg confirmed a redesigned Siri interface debuts at WWDC alongside new Apple Intelligence features. The Gemini-powered backend is the engine. The front-end changes — a new visual design dropping the legacy glow overlay in favour of tighter hardware integration — are what users will actually see. September's iOS 27 rollout is the broader deployment target.
Apple is also reportedly exploring acquisitions of smaller AI companies to accelerate its on-device model compression work, which suggests the Google dependency is a bridge strategy while Apple builds longer-term capability.