WhatsApp handles 100M+ calls a day with a familiar rhythm: tap, talk. Calls are end-to-end encrypted by default, and users trust that.
Meta is adding AI to WhatsApp calls: in-call help, captions, summaries, and translation. Models need to process call data, so the call itself is no longer end-to-end encrypted, which breaks the expectation users have of every WhatsApp call. As with every Meta AI product, users must consent to AI terms before first use.
Calling differs from chat in ways that break existing consent patterns:
I designed all user-facing flows for Meta AI in WhatsApp calls, including the consent strategy for audio and video AI across five user scenarios. My scope:
Mobile OS architectures (CallKit / Android Dialer) trigger audio capture the millisecond a call connects. This creates a regulatory conflict: data begins flowing before a user can be notified, violating wiretapping laws that require disclosure prior to interception.
I architected a blocking audio mechanism that intercepts the stream at the OS level. This technical gate holds all outbound data in a local buffer while a mandatory audio new-user-experience disclosure (Audio NUX) plays.
How it works:
This framework solved the technical–legal impasse, unblocking Meta AI's expansion into encrypted calling and third-party agent surfaces.
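The blocking gate described above can be pictured with a minimal sketch. It is an illustration only, not the shipped implementation: the class name `OutboundGate`, its methods, and the `max_frames` cap are hypothetical. The source also does not say whether held frames are later sent or thrown away, so flushing on release and discarding on hang-up are both assumptions here.

```python
from collections import deque
from typing import Callable, Deque


class OutboundGate:
    """Hypothetical gate: holds outbound frames locally until disclosure ends."""

    def __init__(self, send: Callable[[bytes], None], max_frames: int = 1000) -> None:
        self._send = send
        # Bounded buffer (assumption): when full, the oldest frame is dropped.
        self._held: Deque[bytes] = deque(maxlen=max_frames)
        self._open = False

    def push(self, frame: bytes) -> None:
        """Forward the frame if the gate is open, otherwise hold it locally."""
        if self._open:
            self._send(frame)
        else:
            self._held.append(frame)

    def release(self) -> int:
        """Open the gate once the Audio NUX has finished; flush held frames in order."""
        self._open = True
        flushed = 0
        while self._held:
            self._send(self._held.popleft())
            flushed += 1
        return flushed

    def discard(self) -> int:
        """Drop everything held, for example if the user hangs up during the disclosure."""
        dropped = len(self._held)
        self._held.clear()
        return dropped
```

In this sketch nothing reaches `send` before `release()` is called, which is the property the disclosure-before-interception requirement depends on.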
WhatsApp AI consent framework
I started from a wider option set; the AI Review panel and WhatsApp leadership then compared two paths, each modeled and prototyped to comparable depth.
Option 1: Audio NUX replaces Visual NUX. The audio announcement serves as the full transparency mechanism. No visual interaction required. Users hear the disclosure once — a region-specific legal message (8–10 seconds) — and they're in the call.
Option 2: Short audio + forced visual. Same brief audio, but audio/video is fully blocked until the user unlocks their phone and taps through the Visual NUX. Full transparency, but at the cost of call functionality.
Third direction (explored, not shipped): After Audio NUX playback, audio/video would go to the other human participants on the call (they could hear you), while sharing to Meta AI stayed blocked until the user completed the Visual NUX. We ruled it out for technical complexity and a fragmented product experience.
| Dimension | Option 1 (full audio) | Option 2 (audio + forced visual) |
|---|---|---|
| Audio message | Long (8–10 sec), legal language, unprecedented on WhatsApp | Short, action-oriented, matches mobile notification patterns |
| User action required | None — stay on the line | Must unlock phone and tap through NUX |
| Data sharing | Starts after audio finishes | Only after user opens app and taps through — audio/video fully blocked until then |
| Consent model | Passive — "by continuing" = not hanging up | Active — explicit tap before data flows |
| BR "rights to object" | Dead end — audio tells user to "Open WhatsApp" but screen is locked | Handled in Visual NUX where user can read and act |
| User perception / trust | High risk — synthetic voice + legal language may feel like spam | Lower risk — short audio, full transparency happens visually |
Option set - presented to WhatsApp Leadership Team & AI Review panel
This was the moment that changed the conversation. Written options look reasonable in a document. No one at WhatsApp had ever heard what a synthetic voice reading legal terms on a call would sound like.
I created audio prototypes — actual recordings of each option, in each region's language. This wasn't a UX mockup. It was an experience of what millions of users would hear on their first AI-enabled group call. When leadership heard the options, it was clear we would need to make sacrifices.
The prototypes made the tradeoffs tangible: Option 1 sounded jarring — but it was over in 10 seconds, happened once, and then the user was in the call. Option 2 sounded clean — but users on locked screens would be silently muted with no feedback, talking without realizing no one could hear them.
Ultimately, I aligned the team on Option 1, accepting deliberate tradeoffs to get people into the product quickly and with minimal friction. We also agreed that the AI voice needed to sound less synthetic, so we will implement a more human-sounding voice using one of the Meta AI presets.
Option 1 · Full audio NUX
Option 2 · Short audio + Visual NUX
The audio content wasn't one-size-fits-all. Different regions have different legal requirements for AI disclosure. Each region's content was a legal–content design negotiation. Every word served a specific legal purpose:
I also designed the progressive disclosure matrix: users who are new to both the feature and Meta AI on WhatsApp hear the full disclosure. Users who are new to calling but have already accepted AI terms elsewhere hear a shorter version. This reduced the audio burden over time while maintaining legal compliance.
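The selection logic behind that matrix might look roughly like the sketch below. The function name, the two boolean inputs, and the `NONE` case are assumptions for illustration. The `NONE` case follows from the earlier statement that users hear the disclosure once. Region-specific wording (RoW, BR, EU) would be a second lookup and is left out.

```python
from enum import Enum


class Disclosure(Enum):
    FULL = "full"    # new to the calling feature and to Meta AI on WhatsApp
    SHORT = "short"  # new to calling, AI terms already accepted elsewhere
    NONE = "none"    # calling disclosure already heard (assumed: played once)


def select_disclosure(accepted_ai_terms: bool, heard_call_disclosure: bool) -> Disclosure:
    """Pick which audio disclosure a user hears on an AI-enabled call."""
    if heard_call_disclosure:
        return Disclosure.NONE
    if accepted_ai_terms:
        return Disclosure.SHORT
    return Disclosure.FULL
```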
Beyond the immediate decision, I built the consent framework to scale. AI in Calls isn't one feature — it's a surface. Call summaries, live captions, real-time translation, and in-call assistance all need consent. Building a one-off solution for each would create inconsistency and slow every future launch.
The framework established:
Call summaries, live captions, and real-time translation can launch with established consent patterns — reducing the legal / content / design alignment cycle from months to weeks.
WhatsApp AI transparency & disclosure strategy
I prototyped end-to-end first-time flows for every way someone starts or receives a call that includes Meta AI — so consent reads as one system, not scattered copy.
Every path shares the same building blocks (blocking buffer, disclosure, terms); what shifts is access. On the lockscreen, or wherever there is no reliable app UI, users get the audio ToS in-call. In WhatsApp, or on a phone the user can unlock, they get the Visual ToS: value props and required legal language in one dual-purpose Visual NUX.
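As a rough sketch of that access rule, the routing reduces to one decision. The names `ConsentSurface` and `choose_surface` and the two boolean inputs are hypothetical, and a real client would derive them from OS and app state that this example does not model.

```python
from enum import Enum


class ConsentSurface(Enum):
    AUDIO_TOS = "audio_tos"    # spoken disclosure played in-call
    VISUAL_TOS = "visual_tos"  # Visual NUX with value props and legal language


def choose_surface(in_whatsapp: bool, can_unlock: bool) -> ConsentSurface:
    """Route to the visual flow whenever app UI is reachable, else fall back to audio."""
    if in_whatsapp or can_unlock:
        return ConsentSurface.VISUAL_TOS
    return ConsentSurface.AUDIO_TOS
```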
Creator · pre-call
Creator · mid-call
Receiver · pre-call
Receiver · mid-call
AI in Calls is currently in development, with dogfooding underway and launch targeted for late 2026.
| Deliverable | Status |
|---|---|
| Consent framework | First audio/video AI consent framework on WhatsApp — setting standards for all future AI call features |
| User scenarios covered | All five (creator pre-call, creator mid-call, receiver pre-call, receiver mid-call, lockscreen / earpiece) |
| WA leadership alignment | Secured across Head of WA, 4 VPs, 6 Directors |
| Region-specific content | Complete for RoW, BR, EU with progressive disclosure matrix |
| Audio prototypes | Created for all options and regions — used in WA leadership decision-making |
| Metric | Value |
|---|---|
| Future features covered | Call summaries, live captions, real-time translation, in-call assistance |
| Consent reuse | Framework designed for inheritance — new features plug into existing matrix |
| Legal alignment cycle | Reduced from months to weeks for future features |