WhatsApp · Consent & Voice · 2025–26

Meta AI in WhatsApp Calls

Designing consent for a sense you can't see

Voice UX · Compliance · Consent frameworks

Project overview

My role: Senior Content Designer · Consent strategy, audio & visual NUX, regional scripts, prototypes, and leadership alignment across 5 call scenarios

Status: In development · Launch targeted late 2026

Leadership: Head of WA, 4 VPs, 6 Directors

Regions: EU, BR, RoW copy + audio prototypes

WhatsApp call screen: joining a call with Meta AI for the first time.

Situation

When the call starts before you can show anything.

WhatsApp handles 100M+ calls a day with a familiar rhythm: tap, talk. Calls are end-to-end encrypted by default, and users trust that.

Meta is adding AI to WhatsApp calls: in-call help, captions, summaries, and translation. Because the models need to process call data, an AI-enabled call is no longer end-to-end encrypted, which breaks a core expectation users have of WhatsApp calling. As with every Meta AI product, users must consent to AI terms before first use.

Calling differs from chat in ways that break existing consent patterns:

Real-time data streaming: Audio and video begin transmitting the moment a call connects, before any disclosure can be shown. In chat, by contrast, users explicitly choose to send each message.
Legal constraints: Voice capture triggers wiretapping-style requirements that don't apply to text, including explicit disclosure of AI presence.
Lockscreen usage: ~40% of calls are answered without opening the app (e.g. earbuds, CarPlay, pocket), making visual consent insufficient.
No precedent: Meta products have never needed audio-only consent; all prior flows have a visual element, and users don't expect spoken disclosures during calls.

Core challenge

How do you get informed consent from users who may never see the screen, when data collection starts instantly, in a medium where any audio interruption is unprecedented?

Task

Consent across every way a call can start.

I designed all user-facing flows for Meta AI in WhatsApp calls, including the consent strategy for audio and video AI across five user scenarios. My scope:

End-to-end flow design for each scenario: creator pre-call, creator mid-call, receiver pre-call, receiver mid-call, and lockscreen / earpiece users
Audio NUX content: the words a synthetic voice would speak to users mid-call, localized for RoW, BR, and EU and adapted to each region's regulatory requirements
Visual NUX design: the on-screen consent surface for users who open the app during a call
Prototyping: audio prototypes of consent flows so WhatsApp leadership could hear the tradeoffs, not just read about them
Leadership reviews: presented to the WhatsApp AI Leadership Team, prepared pre-reads, and secured alignment across the Head of WA, 4 VPs, and 6 Directors

Action

From timing paradox to a scalable consent framework.


The challenge: The timing paradox

Mobile OS architectures (CallKit / Android Dialer) start audio capture as soon as a call connects. This creates a regulatory conflict: data begins flowing before the user can be notified, which risks violating wiretapping laws that require disclosure before interception.

The solution: The blocking buffer

I architected a blocking audio mechanism that intercepts the stream at the OS level. This technical gate holds all outbound data in a local buffer while a mandatory Audio NUX plays.

How it works:

Connection: User answers call.
Buffering: Audio/video is blocked locally; no data is transmitted to participants or servers.
Disclosure: Audio NUX plays, satisfying legal requirements.
Transmission: Stream released only after playback completes.
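The four steps above behave like a small state machine. The sketch below is a hypothetical illustration, not WhatsApp's implementation: the class and method names (`BlockingBuffer`, `on_call_connected`, `push_frame`, `on_disclosure_finished`) are invented, and a plain list stands in for the network transport. It also assumes that frames held while the disclosure plays are discarded when the gate opens rather than forwarded, on the reasoning that audio captured before disclosure should never leave the device; a real system could choose differently.

```python
from enum import Enum, auto


class GateState(Enum):
    IDLE = auto()          # no call in progress
    BUFFERING = auto()     # call connected, Audio NUX still playing
    TRANSMITTING = auto()  # disclosure finished, stream released


class BlockingBuffer:
    """Hypothetical outbound gate: nothing is sent until the Audio NUX completes."""

    def __init__(self):
        self.state = GateState.IDLE
        self._held = []  # frames captured while the disclosure plays
        self.sent = []   # stand-in for frames handed to participants/servers

    def on_call_connected(self):
        # 1. Connection: the OS may already be capturing, so close the gate first.
        self.state = GateState.BUFFERING

    def push_frame(self, frame):
        # 2. Buffering: hold frames locally; nothing reaches the transport.
        if self.state is GateState.BUFFERING:
            self._held.append(frame)
        elif self.state is GateState.TRANSMITTING:
            self.sent.append(frame)
        # IDLE: no active call, so the frame is ignored.

    def on_disclosure_finished(self):
        # 3. Disclosure complete, then 4. Transmission.
        if self.state is not GateState.BUFFERING:
            raise RuntimeError("disclosure finished outside a buffering call")
        # Assumption: pre-disclosure frames are dropped, never transmitted.
        self._held.clear()
        self.state = GateState.TRANSMITTING

    @property
    def held_count(self):
        return len(self._held)
```

Keeping the gate closed by default, and opening it only on an explicit "disclosure finished" event, means a playback failure leaves the call muted rather than silently transmitting.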

The impact

This framework solved the technical–legal impasse, unblocking Meta AI's expansion into encrypted calling and third-party agent surfaces.

WhatsApp AI consent framework

WhatsApp AI consent framework for calls, page 3.

WhatsApp AI consent framework for calls, page 4.

WhatsApp AI consent framework for calls, page 5.

Two options — each with tradeoffs

I started from a wider option set and narrowed it to two paths, each modeled and prototyped to comparable depth, for the AI Review panel and WhatsApp leadership to compare.

Option 1: Audio NUX replaces Visual NUX. The audio announcement serves as the full transparency mechanism. No visual interaction required. Users hear the disclosure once — a region-specific legal message (8–10 seconds) — and they're in the call.

Option 2: Short audio + forced visual. A brief, action-oriented audio notice plays, but audio/video stays fully blocked until the user unlocks their phone and taps through the Visual NUX. Full transparency, but at the cost of call functionality.

Third direction (explored, not pursued): After Audio NUX playback, audio/video would go to the other human participants on the call (they could hear you), while sharing to Meta AI stayed blocked until the user completed the Visual NUX. We ruled it out for technical complexity and a fragmented product experience.

Dimension	Option 1 (full audio)	Option 2 (audio + forced visual)
Audio message	Long (8–10 sec), legal language, unprecedented on WhatsApp	Short, action-oriented, matches mobile notification patterns
User action required	None — stay on the line	Must unlock phone and tap through NUX
Data sharing	Starts after audio finishes	Only after user opens app and taps through — audio/video fully blocked until then
Consent model	Passive — "by continuing" = not hanging up	Active — explicit tap before data flows
BR "rights to object"	Dead end — audio tells user to "Open WhatsApp" but screen is locked	Handled in Visual NUX where user can read and act
User perception / trust	High risk — synthetic voice + legal language may feel like spam	Lower risk — short audio, full transparency happens visually

Option set: presented to the WhatsApp Leadership Team & AI Review panel

AI in Calls consent: option set exploration, page 1.

AI in Calls consent: option set exploration, page 2.

Prototyping what users would actually hear

This was the moment that changed the conversation. Written options look reasonable in a document. No one at WhatsApp had ever heard what a synthetic voice reading legal terms on a call would sound like.

I created audio prototypes: actual recordings of each option, in each region's language. This wasn't a UX mockup; it was a preview of what millions of users would hear on their first AI-enabled group call. Once leadership heard the options, it was clear that neither path came without a cost.

The prototypes made the tradeoffs tangible: Option 1 sounded jarring — but it was over in 10 seconds, happened once, and then the user was in the call. Option 2 sounded clean — but users on locked screens would be silently muted with no feedback, talking without realizing no one could hear them.

Ultimately, I aligned the team on Option 1, accepting deliberate tradeoffs to get people into the product quickly and with minimal friction. We also agreed the AI voice needed to sound less synthetic, so the plan is to use a more human-sounding voice from the Meta AI presets.

Option 1 · Full audio NUX

Option 2 · Short audio + Visual NUX

Region-specific consent content

The audio content wasn't one-size-fits-all. Different regions have different legal requirements for AI disclosure. Each region's content was a legal–content design negotiation. Every word served a specific legal purpose:

"Meta can access this call" — alerts users that E2EE is no longer active
"Meta is recording this call" (EU only) — required by EU regulation, explicitly states what's happening
"Won't use it to improve AI" (EU only) — transparency about data use restrictions under EU law
"Open WhatsApp to learn about your rights to object" (BR only) — required disclosure about data subject rights
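One way to keep these regional variants consistent is to treat each clause as data and assemble the script per region. This is a hypothetical sketch: `BASE_CLAUSES`, `REGION_CLAUSES`, and `build_script` are invented names, and the clause strings are short placeholders paraphrased from the fragments above, not the approved legal copy. It assumes RoW receives only the shared clause, since no RoW-specific lines are listed.

```python
from typing import Dict, List

# Placeholder clauses paraphrased from the fragments above (not final copy).
BASE_CLAUSES: List[str] = ["Meta can access this call."]  # signals E2EE is off

REGION_CLAUSES: Dict[str, List[str]] = {
    "EU": [
        "Meta is recording this call.",        # EU-only requirement
        "Meta won't use it to improve AI.",    # EU data-use transparency
    ],
    "BR": ["Open WhatsApp to learn about your rights to object."],  # BR-only
    "RoW": [],  # assumption: RoW gets only the shared clause
}


def build_script(region: str) -> List[str]:
    """Return the ordered clauses for a region's Audio NUX script."""
    if region not in REGION_CLAUSES:
        # Fail loudly instead of defaulting, so a region never gets a
        # script that legal hasn't reviewed.
        raise ValueError(f"no reviewed consent script for region {region!r}")
    return BASE_CLAUSES + REGION_CLAUSES[region]
```

Raising on an unknown region, rather than falling back to a default script, is a design choice: a silent fallback could under-disclose in a market with stricter rules.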

I also designed the progressive disclosure matrix: users who are new to both AI in Calls and Meta AI on WhatsApp hear the full disclosure, while users who are new to AI in Calls but have already accepted Meta AI terms elsewhere hear a shorter version. This reduces the audio burden over time while maintaining legal compliance.
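The matrix reduces to a small decision over what the user has already seen. The sketch below is hypothetical: `Disclosure` and `pick_disclosure` are invented names, and the `MINIMAL` tier for users returning to AI in Calls is an assumption drawn from the framework's "minimal acknowledgment" step for returning users, not a rule stated in the matrix itself.

```python
from enum import Enum


class Disclosure(Enum):
    FULL = "full"        # new to AI in Calls and to Meta AI on WhatsApp
    SHORT = "short"      # new to AI in Calls, Meta AI terms already accepted
    MINIMAL = "minimal"  # assumption: returning AI in Calls user


def pick_disclosure(accepted_meta_ai_terms: bool,
                    used_ai_in_calls_before: bool) -> Disclosure:
    """Choose the disclosure tier from the user's prior consent history."""
    if used_ai_in_calls_before:
        # Using AI in Calls already required consent, so only a reminder plays.
        return Disclosure.MINIMAL
    if accepted_meta_ai_terms:
        return Disclosure.SHORT
    return Disclosure.FULL
```

Ordering the checks from most to least prior exposure keeps the default case, a user with no history, on the full disclosure.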

Building a scalable framework

Beyond the immediate decision, I built the consent framework to scale. AI in Calls isn't one feature — it's a surface. Call summaries, live captions, real-time translation, and in-call assistance all need consent. Building a one-off solution for each would create inconsistency and slow every future launch.

The framework established:

When audio disclosure is needed vs. when visual is sufficient
What the minimum viable disclosure contains
How disclosure degrades gracefully for returning users — from full disclosure to brief notification to minimal acknowledgment
How new AI call features inherit the framework — future features don't need to reinvent consent; they plug into the existing matrix
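To illustrate what "plugging into the existing matrix" could look like, here is a hypothetical sketch in which every feature reuses the same shared steps and contributes only a description of the data it processes. `AICallFeature`, `SHARED_STEPS`, and `consent_plan` are invented for illustration; only the three building blocks (blocking buffer, disclosure, terms) come from the case study.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Shared building blocks every AI call feature inherits.
SHARED_STEPS: Tuple[str, ...] = ("blocking_buffer", "disclosure", "terms")


@dataclass(frozen=True)
class AICallFeature:
    """Hypothetical descriptor: a new feature declares only what it needs."""
    name: str
    processes: Tuple[str, ...]  # data categories, e.g. ("audio",)


def consent_plan(feature: AICallFeature) -> List[str]:
    """Build the feature's consent flow from the shared steps."""
    plan = []
    for step in SHARED_STEPS:
        if step == "disclosure":
            # The only feature-specific part: which data the disclosure names.
            plan.append("disclosure:" + "+".join(feature.processes))
        else:
            plan.append(step)
    return plan


captions = AICallFeature("live_captions", ("audio",))
```

A feature team writes a one-line descriptor instead of a new consent flow; every plan has the same shape, so legal review can focus on what the disclosure names.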

Call summaries, live captions, and real-time translation are set up to launch with established consent patterns, which should shorten the legal / content / design alignment cycle from months to weeks.

WhatsApp AI transparency & disclosure strategy

WhatsApp AI transparency and disclosure strategy, page 1.

WhatsApp AI transparency and disclosure strategy, page 2.

WhatsApp AI transparency and disclosure strategy, page 3.

WhatsApp AI transparency and disclosure strategy, page 4.

End-to-end flows

I prototyped end-to-end first-time flows for every way someone starts or receives a call that includes Meta AI — so consent reads as one system, not scattered copy.

Every path shares the same building blocks (blocking buffer, disclosure, terms); what changes is how the user can be reached. On the lockscreen, or wherever there is no reliable app UI, the terms are delivered as audio in-call. In WhatsApp, or when the phone can be unlocked, they appear as a Visual ToS: value props and required legal language in one dual-purpose Visual NUX.
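Because only access changes, the channel choice can ignore who started the call and when AI joined. The sketch below is hypothetical: `CallContext`, `TermsChannel`, and `route_terms` are invented names; it approximates "unlockable" by checking whether the screen is unlocked at decision time, and it assumes hands-free calls (earbuds, CarPlay) count as having no reliable app UI.

```python
from dataclasses import dataclass
from enum import Enum


class TermsChannel(Enum):
    AUDIO = "audio ToS in-call"
    VISUAL = "Visual NUX"


@dataclass(frozen=True)
class CallContext:
    in_whatsapp: bool    # WhatsApp call UI is on screen
    screen_locked: bool  # device is on the lockscreen
    hands_free: bool     # assumption: earbuds/CarPlay mean no reliable app UI


def route_terms(ctx: CallContext) -> TermsChannel:
    """Pick the terms channel from access alone; caller role doesn't matter."""
    if ctx.in_whatsapp and not ctx.screen_locked and not ctx.hands_free:
        return TermsChannel.VISUAL
    # Any doubt about whether the user can see the screen falls back to audio.
    return TermsChannel.AUDIO
```

Defaulting to audio whenever screen access is uncertain mirrors the lockscreen problem the framework was built for: a disclosure the user cannot see does not count as a disclosure.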

Creator · pre-call

Creator · mid-call

Receiver · pre-call

Receiver · mid-call

Results

In development — framework and alignment in place.

AI in Calls is currently in development, with dogfooding underway and launch targeted late 2026.

Deliverable	Status
Consent framework	First audio/video AI consent framework on WhatsApp — setting standards for all future AI call features
User scenarios covered	All five (creator pre-call, creator mid-call, receiver pre-call, receiver mid-call, lockscreen / earpiece)
WA leadership alignment	Secured across Head of WA, 4 VPs, 6 Directors
Region-specific content	Complete for RoW, BR, EU with progressive disclosure matrix
Audio prototypes	Created for all options and regions — used in WA leadership decision-making

Forward-looking outcome	Detail
Future features covered	Call summaries, live captions, real-time translation, in-call assistance
Consent reuse	Framework designed for inheritance: new features plug into the existing matrix
Legal alignment cycle	Expected to shrink from months to weeks for future features

Reflection

What this project taught me

AI in Calls pushed content design into a new modality — spoken, real-time, and synthetic. Unlike every prior consent flow I'd worked on (all visual), this required designing language meant to be heard, not read. There's very little precedent for this at Meta or elsewhere.

What I'd do again

I'd double down on audio prototyping. Written specs made both options feel comparable, but hearing them revealed the real tradeoffs instantly. For audio-first experiences, artifacts should be audio-first — something I'll carry forward.

What I'd do differently

I would have pushed earlier for UXR testing using the audio prototypes. Internal reactions were strong, but not necessarily representative of users familiar with voice calling tools like Zoom or Teams. I'd also design more content-specific evaluation criteria, not just general usability feedback.

What this means for the discipline

AI in Calls signals where content design is heading. As products move into voice, video, and ambient interfaces, we'll design beyond screens. The core shift is simple: it's no longer "what should the screen say?" but "what should the experience say?"