AI Chatbots Exposing Phone Numbers: A Privacy Crisis

AI chatbots are no longer just answering questions; they're dishing out real people's phone numbers, creating a burgeoning privacy crisis. This isn't hypothetical: users are reporting direct exposure, and remedies are scarce.

Key Takeaways

  • AI chatbots are increasingly exposing users' real phone numbers due to PII in training data.
  • Companies helping remove personal data online report a 400% surge in AI-related privacy inquiries.
  • There is currently little recourse for individuals whose phone numbers are leaked by AI chatbots.
  • The opacity of LLM training data and algorithms makes it difficult to understand or prevent these leaks.

It started with a trickle. A Redditor, drowning in calls from strangers seeking lawyers and locksmiths. Then, a software developer in Israel, misdirected by Google’s Gemini chatbot to a personal WhatsApp number for customer service. Now, a PhD candidate at the University of Washington watches Gemini cough up a colleague’s private cell. This isn’t an isolated glitch. It’s a systemic failure, and it’s happening now.

AI researchers and privacy advocates have been sounding the alarm about generative AI's privacy implications for years. But until recently, the risk felt largely theoretical. These real-world incidents — though the Redditor's case remains unverified by us — shift the conversation from 'what if' to 'what now.' Generative AI is exposing real phone numbers, and there appears to be a disturbing lack of control for those whose data is being broadcast.

The sheer volume of PII in training data is the suspected culprit. Large language models (LLMs) are voracious learners, consuming vast swathes of the internet. Somewhere in that digital sprawl are millions of personal phone numbers, inadvertently scooped up and, alarmingly, sometimes regurgitated. The exact mechanism remains opaque, a black box of algorithms and scraped data, but the outcome is simple: personal contact information is becoming public domain, courtesy of your friendly neighborhood AI.
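The difficulty of scrubbing phone numbers from scraped text is easy to demonstrate. The sketch below is a toy illustration (the numbers are invented, not drawn from any of the incidents described here): it applies the kind of naive regex filter a data-cleaning pipeline might run before training, and shows how easily differently formatted numbers slip past it.

```python
import re

# A naive PII filter of the sort a pre-training pipeline might apply.
# Illustrative only -- real pipelines use far more elaborate detectors,
# and still miss plenty.
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

samples = [
    "Call me at 206-555-0143 about the apartment.",
    "Reach Dani on +972 52 555 0198 (WhatsApp).",  # international format
    "My number is 2065550143, text anytime.",       # no separators
]

for s in samples:
    # Redact anything matching the (US-centric) pattern.
    print(US_PHONE.sub("[REDACTED]", s))
```

Only the first, neatly formatted number gets redacted; the international and separator-free variants pass through untouched. Multiply that gap across billions of scraped pages, and it becomes clearer why personal numbers survive into training corpora.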

The Numbers Don’t Lie: A 400% Surge in AI Privacy Concerns

How widespread is this leak? It’s impossible to quantify precisely, but the inbound queries at a company like DeleteMe, which specializes in scrubbing personal data from the internet, offer a stark indicator. They’ve seen a staggering 400% increase in customer inquiries specifically referencing generative AI tools like ChatGPT, Gemini, and Claude over the past seven months. That’s thousands of users, increasingly desperate to understand how their information, or someone else’s, ended up in the AI’s conversational repertoire.

These queries “specifically reference ChatGPT, Claude, Gemini … or other generative AI tools,” says Rob Shavell, the company’s cofounder and CEO.

Shavell categorizes these complaints into two camps: either a user asks about themselves and gets back their own sensitive details (addresses, employer info, family names), or, more disturbingly, the AI serves up plausible-but-false contact information for someone else. This latter scenario is precisely what Daniel Abraham, the Israeli software developer, experienced.

Abraham received a peculiar WhatsApp message. The sender, a stranger, claimed to be helping him with a PayBox account issue. It turned out Gemini had directed the stranger to Abraham’s personal number, despite PayBox not offering WhatsApp customer service and Abraham having no affiliation with the company. A subsequent query by Abraham to Gemini for PayBox contact information yielded another incorrect, albeit still personal, Israeli phone number. It’s a digital game of telephone gone horribly wrong, with real people caught in the crossfire.

Abraham’s deep dive into his digital footprint revealed his number had been posted on a local forum back in 2015. A decade later, Gemini resurrected it. This points to a fundamental flaw in how these models ingest and retain data – information that might be long forgotten or outdated is being treated as current and relevant, with potentially serious consequences.

Is There a Fix? The Industry’s Tight-Lipped Response

The fundamental challenge lies in the training data. As public datasets dwindle, AI companies are increasingly turning to data brokers and more obscure sources, likely including more personal information. The models are essentially statistical ghosts of the internet, and sometimes, they whisper our private details.

What’s particularly galling is the apparent lack of a simple, direct solution. Users can’t easily opt out of having their data used in training sets. While some platforms offer limited controls over future interactions, the past remains embedded. The companies behind these LLMs are notoriously tight-lipped about the specifics of their training data and the algorithms that surface PII. This opacity breeds a climate of fear and disempowerment for individuals.

This isn’t just an inconvenience; it’s a serious privacy breach with potential for harassment, identity theft, and a general erosion of trust in AI technologies. As these tools become more integrated into our daily lives, from search engines to customer service, the stakes only get higher. The current trajectory suggests a future where our personal information is perpetually at risk, floating in the digital ether and readily available to the next generative AI that decides to share.

It raises the question: When will the tech giants take meaningful responsibility for the data they use and the consequences of its exposure? The current piecemeal approach, with individual reports and limited user controls, isn’t cutting it. We need systemic solutions, transparency, and a strong commitment to privacy that goes beyond damage control.


Written by James Kowalski

Investigative reporter focused on AI accountability, bias cases, and the societal impact of automated decisions.

Originally reported by MIT Tech Review - Policy
