It started with a trickle. A Redditor, drowning in calls from strangers seeking lawyers and locksmiths. Then a software engineer in Israel, whose personal WhatsApp number Google’s Gemini chatbot handed out as a customer service line. Now a PhD candidate at the University of Washington watches Gemini cough up a colleague’s private cell. This isn’t an isolated glitch. It’s a systemic failure, and it’s happening now.
AI researchers and privacy advocates have been sounding the alarm about generative AI’s privacy implications for years. But until recently, the threat felt largely theoretical. These real-world incidents (though the Redditor’s case remains unverified by us) shift the conversation from ‘what if’ to ‘what now.’ Generative AI is exposing real phone numbers, and the people whose data is being broadcast appear to have disturbingly little control over it.
The suspected culprit is the sheer volume of personally identifiable information (PII) in training data. Large language models (LLMs) are voracious learners, consuming vast swathes of the internet. Somewhere in that digital sprawl are millions of personal phone numbers, inadvertently scooped up and, alarmingly, sometimes regurgitated. The exact mechanism remains opaque, a black box of algorithms and scraped data, but the outcome is simple: personal contact information is becoming public domain, courtesy of your friendly neighborhood AI.
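To get a sense of how easily phone numbers slip into scraped corpora, here is a minimal, hypothetical Python sketch of the kind of pattern matching a data pipeline might use to flag them. The regex, the function name, and the sample forum post are all invented for illustration; production PII scrubbers are far more sophisticated.

```python
import re

# Naive phone-number heuristic: optional country code, optional area code,
# then seven or more digits with common separators. Real PII filters layer
# on validation, context signals, and per-country numbering plans.
PHONE_RE = re.compile(
    r"(?:\+\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}"
)

def find_phone_numbers(text: str) -> list[str]:
    """Return candidate phone numbers found in a chunk of scraped text."""
    return [m.group() for m in PHONE_RE.finditer(text)]

# An invented example of the kind of forum post that can linger for a decade:
scraped = "Selling a couch, great condition. Call me at +972 52-123-4567."
print(find_phone_numbers(scraped))  # ['+972 52-123-4567']
```

Run even a crude filter like this over billions of scraped pages and the candidate numbers pile up by the million, which hints at why a decade-old classified ad can resurface in a chatbot’s answer.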
The Numbers Don’t Lie: A 400% Surge in AI Privacy Concerns
How widespread is this leak? It’s impossible to quantify precisely, but the inbound queries at DeleteMe, a company that specializes in scrubbing personal data from the internet, offer a stark indicator. They’ve seen a staggering 400% increase in customer inquiries specifically referencing generative AI tools like ChatGPT, Gemini, and Claude over the past seven months. That’s thousands of users, increasingly desperate to understand how their information, or someone else’s, ended up in the AI’s conversational repertoire.
These queries “specifically reference ChatGPT, Claude, Gemini … or other generative AI tools,” says Rob Shavell, the company’s cofounder and CEO.
Shavell categorizes these complaints into two camps: either a user asks about themselves and gets back their own sensitive details (addresses, employer info, family names), or, more disturbingly, the AI serves up a private individual’s real number as plausible-but-wrong contact information for a business. This latter scenario is precisely what Daniel Abraham, the Israeli software engineer, experienced.
Abraham received a peculiar WhatsApp message. The sender, a stranger, claimed to be helping him with a PayBox account issue. It turned out Gemini had directed the stranger to Abraham’s personal number, despite PayBox not offering WhatsApp customer service and Abraham having no affiliation with the company. When Abraham then asked Gemini himself for PayBox contact information, it produced yet another wrong number, also belonging to a private individual in Israel. It’s a digital game of telephone gone horribly wrong, with real people fielding the calls.
Abraham’s deep dive into his digital footprint revealed his number had been posted on a local forum back in 2015. A decade later, Gemini resurrected it. This points to a fundamental flaw in how these models ingest and retain data: information that might be long forgotten or outdated is treated as current and relevant, with potentially serious consequences.
Is There a Fix? The Industry’s Tight-Lipped Response
The fundamental challenge lies in the training data. As public datasets dwindle, AI companies are increasingly turning to data brokers and more obscure sources, likely including more personal information. The models are essentially statistical ghosts of the internet, and sometimes, they whisper our private details.
What’s particularly galling is the apparent lack of a simple, direct solution. Users can’t easily opt out of having their data used in training sets. While some platforms offer limited controls over future interactions, the past remains embedded. The companies behind these LLMs are notoriously tight-lipped about the specifics of their training data and about how their systems come to surface PII. This opacity breeds a climate of fear and disempowerment for individuals.
This isn’t just an inconvenience; it’s a serious privacy breach with potential for harassment, identity theft, and a general erosion of trust in AI technologies. As these tools become more integrated into our daily lives, from search engines to customer service, the stakes only get higher. The current trajectory suggests a future where our personal information is perpetually at risk, floating in the digital ether and readily available to the next generative AI that decides to share it.
It raises the question: when will the tech giants take meaningful responsibility for the data they use and the consequences of its exposure? The current piecemeal approach, with individual reports and limited user controls, isn’t cutting it. We need systemic solutions, transparency, and a strong commitment to privacy that goes beyond damage control.