The way you talk can reveal a lot about you—especially if you’re talking to a chatbot. New research reveals that chatbots like ChatGPT can infer a lot of sensitive information about the people they chat with, even if the conversation is utterly mundane.
The phenomenon appears to stem from the way the models’ algorithms are trained with broad swathes of web content, a key part of what makes them work, likely making it hard to prevent. “It’s not even clear how you fix this problem,” says Martin Vechev, a computer science professor at ETH Zurich in Switzerland who led the research. “This is very, very problematic.”
Vechev and his team found that the large language models that power advanced chatbots can accurately infer an alarming amount of personal information about users—including their race, location, occupation, and more—from conversations that appear innocuous.
Vechev says that scammers could use chatbots’ ability to guess sensitive information about a person to harvest sensitive data from unsuspecting users. He adds that the same underlying capability could portend a new era of advertising, in which companies use information gathered from chabots to build detailed profiles of users.
Some of the companies behind powerful chatbots also rely heavily on advertising for their profits. “They could already be doing it,” Vechev says.
The Zurich researchers tested language models developed by OpenAI, Google, Meta, and Anthropic. They say they alerted all of the companies to the problem. OpenAI, Google, and Meta did not immediately respond to a request for comment. Anthropic referred to its privacy policy, which states that it does not harvest or “sell” personal information.
“This certainly raises questions about how much information about ourselves we’re inadvertently leaking in situations where we might expect anonymity,” says Florian Tramèr, an assistant professor also at ETH Zurich who was not involved with the work but saw details presented at a conference last week.
Tramèr says it is unclear to him how much personal information could be inferred this way, but he speculates that language models may be a powerful aid for unearthing private information. “There are likely some clues that LLMs are particularly good at finding, and others where human intuition and priors are much better,” he says.
Source