In 2010, Mark Zuckerberg told the audience at a TechCrunch awards ceremony that young people—especially social media users—no longer cared about privacy. “People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people,” he said. “That social norm is just something that has evolved over time.” While this statement obviously hasn’t aged well, it reflects a common belief that privacy violations happen when individuals reveal their own information. In other words, when something posted to Reddit or TikTok goes viral, or a nude photo sent to an admirer leaks, it’s first and foremost the fault of the person who posted it. This model of individualized accountability is very persistent. It’s also completely wrong. And it’s irrelevant in the age of generative AI.
Generative AI completely obliterates the idea of individual responsibility for privacy because you can’t control these algorithms’ access to your information, or what they do with it. Tools like ChatGPT, Dall-E, and Google Bard are trained on data scraped without consent, or even notice. At their worst, training sets suck up vast amounts of digital information and combine it into a data slurry that serves as the raw material for generative AI. As tech companies are scrambling to incorporate generative AI into every imaginable product, from search engines to games to military gadgets, it’s impossible to know where the output of these tools is going, or how it might be interpreted. Generative AI’s privacy-violating predecessors, data brokers, also scraped the web and assembled massive dossiers on individuals, but their outputs aren’t available to the average person, for free, or integrated into search engines and word processors. The widespread availability of generative AI compounds potential privacy violations and opens up more people to harmful consequences.
The massive corpora employed by generative AI inevitably contain information about people that they did not provide, create, or even know was available. Public records about marriages, mortgages, and voter registration are all fair game, as are news stories, employee bios, and Wikipedia pages. But the slurry also contains millions of photographs and videos; Dall-E, for example, was trained on images collected from social media, search engines, and image-hosting sites. So if you’re in the background of a Flickr shot from 2007, your image could be used to train an algorithm. Nobody seems to know what goes into the data slurry, and there is no way to oversee or control it. When ChatGPT writes an inaccurate bio of me, I don’t know where the false information originated, but I don’t know where the correct information came from, either. We’re used to thinking of privacy as individual control over information, but it’s impossible to regulate how your personal information is used if you don’t even know its origin.
Anthropologists and legal scholars have known for years that privacy can’t be controlled by individuals, partly because we share information within networks. In other words, people talk about each other, both on and offline. There’s no easy way to put limits on that; you can ask your friends not to post pictures of your kids on Instagram or to mention you on TikTok, but you are only as private as your chattiest contact. Networked privacy violations often happen because information provided in an environment with particular norms and expectations moves elsewhere and is interpreted differently. TikToks made for queer, progressive audiences become fodder for anti-trans campaigns; political speeches made to sympathetic audiences seem outrageous when viewed by the opposition.
New technologies increasingly compromise this networked privacy. Forensic genealogy, for example, allows police to identify suspects by examining genetic evidence gathered from distant relatives. You can choose not to use Ancestry.com, but you can’t stop a third cousin, who you probably don’t even know exists, from using it. Big Data, which uses massive datasets in similar ways, frequently implicates friends, relatives, and even distant acquaintances, which becomes extraordinarily worrisome when integrated into predictive policing or risk assessment algorithms. There is nothing people can do to prevent such invasions of privacy.
Generative AI heightens these networked privacy concerns. It compromises our ability to do “privacy work,” the methods and strategies we all employ to retain an acceptable level of privacy. And the outputs of generative AI are completely detached from their original source in ways previously unimaginable. It’s one thing to leak private text messages, and another for the entirety of Reddit to be used as grist for robot poetry and bad college papers. Information provided in one context can be entirely recontextualized and remixed, changing its meaning and violating what the philosopher Helen Nissenbaum calls “contextual integrity.” How can any one person prevent this?