The Great Data Divide (RIP Fragmynt)

I once googled "how tall are moose."¹ A reasonable question. I maintain this. I then spent the next month being aggressively served ads for camouflage hunting gear and moose-themed home décor. There is apparently a market for moose-themed home décor. The algorithm now believes I am that market.

What's interesting about this (and by "interesting" I mean the kind of interesting that starts as funny and then slowly curdles into something that makes you want to lie down) is that the moose situation is the system working as designed. Nobody made a mistake. The ad targeting pipeline functioned correctly. I expressed intent via search query, the query was ingested by a data broker, the broker sold the signal to advertisers, and the advertisers served me relevant content, where "relevant" means "pertaining to moose" and not "something I would ever in my life purchase." The whole $600 billion digital advertising industry is, at a fundamental level, a machine for converting idle curiosity into décor recommendations.

So I started pulling the thread. The numbers are genuinely strange.

One person's data generates roughly $230 a year for advertisers.² The data broker industry is a $252 billion market. These are companies whose entire business model is compiling everything about you (search history, purchase records, social media, loyalty cards, location pings, and basically any other digital exhaust you produce), packaging it, and selling it to advertisers, employers, insurance companies, and whoever else shows up with a purchase order. Most people do not know this industry exists, which, if you think about it for even a second, is a remarkable achievement of marketing for an industry that is literally in the business of marketing.

And then there's the fraud problem.

Up to 40% of digital ad traffic is fraudulent. Bots. Click farms. Fake impressions. A business spending $100,000 on digital ads is potentially lighting $40,000 on fire, except that lighting money on fire would at least provide warmth. Mobile is worse. Some estimates put fraudulent mobile ad clicks at 80%, through techniques with names like "click injection" and "ad stacking" that sound like they were invented by a Bond villain who pivoted to adtech.³

The thing I kept coming back to (and this is where, if I'm being honest, the whole thing starts to look less like an industry with problems and more like a very large organism that nobody is steering) is that almost nobody's winning. Users get surveilled without compensation. Businesses get defrauded without recourse. The only consistent winners are the intermediaries who move the data around. It is a system that has managed to be exploitative AND inefficient at the same time. Which is honestly kind of an achievement.

The story that first got me thinking about any of this was from 2006. AOL released 650,000 users' search histories for research purposes. Replaced usernames with numbers, figured that was sufficient. Within days, The New York Times identified User 4417749 as Thelma Arnold, a 62-year-old widow from Lilburn, Georgia. Her searches told her whole story: "60 single men," "dog that urinates on everything," "hand tremors." Nobody hacked anything. It was just search data. That was twenty years ago. The amount of data we generate now makes Thelma Arnold's search history look like a sticky note on a refrigerator.

I kept circling the same question, which was: what if people owned their data and chose what to share? And because I am the kind of person who responds to interesting questions not by thinking about them at a reasonable distance but by quitting my job and building a company around them,⁴ that's what I did. I built Fragmynt. Ran it for about six months. The basic idea (ask permission, pay people, give businesses better data in return) is so straightforward that I kept expecting to find the obvious reason nobody had done it at scale. I think the vibe-coded website is even still up.

Nobody designed the current data economy. It accreted, the way coral reefs accrete, except instead of building something beautiful it built a system where a search for "hand tremors" can end up in a database that gets sold to an insurance company. Whether Fragmynt was the right answer, I honestly don't know. I think it was the right question. I also think that somewhere out there, the algorithm is still patiently waiting for me to buy a moose lamp.

Footnotes

¹ Six feet at the shoulder. Seven if you include the head. They are, and I cannot stress this enough, much larger than you think they are.

² Which is to say: you, as a data-producing entity, are worth roughly the cost of a mid-tier pair of running shoes per year to the advertising industry, and you do not receive the shoes.

³ I am not making up either of these terms. "Ad stacking" is the practice of layering multiple ads on top of each other in a single ad slot so that only the top one is visible but all of them register as "served." "Click injection" is when malware on a phone detects that an app is about to be installed and injects a fake click at the last millisecond to claim credit for the install. The whole thing has the energy of a heist movie, except the heist is happening to everyone all the time and the take is your attention.

⁴ This is not, I want to be clear, a personality trait I'm recommending. It is, however, an accurate description of what happened.