lederhosen ([personal profile] lederhosen) wrote 2012-03-04 09:54 am

A thought on privacy

(NB: I'm not an expert in either privacy or web technology, so this may be unworkable rubbish, but I'll throw it out here anyway.)

I've been following the recent discussion about Google and privacy - in particular, privacy related to general web browsing - and my impression of the discussion is:

- most of the advice on privacy protection relates to ways of preventing Google (and others!) from capturing your browsing habits;
- Google puts a lot of effort into capturing your browsing habits;
- ergo, protecting your privacy this way is pretty inconvenient.

I wonder if there's anything to be gained by turning the problem on its head: instead of trying to stop Google et al. from capturing info on your browsing habits, make that data capture worthless. Something like this:

When you want to browse the web, instead of opening Firefox/Chrome/etc. directly, you do so through an app - let's call it "Chaff" - that lets you call these browsers and enter data into them.

But when you ask Chaff to open $BROWSER, it opens multiple instances of that browser - let's say five. One of these is visible to you, and you interact with it through Chaff (ideally the interface would be transparent enough that the experience is just like using the vanilla browser). The other four run in the background, usually hidden from view.
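
As a very rough sketch, the launcher half might look something like this in Python. Everything here is a placeholder - the binary name, the flag names, the count of five - and "headless" stands in for whatever mechanism a given browser actually offers for running out of sight:

```python
import subprocess
import tempfile

BROWSER = "firefox"   # placeholder; any browser binary on the PATH
N_INSTANCES = 5       # one visible window plus four hidden decoys

def launch_instances():
    """Start N browser instances; index 0 is the one the user sees."""
    procs = []
    for i in range(N_INSTANCES):
        # Separate profiles let the instances run side by side,
        # each with its own cookie jar and history.
        profile = tempfile.mkdtemp(prefix=f"chaff-{i}-")
        args = [BROWSER, "--profile", profile]
        if i > 0:
            args.append("--headless")   # decoys run hidden from view
        procs.append(subprocess.Popen(args))
    return procs
```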

When you enter/click on a URL in the visible instance, Chaff automatically generates a similar request in each of the hidden instances - but for a different, randomly-chosen website.

For example, suppose the site you REALLY want to see is www.smashthegovernment.org. When you type this request into your browser (via Chaff), the visible instance pulls up that site. It goes into your browser history, and you pick up whatever tracking cookies are associated with it.

But meanwhile, in four other browser instances, Chaff is also opening www.senioradministrativenurses.xxx, www.betterhomesandgardens.com, www.bankofamerica.com, and www.redcross.org. All of those go into the browser/cookie history too. Whenever you click on a link within smashthegovernment.org, Chaff automatically "clicks" on a link in each of the other instances.

The effect of this is that while your browsing history is logged, the log also contains a great deal of bogus information. A human or bot looking at that history can't easily tell whether you're an anti-government activist, a nurse fetishist, or just a gardener.
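
In sketch form, the per-navigation fan-out is simple; all the hard parts hide inside the two stand-in functions (decoy_for is the persistent matching scheme discussed below, navigate is whatever remote-control channel the browsers expose):

```python
def on_user_navigation(real_url, instances, decoy_for, navigate):
    """instances[0] is the visible browser; the rest are hidden decoys."""
    navigate(instances[0], real_url)            # the page you asked for
    for i, inst in enumerate(instances[1:]):
        navigate(inst, decoy_for(real_url, i))  # one decoy per instance
```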



Obviously, if the 'chaff' sites are randomly selected every time you reopen the browser, this isn't going to work: if your browser history shows that you visit smashthegovernment.org regularly, but the other sites are different every day, that's easy to interpret. So you want some way of making these "twins" persistent.

One way to achieve that might be to work off a list of websites, classified by type and sorted by size. For instance, suppose the list is classified into "government", "porn", "education", "commercial", and "miscellaneous"; if you request the nth-largest "government" site, then Chaff would match that to the nth-largest porn, .edu, .com, and misc sites on its list.
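
A toy illustration of that matching rule, with made-up three-entry lists standing in for the long, curated ones a real Chaff would need:

```python
# Toy lists, ordered by (imaginary) size rank within each category.
SITE_LISTS = {
    "government": ["usa.gov", "smashthegovernment.org", "nasa.gov"],
    "porn":       ["senioradministrativenurses.xxx", "example-two.xxx", "example-three.xxx"],
    "education":  ["mit.edu", "stanford.edu", "open.edu"],
    "commercial": ["betterhomesandgardens.com", "bankofamerica.com", "amazon.com"],
    "misc":       ["redcross.org", "wikipedia.org", "archive.org"],
}

def twins_for(site):
    """Return the same-ranked site from every other category."""
    for category, sites in SITE_LISTS.items():
        if site in sites:
            rank = sites.index(site)
            return [others[rank]
                    for cat, others in SITE_LISTS.items()
                    if cat != category and rank < len(others)]
    return []   # not on the list yet; see "Sites not on the list" below

print(twins_for("smashthegovernment.org"))
# ['example-two.xxx', 'stanford.edu', 'bankofamerica.com', 'wikipedia.org']
```

Because the mapping depends only on the lists, the same real site always produces the same twins, day after day.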

Navigation within a website: any time you open a sub-page, follow a link, etc., the hidden instances should do something similar, e.g. "randomly" select a link from the currently-open page. (This "random" selection needs to be reproducible, for the same sort of reasons as above.)
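
One way to get that reproducibility is to derive the "random" choice from the real click itself, e.g. by hashing the real URL together with the instance number. A sketch (extracting the candidate links from the decoy page is assumed to happen elsewhere):

```python
import hashlib

def pick_link(links, real_url, instance_index):
    """Deterministically pick a link for one hidden instance to follow.

    hashlib keeps the choice stable across runs; Python's built-in
    hash() would not, since it is salted per process.
    """
    seed = hashlib.sha256(f"{real_url}:{instance_index}".encode()).digest()
    return links[int.from_bytes(seed[:4], "big") % len(links)]
```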

Sites not on the list: need some mechanism for adding these, but that shouldn't be a large issue.

Website 'death': a dead twin can't simply be swapped for a new site - a sudden replacement in the logs would stand out much like the daily-changing chaff above - so the number of "twin" sites probably needs to be large enough that you can afford to lose a few.

Security: anything that involves going to random websites without user oversight obviously requires attention to general security settings.

Entering form information (passwords etc): maybe generate bogus form submissions to the other sites? This one may be difficult.
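
If bogus submissions were attempted at all, the junk values themselves are the easy half; a throwaway generator keyed to field type might be enough. The hard half, as noted, is discovering and matching the decoy page's actual form structure:

```python
import random
import string

def fake_value(field_type):
    """Generate plausible-looking junk for one form field."""
    junk = "".join(random.choices(string.ascii_lowercase, k=8))
    if field_type == "email":
        return junk + "@example.com"
    if field_type == "password":
        return junk.capitalize() + "1!"
    return junk
```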

I suspect this approach is unlikely to hold up against a targeted investigation by humans; a human investigator could still pick out patterns that couldn't easily be faked. For instance, the hidden instances are probably not going to be viewing anything that requires getting past a captcha.

But it might be able to provide some protection against automated data mining. Thoughts, nitpicks, glaring holes?

[personal profile] serehfa 2012-03-08 11:59 am (UTC)
This is a really interesting idea! When I get some way through my course I will have an educated response...