Mozilla research: Browsing histories are unique enough to reliably identify users.

Steffie

From https://www.zdnet.com/article/mozilla-research-browsing-histories-are-unique-enough-to-reliably-identify-users/

A recently published study conducted by three Mozilla employees has looked at the privacy provided by browsing histories.

Their findings show that most users have unique web browsing habits that allow online advertisers to create accurate profiles.

These profiles can then be used to track and re-identify users across different sets of user data that contain even small samples of a user's browsing history.

Effectively, the study comes to dispel an online myth that browsing history, even the anonymized one, isn't useful for online advertisers. In reality, the study shows that even a small list of 50 to 150 of the user's favorite and most accessed domains can let advertisers create a unique tracking profile.

I continue to roll my eyes at the people who remain naively or, even worse, wilfully blasé about ads, trackers et al... & now add [boom tish] this to the list.

sgunhouse

For people who might not understand ... you probably have several websites you visit regularly. This forum (hopefully), social media, email providers, news, weather, etc. Given enough detail about which sites you visit frequently and how often, they can come up with a profile. They may not know your real name and address, but they can still distinguish you from anyone else. Even if you buy a new device or change providers, will your browsing habits change? Not really. So they'll know you're the same person, even though the machine ID or IP address changed.
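
To make that concrete, here is a minimal sketch of how a set of frequently visited domains could be matched against stored profiles after a device change. It is not the method from the Mozilla study; the Jaccard overlap measure, the profile labels, and all the `.example` domains are assumptions chosen purely for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two domain sets: 0.0 = disjoint, 1.0 = identical."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reidentify(known_profiles: dict[str, set[str]], observed: set[str]) -> tuple[str, float]:
    """Return the (label, score) of the stored profile that best overlaps the observed domains."""
    return max(
        ((label, jaccard(domains, observed)) for label, domains in known_profiles.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical profiles built earlier from an old device or IP address.
profiles = {
    "profile-A": {"forum.example", "news.example", "mail.example", "weather.example"},
    "profile-B": {"social.example", "video.example", "shop.example", "mail.example"},
}

# Domains seen from a brand-new device with no cookies and a new IP address.
new_device = {"forum.example", "news.example", "weather.example", "maps.example"}

best_label, score = reidentify(profiles, new_device)
print(best_label, score)
```

No name or address appears anywhere in the data; the habits alone are enough to pick the closest profile.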

iAN CooG

Ok, then what? What can you do about it? Nothing. Just disconnect from the internet forever and go live in a cave. Or simply don't give a damn and use adblockers.

Pathduck

I got interested in the methods they used to obtain this history data. It turns out they used/abused the CSS :visited pseudo-class along with the JS getComputedStyle() method:

"For the analysis presented in [48] they used the CSS :visited browser vulnerability [8] to determine whether various home pages were in a user’s browsing history. That is, they probed users’ browsers for 6,000 predefined "primary links" such as www.google.com and got a yes/no for whether that home page was in the user’s browsing history."

[8] https://bugzilla.mozilla.org/show_bug.cgi?id=147777

This bug/exploit has since been fixed, I assume in all major browsers.
Test case: https://bugzilla.mozilla.org/show_bug.cgi?id=147777#c11
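
As a rough illustration of why a plain yes/no answer per probed link is already identifying, the sketch below treats each user's probe results as a tuple of booleans and counts how many users end up with a fingerprint nobody else shares. The four-entry probe list and the user histories are invented for the example (the quoted analysis probed 6,000 links), and this simulates only the counting, not the browser-side :visited probing.

```python
from collections import Counter

# Stand-in for the predefined "primary links"; only www.google.com is named in the quote.
PROBE_LINKS = ["www.google.com", "news.example", "forum.example", "shop.example"]


def fingerprint(history: set[str]) -> tuple[bool, ...]:
    """One yes/no answer per probed link, in a fixed order."""
    return tuple(link in history for link in PROBE_LINKS)


def unique_fraction(histories: list[set[str]]) -> float:
    """Fraction of users whose fingerprint is shared with no other user."""
    if not histories:
        return 0.0
    counts = Counter(fingerprint(h) for h in histories)
    unique = sum(1 for h in histories if counts[fingerprint(h)] == 1)
    return unique / len(histories)


# Hypothetical browsing histories for four users.
users = [
    {"www.google.com", "news.example"},
    {"www.google.com", "forum.example"},
    {"www.google.com", "news.example"},
    {"shop.example"},
]

print(unique_fraction(users))
```

With only four probe links, collisions are common; every additional probed link doubles the number of possible fingerprints, which is why a few thousand links leave very few users looking alike.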

I think one of the easiest and most effective ways of ensuring a bit more privacy is just blocking third-party cookies, in addition to automatically clearing or blocking cookies from sites you don't need to log in to. Chromium is slowly phasing out support for third-party cookies; other browsers already block them by default.
https://blog.chromium.org/2020/01/building-more-private-web-path-towards.html

At first you'd think that this is Google doing a "Good Thing". But reading between the lines, they're clearly aware that clients today actually have too much control over cookies, and instead they're implementing APIs to allow direct access to user information by those deemed "worthy" to access it:

"Google Chrome proposes to store individual user-level information in the browser, letting outside ad tech companies do an API call to the Privacy Sandbox in order to receive personalization and measurement data without user-level information."
https://www.adexchanger.com/online-advertising/google-chrome-will-drop-third-party-cookies-in-2-years/

In the end, advertisers and their ad-pushing cronies will always try to figure out new ways to collect as much information as possible about users, and it's up to us to be conscious of this and take the necessary steps to avoid such data collection, depending on our personal need for privacy. This will continue to be an endless to-and-fro battle.

Problem of course, is that most casual web users have very little idea of what goes on, and like @JohnConnorBear pointed out, traffic is moving more and more to smartphones and apps, where the user has very little control (if at all) over what gets collected.

Catweazle

@JohnConnorBear, to test it I activated Google secure browsing and checked it with the test links. It does absolutely nothing: it shows the pages as is, and in the second link I read something about an outdated protection.
I would be surprised if Google were now reliable at protecting us from malware; it doesn't even manage that on its own Android OS with Play Protect.

greybeard

@Pathduck said in Mozilla research: Browsing histories are unique enough to reliably identify users.:

Problem of course, is that most casual web users have very little idea of what goes on, and like @JohnConnorBear pointed out, traffic is moving more and more to smartphones and apps, where the user has very little control (if at all) over what gets collected.

One day I saw what looked like a useful app for my tablet. Before downloading and installing I read the ToS.
This app was just spyware.
In the ToS it said that the app's developers had the right to monitor the device for URLs, added apps, and just about anything else for a period of up to 18 months after deletion.
I didn't bother downloading but wish I remembered what it was so I could warn others. Doh!

Pathduck

@greybeard I never install apps on my aging S4 unless I absolutely need them. People try to push apps at me all the time, like in the cinema they're like "do you have our app?" and I'm like "nope, not interested!"

I even waited ages to install the local public transport ticketing app, as I didn't see a need for it. I paid in cash or used an electronic refill card, which worked for me. But I got ridiculed by friends for my old-school ways ("what are you, 70 years old?") and was forced to cave in. It's practical though, but now I have another app with my VISA card number saved... Wonder how their security is...

In the end though, most apps will probably stop being supported on the S4 (it's 7 years old now) and I'll be forced to upgrade. Maybe I'll just get the same one used. Or get one of those Nokias with only phone, SMS, and maybe 3G for basic email.

luetage

@Pathduck If you wanna forego an Android with all the bells and whistles, I’d go for a pinephone. They’re cheap and they run Linux.

Pathduck

@luetage

Headphone Jack

Sold!

It looks good, but my relationship with smartphones (basically I don't care about them) means I can't be arsed to spend hours fiddling about with all kinds of stuff to make them work. And with Linux, the assumption is often that you like fiddling.

BoneTone

@Steffie said in Mozilla research: Browsing histories are unique enough to reliably identify users.:

These profiles can then be used to track and re-identify users across different sets of user data that contain even small samples of a user's browsing history.

This is something they've actually been doing for some time now. If they can get an identifier on you, but it doesn't match any tracking profile in that site's database, they cross reference it with other tracking profile databases. When there's an intersection, they unify all the profiles. Tracking companies will share data to merge profiles and achieve greater results. That's impossible to block.
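
A minimal sketch of that "intersect, then unify" step, assuming each tracker's profile is just a set of identifiers plus a set of observed sites. The identifier names and the union-find approach are my own illustration, not a description of any real tracker's pipeline.

```python
def merge_profiles(profiles: list[dict]) -> list[dict]:
    """Merge profiles that share at least one identifier (directly or transitively).

    Each profile is {"ids": set of identifiers, "sites": set of observed sites}.
    """
    parent = list(range(len(profiles)))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}
    for idx, profile in enumerate(profiles):
        for ident in profile["ids"]:
            if ident in first_seen:
                # Same identifier seen in an earlier profile: join the two groups.
                parent[find(idx)] = find(first_seen[ident])
            else:
                first_seen[ident] = idx

    merged: dict[int, dict] = {}
    for idx, profile in enumerate(profiles):
        group = merged.setdefault(find(idx), {"ids": set(), "sites": set()})
        group["ids"] |= profile["ids"]
        group["sites"] |= profile["sites"]
    return list(merged.values())


# Hypothetical databases from three different trackers.
site_a = {"ids": {"email-hash-1"}, "sites": {"shop.example"}}
site_b = {"ids": {"email-hash-1", "device-7"}, "sites": {"news.example"}}
site_c = {"ids": {"device-9"}, "sites": {"forum.example"}}

result = merge_profiles([site_a, site_b, site_c])
print(result)
```

The first two profiles collapse into one because they share a single identifier; after that, anything seen with "device-7" is tied to the shop visits as well.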

So-called super cookies make tracking you even easier. A super cookie is not something you can delete: it's never stored on your machine, and it gets injected into net requests later, generally by ISPs. You may give one site virtually no data, but when they are able to match against a profile created from another site with which you shared more information, it all gets merged together.

The only way to prevent tracking from super cookies is to never send a request to the server in the first place. This is where tools like uMatrix excel, by blocking outbound traffic from occurring. If you don't connect to the server at all, there's no request for a super cookie to be injected into.

Default deny may be a bit less convenient, but it's the most effective method for reducing your tracking exposure. Block everything but the document when visiting a new site, then only enable the bare minimum to restore your desired level of functionality on the site.
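
In code terms, default deny boils down to something like the sketch below. To be clear, this is not uMatrix's actual rule syntax or matching engine; the rule tuple format, the request type names, and the hostnames are assumptions made up to show the idea.

```python
from urllib.parse import urlsplit

# Explicit exceptions as (page host, request host, request type).
# Everything not listed here is blocked. The entry is a hypothetical example.
ALLOW_RULES = {
    ("news.example", "cdn.news.example", "css"),
}


def is_allowed(page_url: str, request_url: str, request_type: str) -> bool:
    """Default deny: allow only the page's own document plus explicit exceptions."""
    page_host = urlsplit(page_url).hostname
    request_host = urlsplit(request_url).hostname
    if request_type == "document" and request_host == page_host:
        return True
    return (page_host, request_host, request_type) in ALLOW_RULES


page = "https://news.example/article"
print(is_allowed(page, "https://news.example/article", "document"))
print(is_allowed(page, "https://cdn.news.example/site.css", "css"))
print(is_allowed(page, "https://tracker.example/pixel.gif", "image"))
```

The tracker request is refused before it ever leaves the browser, so there is nothing for anyone upstream to attach an identifier to.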

If I'm just reading an article, I don't care if the layout isn't pretty, or if it's not in the designer's preferred typeface -- I replace most sites' font choices with my own anyway. Most of the time I don't even care that the images don't load; there are few articles for which images are necessary.

So I run both uMatrix & uBlock Origin in default-deny, allow-exceptionally configurations. I use Stylus to make the page "pretty" with my own stylesheets, which override parts of most sites' styles even if I allow them.

@sgunhouse said in Mozilla research: Browsing histories are unique enough to reliably identify users.:

They may not know your real name and address, but they can still distinguish you from anyone else.

And they only need to capture your real name from one site. Then, whenever they can uniquely match you to that combined profile they've built, they've got your real name as well. Hence, you just cannot allow connections to tracking servers at all. If you load an image from a tracking server, they then match that to the profile using the super cookie, and they know what site & which pages you're accessing, all matched to all the data they've ever collected across all sites.

The file you access tells them what webpage you are visiting; the super cookie tells them who you are.

BoneTone

@iAN-CooG said in Mozilla research: Browsing histories are unique enough to reliably identify users.:

Ok, then what? What can you do about it?

Before deciding what to do, it's important to build at least a rudimentary threat model. Understand what threats exist for you, what the cost of them happening is, and therefore which risks are worth mitigating.

Protecting my browsing habits from law enforcement isn't a goal, that's not a risk worth mitigating, as it's not a real threat. Protecting sensitive data from certain corporate & malicious actors is a threat that is worth mitigating.

The idea that any level of exposure means there is no privacy is nonsensical. The question is what needs to remain private, and from whom. Of the threats that exist, which are worth mitigating? If law enforcement or nation states are among your threats, you won't be using a common web browser & smartphone to conduct your business.

If your concerns include things like identity theft and ransomware, then there are reasonable steps you can take to mitigate those threats. Running tools like uBlock Origin & uMatrix in default-deny, allow-exceptionally configs is a great stance to take to protect your privacy from the threats that matter most. Never send financial data without encryption. Use a different browser for accessing your bank accounts than your daily driver.

It is common to see folks on the net throw up their hands, scream bloody murder, and claim there is no privacy, nothing is safe. They won't be satisfied until something is 100% safe, so they'll never be satisfied. It's not called risk avoidance, it's risk management. When you build threat models, there will be risks you identify for which you do nothing to mitigate them. Doing so is too costly and the threat, though it exists, isn't major. This kind of nuance & measured thinking doesn't fit the worldviews of sensationalists & doomsayers.
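
For anyone who likes to see that as arithmetic, here is a rough sketch of the "mitigate only where it pays off" reasoning. The scoring scheme (likelihood times impact, compared with the cost of mitigation) is one common simplification, and every number below is made up for illustration rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: float       # rough chance of it happening, 0.0 to 1.0 (illustrative)
    impact: float           # cost if it happens, in arbitrary units (illustrative)
    mitigation_cost: float  # cost of defending against it, same units (illustrative)

    @property
    def expected_loss(self) -> float:
        return self.likelihood * self.impact


def worth_mitigating(threats: list[Threat]) -> list[str]:
    """Keep only the threats where defending costs less than the expected loss."""
    return [t.name for t in threats if t.mitigation_cost < t.expected_loss]


# A made-up personal threat model.
threats = [
    Threat("ad-network tracking", likelihood=0.9, impact=10, mitigation_cost=2),
    Threat("identity theft", likelihood=0.05, impact=1000, mitigation_cost=5),
    Threat("nation-state surveillance", likelihood=0.001, impact=1000, mitigation_cost=500),
]

selected = worth_mitigating(threats)
print(selected)
```

The last threat is real but is left unmitigated, because defending against it would cost far more than the expected loss. That is risk management, not risk avoidance.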

Steffie

@BoneTone said in Mozilla research: Browsing histories are unique enough to reliably identify users.:

The only way to prevent tracking from super cookies is to never send a request to the server in the first place. This is where tools like uMatrix excel, by blocking outbound traffic from occurring. If you don't connect to the server at all, there's no request for a super cookie to be injected into.

Yup. As we've recently canvassed elsewhere, i do indeed use uM. As every new day passes, each new scandalous revelation drops, the reasons justifying it simply keep growing.

Don't leave home without it.
[Heehee KM].

BoneTone

@Steffie said in Mozilla research: Browsing histories are unique enough to reliably identify users.:

Don't leave [your] home[page] without it.

FTFY

Steffie

@BoneTone Oh i decided years ago that my least-risk paradigm, was only to visit my home page [= speeddial], no other sites, & do so only when the modem's off. For extra safety i do it under my bed, with the lights off.

Catweazle

Steffie

@Catweazle Oh no nooooooooo no -- your hi-tech solution is still far too risky for me... i'd have to air-gap it via scissors...

Catweazle

@JohnConnorBear, it is certainly an illusion to think that there is absolute privacy on the internet, but it is possible to alleviate the consequences (starting with the user's own behavior).
The main problem is not that Google and others track my data and habits ('to improve the user experience'), but the direct implications of this if I use certain services. I'm talking about manipulation.
Let's see: I don't care too much that YT offers me music and videos that correspond to my tastes, but it does matter, and a lot, that the search engine also presents me with the results that it considers relevant rather than those that really are (filter bubble). That makes it as useless as asking someone who always tells you that you're right, even when you aren't (the earth is flat, at the top of the results).
This and similar manipulations, also in the political sphere, are the real danger of going with 'I have nothing to hide' on the Internet and using these services for convenience.
This is why I try as much as possible to avoid these services and use everything I have at hand to limit their excessive curiosity. At least I want to have a curtain on my bedroom window.

greybeard

@Pathduck Well there were a few apps I needed and a few I did not.
I do review the ToSs; they all do tracking of some kind, but my Location (when I seems to be some 300 km off. I don't mind.
There are a lot of apps that are just Bloatware (Facebook, Messenger, Instagram to name a few). Can use the websites just fine. Many others are the same.
I never do banking on it even though it runs on my network.
It is a tablet, not a phone. I have an old S3, not recommended for phone or email or almost anything else. It stores music via direct connection. Always have the Wifi - OFF, Bluetooth - OFF.
I am told there is a Linux based OS you can put on it but no-one knows the developer... and people much smarter than I tell me to stay away from it. There are others tho and this one.

Catweazle

@JohnConnorBear, you're not telling me anything new; as I said, privacy on the network does not exist and we can only protect ourselves in a rudimentary way. I can disable Windows telemetry (with the side effect that it works twice as fast), and I can avoid products from large companies that are known to track the user and use instead products that I know don't do this, at least not directly. But there will always be the metadata of my activities, so it is important to exercise some discretion with sensitive data on the network; the user himself is the last filter.
But as I said before, the real danger in not avoiding excessive tracking is not only the lack of privacy, but that certain companies use this information to manipulate users, offering content that they consider relevant, whether in the field of consumption or, worse, in the political sphere, instead of the relevant information I really need.
Add to that the fact that sensitive information which some company sells to third parties is not only a loss of privacy; it can even be a great security risk.
We can read the ToS and privacy policy of the company whose service we use, but not those of whoever buys this data from that company.

Catweazle

@JohnConnorBear, I have assumed all along that Android is Google spyware, which is why I do not use the mobile for anything important; sometimes I post in this forum, make calls, and little else. I do not have sensitive data or applications on my mobile, only Vivaldi, Blokada, F-Droid, BitDefender, and some FOSS games.