How traceable are you? - Experiment results & analysis
I ran a privacy research experiment on browser fingerprinting. Let's take a look at the results and potential solutions.
We're all familiar with incognito windows. When we want to search for something embarrassing, go to a specific streaming service, or add more privacy to our web browsing, many of us default to incognito windows for added anonymity. After all, the name has "incognito" in it, so it must keep our browsing habits secret, right? Not always.
The incognito window only prevents browsers from permanently storing website history and cookies, but it does not take any measures to conceal your true identity. For instance, Google Chrome, the most popular browser today, does not block ads, trackers, fingerprinters, or other creepy scripts that can be used to identify you.
Traditional Cookie-based Trackers
Traditionally, trackers have used cookies to keep tabs on your browsing habits. Cookies are small pieces of text data that your browser stores and sends back to the website that set them. They can hold authentication data (so you remain logged in), online shopping cart metadata, or user information, but they can also be used to uniquely identify you each time you revisit a website, since the browser returns them on every request. However, there is one significant flaw in this approach. Incognito mode starts with an empty cookie jar: it neither reuses cookies from normal browsing mode nor keeps its own once the session ends. Visiting a website in incognito mode would leave a cookie-based tracker completely in the dark.
The New Kid: Browser Fingerprinting
Fear no more, advertisers. There is a new technique called browser fingerprinting that can solve all your problems. Are you tired of users evading your malicious tracking through incognito mode? Well, what if I told you that browser fingerprinting could track them there too? That's right. The user would even have a false sense of privacy in incognito mode when, in reality, we can still monitor them at all times.
Browser fingerprinting is a technique that builds a profile from the attributes of your browser and device. In incognito mode, your browser and device do not change, making it easy for websites and third-party trackers to track you across browsing profiles.
There are two main types of fingerprinting: client-side and server-side.
Server-side fingerprinting uses request header data, such as the user agent, GPC (Global Privacy Control), Dnt (Do Not Track), etc. These attributes can be easily spoofed, but spoofing may cause some websites to break, which is why most users avoid it.
I ran a privacy research experiment that explored browser fingerprinting using server-side connection data that requires no JavaScript (nojs), client-side fingerprinting techniques, and a similarity algorithm.
On connection, the web server collected the following headers:
- Dnt (for all non-Firefox browsers)
These headers were then concatenated together and hashed to create a fingerprint identifier.
I specifically chose these headers because, in my limited testing, they were consistent in both normal and private browsing sessions. Other headers, like referrer, were unreliable and varied unpredictably.
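The concatenate-and-hash step described above can be sketched roughly as follows. This is a minimal illustration, not the experiment's actual code: the header set, the separator, and the choice of SHA-256 are all assumptions, since the article does not specify them.

```python
import hashlib

# Hypothetical header selection: the article mentions the user agent, GPC and
# Dnt; the exact set and order used in the experiment are not given.
FINGERPRINT_HEADERS = ("User-Agent", "Sec-GPC", "DNT")


def header_fingerprint(headers: dict[str, str]) -> str:
    """Concatenate the selected header values and hash them."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # A missing header contributes an empty string; its absence is a signal too.
    parts = [lowered.get(name.lower(), "") for name in FINGERPRINT_HEADERS]
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(parts)
    # SHA-256 is an assumption; any stable hash would serve the same purpose.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same browser in a normal and a private window: cookies differ,
# but the selected headers do not, so the identifier is the same.
normal = {"User-Agent": "ExampleBrowser/1.0", "Sec-GPC": "1", "Cookie": "session=abc"}
private = {"user-agent": "ExampleBrowser/1.0", "sec-gpc": "1"}
print(header_fingerprint(normal) == header_fingerprint(private))  # True
```

Because the cookie header is not part of the hashed input, clearing cookies or opening a private window does not change the resulting identifier.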
On the client side, the following attributes were collected:
- Installed fonts
- Timezone Spoof Status
- Time Spoof Status
- Screen Height
- Screen Width
- Screen Color Depth
- Navigator App Version
The timezone and time spoof statuses were determined using a technique involving web workers. Since web workers are loaded differently from webpages, extensions such as Windscribe's browser extension cannot spoof the time inside workers. If the time data retrieved in the web worker differed from that found in the webpage, it was safe to conclude that these properties were spoofed.
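Once both readings are available, the comparison itself is simple. The sketch below assumes the page and the worker each report a timezone name, a UTC offset, and a timestamp, and that the two are compared afterwards; the `ClockReading` type, the field names, and the tolerance are hypothetical and not taken from the experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockReading:
    """Time data as reported by one context (page or web worker)."""
    timezone: str             # e.g. an IANA name such as "Europe/Berlin"
    utc_offset_minutes: int   # UTC offset reported by that context
    epoch_ms: int             # current time in milliseconds since the epoch


def spoof_status(page: ClockReading, worker: ClockReading,
                 tolerance_ms: int = 5000) -> dict[str, bool]:
    """Flag a property as spoofed when the page disagrees with the worker."""
    timezone_spoofed = (
        page.timezone != worker.timezone
        or page.utc_offset_minutes != worker.utc_offset_minutes
    )
    # The two readings are taken at slightly different moments, so allow a
    # small gap (the 5 second default is an arbitrary assumption).
    time_spoofed = abs(page.epoch_ms - worker.epoch_ms) > tolerance_ms
    return {"timezone_spoofed": timezone_spoofed, "time_spoofed": time_spoofed}


page = ClockReading("UTC", 0, 1_700_000_000_000)
worker = ClockReading("Europe/Berlin", 60, 1_700_000_000_200)
print(spoof_status(page, worker))
# {'timezone_spoofed': True, 'time_spoofed': False}
```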
Extensions, like Windscribe, have a user agent spoofing function that can make the client appear like another browser. However, on Brave Browser, navigator.brave is accessible. Brave users could easily be identified through this navigator property even if the user agent was spoofed.
Experiment Results Analysis
1757 server-side fingerprints, 1699 client-side fingerprints and 5814 similarity tests were collected in version 6 at the time of writing.
In order to defeat fingerprinting, you must blend in with the crowd or randomize your client attributes frequently. Blending in with the crowd means looking as similar as possible to other visitors, making it much harder to individually track you. Let's explore this technique by analyzing the collected data.
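"Blending in with the crowd" can be made concrete by counting how many visitors share each fingerprint, sometimes called the anonymity set. The following is a small sketch on made-up data, not the experiment's dataset or its analysis code.

```python
from collections import Counter


def anonymity_set_sizes(fingerprints: list[str]) -> dict[str, int]:
    """Map each fingerprint to the number of visitors that share it."""
    return dict(Counter(fingerprints))


def unique_fraction(fingerprints: list[str]) -> float:
    """Fraction of visitors whose fingerprint nobody else shares."""
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    unique = sum(1 for fp in fingerprints if counts[fp] == 1)
    return unique / len(fingerprints)


# Made-up sample: three visitors blend together, two stand alone.
sample = ["aaa", "aaa", "aaa", "bbb", "ccc"]
print(anonymity_set_sizes(sample))  # {'aaa': 3, 'bbb': 1, 'ccc': 1}
print(unique_fraction(sample))      # 0.4
```

The larger the set you fall into, the harder it is to single you out. A visitor in a set of size one is fully trackable by that fingerprint alone.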
The operating system results are pretty interesting. Android was the most fingerprinted mobile operating system and the most fingerprinted operating system overall, while Windows was the most fingerprinted desktop operating system. The most popular desktop operating system is Windows 10. The diverse set of operating systems makes this a useful attribute for fingerprinting, but it needs to be combined with other attributes to single out a user.
Chrome is the most popular browser type, both in the real world and in this experiment. Firefox came in second place, Other third, WebView fourth and Safari last.
The timezones dataset was pretty diverse. The "Other" category makes up the majority of this dataset, which means that timezone can be a great device attribute to base fingerprints on for cross-tracking.
Excluding the "Other" category, UTC takes the largest portion of this pie chart. This is because of Firefox's resistFingerprinting setting, which attempts to make the browser look as bland as possible and, as a result, less uniquely identifiable. Setting the timezone to UTC is one of its defences.
Devices come in all shapes and sizes, and that is shown clearly by the 60% "Other" category. Like the timezones dataset, most users have either unique or almost unique screen resolutions, making screen resolution a good tool for fingerprinting.
Screen Color Depths
Screen color depth does not seem like a useful indicator for fingerprinting. 82% of visitors had a depth of 24, meaning most users' displays will have the same depth.
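One way to quantify how little a common value reveals is its surprisal, measured in bits: the rarer a value, the more it narrows down who you are. This is a standard information-theory measure rather than something the experiment reported; the 82% figure comes from the data above, and the 1% figure is a made-up comparison.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits of identifying information revealed by a value that a given
    share of visitors have in common (share is a fraction in (0, 1])."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the range (0, 1]")
    return math.log2(1 / share)


print(round(surprisal_bits(0.82), 2))  # 0.29 (a color depth of 24)
print(round(surprisal_bits(0.01), 2))  # 6.64 (hypothetical value seen in 1% of visitors)
```

A color depth of 24 contributes well under one bit, so it barely helps distinguish one visitor from another.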
This experiment explored browser fingerprinting through the use of server-side connection data and other client-side techniques. But how well did it perform?
u/tosonana helped me promote my experiment across privacy-related subreddits.
Let's break these down.
Why did it work for most people, but not others?
Browser fingerprinting is all about generating unique identities based on specific attributes. If you look too similar to other users, the technique fails.
Users who reported that the experiment worked on them had a more distinctive combination of browser and device configuration. This could have been caused by extensions (such as anti-fingerprinting, anti-tracking, or spoofing extensions, which the experiment could detect), time zones, fonts, or operating systems.
Users who reported that the experiment failed had more common configurations. For example, I mentioned Firefox's resistFingerprinting feature earlier, which helped make browsers appear like those of everyone else. This helped many people avoid detection by my experiment. However, some users reported the experiment being able to track them despite having resistFingerprinting enabled. This is most likely because they had an extension that made them stand out.
In addition to Firefox, Safari on the iPhone did a decent job overall of protecting users. Users with default Safari settings (meaning default WebKit experimental feature settings) on the same iOS version shared the same fingerprint, which reduced the accuracy of the experiment.
The goal of this experiment was to gain a better understanding of fingerprinting. Thanks to all the participants, I was able to achieve this. Based on my data analysis, I have compiled a short list of potential solutions that browser or extension developers can consider to create a more private Internet.
Headers were the main focus of the nojs portion of this experiment. As explained earlier, the experiment used specific, consistent request headers from browsers to create a fingerprint. Headers such as GPC, Dnt, and user agent can differ greatly across browser configurations, which is why they should be standardized or randomized.
One example of header standardization would be sending the same GPC header value for every user. The same goes for the other headers collected.
Randomization can be achieved by unpredictably adding and removing specific headers, making it much harder to fingerprint users server-side using this method. For example, on one website, send the GPC and Dnt headers, but on another, remove those headers or fill them with random data.
Brave: Don't expose your users
Brave exposes users on the client-side by allowing access to a navigator.brave attribute, thereby bypassing user-agent spoofing and other browser spoofing techniques. As long as that navigator property is accessible, the website can safely assume that the user is indeed on Brave and not on their spoofed browser.
- u/tosonana - Promoting my experiment on privacy subreddits
- Surveillance Report (The New Oil) - Promoting my experiment on their podcast
- ente - Promoting my experiment on their Twitter
- z0ccc - Providing inspiration
- Everyone who participated - Contributing anonymized browser data for analysis