How traceable are you? - Experiment results & analysis

I ran a privacy research experiment on browser fingerprinting. Let's take a look at the results and potential solutions.

Photo by George Prentzas / Unsplash

We're all familiar with incognito windows. When we want to search for something embarrassing, go to a specific streaming service, or add more privacy to our web browsing, many of us default to incognito windows for added anonymity. After all, the name has "incognito" in it, so it must keep our browsing habits secret, right? Not always.

The incognito window only prevents browsers from permanently storing website history and cookies, but it does not take any measures to conceal your true identity. For instance, Google Chrome, the most popular browser today, does not block ads, trackers, fingerprinters, or other creepy scripts that can be used to identify you.

Traditionally, trackers have used cookies to keep tabs on your browsing habits. Cookies are small pieces of text data that your browser stores and sends back, on each request, to the website that set them. They can hold authentication data (so you remain logged in), online shopping cart contents, or user information, but they can also be used to uniquely identify you each time you revisit a website. However, there is one significant flaw in this approach. Incognito mode does not reuse existing cookies from normal browsing mode, and it discards its own cookies when the session ends. Visiting a website in incognito mode would leave a purely cookie-based tracker completely in the dark.

The New Kid: Browser Fingerprinting

Fear no more, advertisers. There is a new technique called browser fingerprinting that can solve all your problems. Are you tired of users evading your malicious tracking through incognito mode? Well, what if I told you that browser fingerprinting could track them there too? That's right. The user would even have a false sense of privacy in incognito mode when, in reality, we can still monitor them at all times.

Browser fingerprinting is a technique that creates a profile of the unique attributes of your browser and device. In incognito mode, your browser and device do not change, making it easy for websites and third-party trackers to cross-track you across browsing profiles.

There are two main types of fingerprinting: client-side and server-side.

Client-Side Fingerprinting

Client-side fingerprinting uses JavaScript to detect browser and device attributes, such as screen resolution, browser engine and app version. These attributes can be harder to spoof.

Server-Side Fingerprinting

Server-side fingerprinting uses request header data, such as the user agent, GPC and Dnt headers. These attributes can be easily spoofed, but spoofing may cause some websites to break, which is why most users avoid it.

Experiment Recap

I ran a privacy research experiment that explored browser fingerprinting through the use of nojs server-side connection data, client-side fingerprinting techniques and a similarity algorithm.
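To give a rough idea of what a similarity test involves, here is a minimal Go sketch that scores two fingerprints by the share of attributes they have in common. The `Fingerprint` type, the attribute names and the equal weighting of attributes are assumptions made for illustration; this is not the experiment's actual algorithm.

```go
package main

import "fmt"

// Fingerprint maps an attribute name to its observed value.
// The attribute names used below are hypothetical.
type Fingerprint map[string]string

// Similarity returns the fraction of attributes (taken over the union of
// both fingerprints' keys) that are present in both and have equal values.
// Every attribute carries the same weight, which is an assumption.
func Similarity(a, b Fingerprint) float64 {
	keys := make(map[string]struct{})
	for k := range a {
		keys[k] = struct{}{}
	}
	for k := range b {
		keys[k] = struct{}{}
	}
	if len(keys) == 0 {
		return 0
	}
	matches := 0
	for k := range keys {
		va, okA := a[k]
		vb, okB := b[k]
		if okA && okB && va == vb {
			matches++
		}
	}
	return float64(matches) / float64(len(keys))
}

func main() {
	normal := Fingerprint{
		"timezone":   "Europe/Berlin",
		"screen":     "1920x1080",
		"colorDepth": "24",
		"userAgent":  "ExampleBrowser/1.0",
	}
	private := Fingerprint{
		"timezone":   "Europe/Berlin",
		"screen":     "1920x1080",
		"colorDepth": "24",
		"userAgent":  "SpoofedBrowser/9.9",
	}
	// 3 of 4 attributes match, so this prints 0.75.
	fmt.Printf("%.2f\n", Similarity(normal, private))
}
```

A tracker would then treat two sessions as the same visitor once the score passes some threshold; where to put that threshold is a trade-off between false matches and missed matches.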

Experiment Inner-workings

This project used Go for its API, SQLite for the database, HTML templates for the frontend and JavaScript for client-side fingerprinting. I chose Go and SQLite because they're lightweight and performant (for this type of project at least), and HTML templates because they can be rendered by browsers that block JavaScript, which the nojs tests required.

Server-Side Fingerprinting

On connection, the web server collected the following headers:

  • User-Agent
  • Accept
  • Accept-Language
  • Accept-Encoding
  • Upgrade-Insecure-Requests
  • Sec-Gpc
  • Sec-Fetch-User
  • Sec-Fetch-Site
  • Sec-Fetch-Mode
  • Sec-Fetch-Dest
  • Dnt (for all non-Firefox browsers)

These headers were then concatenated together and hashed to create a fingerprint identifier.
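The concatenate-and-hash step can be sketched in Go roughly as follows. This is a minimal sketch, not the experiment's code: SHA-256, the `name=value` line format, and detecting Firefox by looking for "Firefox" in the User-Agent are all assumptions, and `HeaderFingerprint` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// fingerprintHeaders lists the headers from the bullet list above, in a
// fixed order so the same browser always produces the same input string.
var fingerprintHeaders = []string{
	"User-Agent", "Accept", "Accept-Language", "Accept-Encoding",
	"Upgrade-Insecure-Requests", "Sec-Gpc", "Sec-Fetch-User",
	"Sec-Fetch-Site", "Sec-Fetch-Mode", "Sec-Fetch-Dest", "Dnt",
}

// HeaderFingerprint concatenates the selected headers and returns the
// hex-encoded SHA-256 digest. Missing headers contribute an empty value,
// so the absence of a header is itself part of the fingerprint.
func HeaderFingerprint(h http.Header) string {
	// Assumption: Firefox is recognised by a substring of the User-Agent.
	isFirefox := strings.Contains(h.Get("User-Agent"), "Firefox")

	var b strings.Builder
	for _, name := range fingerprintHeaders {
		if name == "Dnt" && isFirefox {
			continue // Dnt was only used for non-Firefox browsers
		}
		b.WriteString(name)
		b.WriteByte('=')
		b.WriteString(h.Get(name))
		b.WriteByte('\n')
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	h := http.Header{}
	h.Set("User-Agent", "ExampleBrowser/1.0")
	h.Set("Accept-Language", "en-US,en;q=0.5")
	h.Set("Sec-GPC", "1") // stored under the canonical key "Sec-Gpc"
	fmt.Println(HeaderFingerprint(h))
}
```

In a real handler the same function would be called with `r.Header` from the incoming `*http.Request`. Writing the header name and a separator into the hashed string keeps two different header combinations from collapsing into the same concatenation.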

I specifically chose these headers because, in my limited testing, they were consistent in both normal and private browsing sessions. Other headers, like cookie, app-pragma, cache-control and referer, were unreliable and varied unpredictably.

Client-Side Fingerprinting

After the page's initial load, a JavaScript file was loaded to collect the following data:

  • Installed fonts
  • Timezone
  • Timezone Spoof Status
  • Time Spoof Status
  • Screen Height
  • Screen Width
  • Screen Color Depth
  • Navigator App Version
  • navigator.brave

The timezone and time spoof statuses were determined using a technique involving web workers. Since web workers are loaded differently from webpages, extensions such as Windscribe's browser extension cannot spoof the time inside them. If the time data retrieved in the web worker differed from that found in the webpage, it was safe to conclude that these properties were spoofed.

Extensions, like Windscribe, have a user agent spoofing function that can make the client appear like another browser. However, on Brave Browser, navigator.brave is accessible. Brave users could easily be identified through this navigator property even if the user agent was spoofed.
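The collection itself has to happen in the browser, but the way a server could interpret the reported values can be sketched in Go. Everything here is an assumption made for illustration: the `ClientReport` JSON shape, its field names and the 5-second clock tolerance are hypothetical and not taken from the experiment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClockReading is what one JavaScript context (page or web worker) reports.
type ClockReading struct {
	Timezone    string `json:"timezone"`
	EpochMillis int64  `json:"epochMillis"`
}

// ClientReport is a hypothetical payload sent by the client-side script.
type ClientReport struct {
	Page     ClockReading `json:"page"`
	Worker   ClockReading `json:"worker"`
	HasBrave bool         `json:"hasBrave"` // true if navigator.brave was accessible
}

// clockToleranceMillis allows for the delay between the two readings.
// The value is an assumption.
const clockToleranceMillis = 5000

// TimezoneSpoofed reports whether the page and the worker disagree on the timezone.
func (r ClientReport) TimezoneSpoofed() bool {
	return r.Page.Timezone != r.Worker.Timezone
}

// TimeSpoofed reports whether the two clocks differ by more than the tolerance.
func (r ClientReport) TimeSpoofed() bool {
	diff := r.Page.EpochMillis - r.Worker.EpochMillis
	if diff < 0 {
		diff = -diff
	}
	return diff > clockToleranceMillis
}

func main() {
	payload := `{
		"page":   {"timezone": "UTC", "epochMillis": 1700000000000},
		"worker": {"timezone": "Europe/Berlin", "epochMillis": 1700000000150},
		"hasBrave": true
	}`
	var r ClientReport
	if err := json.Unmarshal([]byte(payload), &r); err != nil {
		fmt.Println("bad payload:", err)
		return
	}
	fmt.Println("timezone spoofed:", r.TimezoneSpoofed())
	fmt.Println("time spoofed:", r.TimeSpoofed())
	// The navigator property overrides whatever the User-Agent header claims.
	fmt.Println("is Brave:", r.HasBrave)
}
```

Note that the spoof statuses are themselves fingerprint attributes: a visitor whose page and worker disagree stands out from everyone whose readings match.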

Experiment Results Analysis

At the time of writing, version 6 of the experiment had collected 1,757 server-side fingerprints, 1,699 client-side fingerprints and 5,814 similarity tests.

In order to defeat fingerprinting, you must blend in with the crowd or randomize your client attributes frequently. Blending in with the crowd means looking as similar as possible to other visitors, making it much harder to individually track you. Let's explore this technique by analyzing the collected data.

Operating Systems

Name            Count   Percentage
Windows         404     22.99%
MacOS/iPadOS    156     8.88%
Linux           170     9.68%
iOS             89      5.07%
Android         737     41.95%
Other           201     11.44%
Pie chart displaying the different operating systems detected server-side

These results are pretty interesting. Android was the most fingerprinted operating system overall, and therefore also the most fingerprinted mobile one, while Windows was the most fingerprinted desktop operating system. The most popular desktop operating system is Windows 10. The diverse set of operating systems makes this a useful fingerprinting signal, but it cannot single anyone out by itself and has to be combined with other attributes.

Browsers

Name       Count   Percentage
Firefox    543     30.90%
Chrome     745     42.40%
Safari     119     6.77%
WebView    117     6.66%
Other      233     13.26%
Pie chart displaying the different browsers detected server-side

Chrome is the most popular browser type, both in the real world and in this experiment. Firefox came in second place, Other third, Safari fourth and WebView last.

Timezones

Timezone               Count   Percentage
UTC                    323     19.01%
America/New_York       163     9.59%
Europe/Berlin          94      5.53%
America/Chicago        78      4.59%
Europe/London          71      4.18%
America/Los_Angeles    69      4.06%
America/Toronto        47      2.77%
Asia/Calcutta          45      2.65%
Europe/Amsterdam       43      2.53%
Europe/Paris           42      2.47%
Other                  684     41.23%
Pie chart displaying the different timezones detected client-side

The timezones dataset was pretty diverse. The "Other" category is by far the largest slice of this dataset, which means that the timezone can be a great device attribute to base fingerprints on for cross-tracking.

Excluding the "Other" category, UTC takes the largest portion of this pie chart. This is most likely because of Firefox's resistFingerprinting feature, which attempts to make the browser look as bland as possible and, as a result, less uniquely identifiable. Setting the timezone to UTC is one of its defences.

Screen Resolutions

Resolution    Count   Percentage
1920x1080     234     14.13%
412x915       64      3.86%
390x844       52      3.14%
2560x1440     46      2.78%
375x667       44      2.66%
1440x900      43      2.60%
412x892       41      2.48%
1366x768      40      2.42%
375x812       37      2.23%
414x896       34      2.05%
Other         984     60.78%
Pie chart displaying the different screen resolutions detected client-side

Devices come in all shapes and sizes, and that is shown clearly by the roughly 60% "Other" category. Like the timezones dataset, most users have a screen resolution outside the ten most common ones, making it a good tool for fingerprinting.

Screen Color Depths

Depth    Count   Percentage
24       1358    82.00%
32       244     14.73%
30       53      3.20%
16       1       0.06%
Pie chart displaying the different screen color depths detected client-side

Screen color depth does not seem like a useful indicator for fingerprinting. 82% of visitors had a depth of 24, meaning most users' displays will have the same depth.
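One way to put a number on how useful an attribute value is for fingerprinting is its surprisal: log2(total / count), the number of bits of identifying information revealed by that value. The Go sketch below applies this to a few counts from the tables above; the choice of surprisal as the measure and the `SurprisalBits` helper are my additions for illustration, not part of the experiment.

```go
package main

import (
	"fmt"
	"math"
)

// SurprisalBits returns how many bits of identifying information a value
// reveals when count out of total visitors share it. Common values yield
// few bits, rare values yield many. count must be between 1 and total.
func SurprisalBits(count, total int) float64 {
	return math.Log2(float64(total) / float64(count))
}

func main() {
	// Counts are taken from the tables above.
	rows := []struct {
		label        string
		count, total int
	}{
		{"color depth 24", 1358, 1656},
		{"resolution 1920x1080", 234, 1656},
		{"timezone UTC", 323, 1699},
		{"color depth 16", 1, 1656},
	}
	for _, r := range rows {
		fmt.Printf("%-22s %.2f bits\n", r.label, SurprisalBits(r.count, r.total))
	}
}
```

The most common color depth reveals well under one bit, which matches the conclusion that it is a weak signal, while the single visitor with a depth of 16 is identified by that attribute alone.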

Experiment Efficacy

This experiment explored browser fingerprinting through the use of server-side connection data and other client-side techniques. But how well did it perform?

User Reviews

u/tosonana helped me promote my experiment around privacy-related subreddits.

u/redonbills sharing their positive experience
u/shab-re sharing their mixed, but overall positive, experience
u/Golferhamster sharing their frustrating, but overall positive, experience
u/SirLumpyFrog sharing their negative experience

Let's break these down.

Why did it work for most people, but not others?

Browser fingerprinting is all about generating unique identities based on specific attributes. If you look too similar to other users, the technique fails.

Users who reported that the experiment worked on them had a more unique combination of browser and device configuration. This could have been caused by extensions (such as anti-fingerprinting, anti-tracking or spoofing extensions, which the experiment could detect), time zones, fonts, or operating systems.

Users who reported that the experiment failed had more common configurations. For example, I mentioned Firefox's resistFingerprinting feature earlier, which helped make browsers appear like those of everyone else. This helped many people avoid detection by my experiment. However, some users reported the experiment being able to track them despite having resistFingerprinting enabled. This is most likely because they had an extension that made them stand out.

In addition to Firefox, Safari on the iPhone did a decent job overall of protecting users. Users with default Safari settings (including the default WebKit experimental feature settings) on the same iOS version had the same fingerprint, reducing the accuracy of the experiment.

Potential Solutions

The goal of this experiment was to gain a better understanding of fingerprinting. Thanks to all the participants, I was able to achieve this. Based on my data analysis, I have compiled a short list of potential solutions that browser or extension developers can consider to create a more private Internet.

Randomize Headers

Headers were the main focus of the nojs portion of this experiment. As explained earlier, the experiment used specific, consistent request headers from browsers to create a fingerprint. Headers such as GPC, Dnt, and user agent can differ greatly across browser configurations, which is why they should be standardized or randomized.

One example of header standardization could be forcing the same GPC headers for every user. The same goes for the other headers collected.

Randomization can be achieved by unpredictably adding and removing specific headers, making it much harder to fingerprint users server-side using this method. For example, send GPC and Dnt headers to one website, but remove them or fill them with random data for another.
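Both ideas can be sketched as small transformations applied to the outgoing request headers. A browser or extension would implement this in its own codebase; Go is used here only to stay consistent with the experiment's server code. The function names, the restriction to the GPC and Dnt headers and the choice of "1" as the standard value are all assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// privacyHeaders are the headers this sketch rewrites (canonical Go spelling).
var privacyHeaders = []string{"Sec-Gpc", "Dnt"}

// Standardize forces the same value for every user, so these headers
// no longer distinguish one browser configuration from another.
func Standardize(h http.Header) {
	for _, name := range privacyHeaders {
		h.Set(name, "1")
	}
}

// Randomize independently drops each header or fills it with a random
// value, so the combination changes from one website to the next.
func Randomize(h http.Header, rng *rand.Rand) {
	for _, name := range privacyHeaders {
		switch rng.Intn(3) {
		case 0:
			h.Del(name)
		case 1:
			h.Set(name, "0")
		default:
			h.Set(name, "1")
		}
	}
}

// newRNG returns a seeded generator; a real client would seed it per
// website or per session rather than with a constant.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	a := http.Header{"Dnt": {"1"}}
	b := http.Header{"Sec-Gpc": {"1"}}
	Standardize(a)
	Standardize(b)
	fmt.Println("same after standardizing:",
		a.Get("Dnt") == b.Get("Dnt") && a.Get("Sec-Gpc") == b.Get("Sec-Gpc"))

	Randomize(a, newRNG(1))
	fmt.Println("after randomizing:", a)
}
```

Standardization makes everyone look alike, while randomization makes the same user look different on each site; either one breaks a fingerprint built from these header values. Note that randomizing GPC or Dnt also changes the privacy preference the site actually receives, which is a real cost of this approach.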

Brave: Don't expose your users

Brave exposes users on the client-side by allowing access to a navigator.brave attribute, thereby bypassing user-agent spoofing and other browser spoofing techniques. As long as that navigator property is accessible, the website can safely assume that the user is indeed on Brave and not on their spoofed browser.

Special Thanks

  • u/tosonana - Promoting my experiment on privacy subreddits
  • Surveillance Report (The New Oil) - Promoting my experiment on their podcast
  • ente - Promoting my experiment on their Twitter
  • z0ccc - Providing inspiration
  • Everyone who participated - Contributing anonymized browser data for analysis