How traceable are you? - Experiment results & analysis

I ran a privacy research experiment on browser fingerprinting. Let's take a look at the results and potential solutions.

We're all familiar with incognito windows. When we want to search for something embarrassing, visit a particular streaming service or simply add some privacy to our web browsing, many of us default to an incognito window for added anonymity. I mean, hey, the name has "incognito" in it, so it must keep your browsing habits secret, right? Well, not always.

The incognito window has only ever stopped browsers from permanently storing website history and cookies, but it doesn't take any steps to mask your true identity. For example, Google Chrome, the most popular browser today, doesn't block ads, trackers, fingerprinters or other creepy scripts that may be used to identify you.

Traditionally, trackers have used cookies to keep tabs on your browsing habits. Cookies are small pieces of text data that a website asks your browser to store; the browser then sends them back to that website with each request. Cookies can hold authentication data (so you stay logged in), shopping cart contents or user information, but they can also hold an identifier that uniquely identifies you every time you revisit a website. Since the browser sends the cookie back on each request, the website knows who you are. However, this method has one huge flaw: incognito mode starts with an empty cookie jar, doesn't share cookies with normal browsing mode and discards any cookies it collects when the private session ends. Visiting a website in incognito therefore leaves a cookie-based tracker completely in the dark.

The New Kid: Browser Fingerprinting

Fear not, advertisers: there's a new technique called browser fingerprinting that can solve all of your problems! Are you tired of users evading your malicious tracking through incognito mode? Well, what if I told you that browser fingerprinting could track them there too? That's right. Users even get a false sense of privacy in incognito when, in reality, we can still keep our eyes on them at all times.

Browser fingerprinting is a technique that compiles a profile of the unique attributes of your browser and device. In incognito mode, your browser and device don't change, so it can be trivial for websites and third-party trackers to link your normal and private browsing sessions.

There are two main types of fingerprinting: client-side and server-side.

Client-Side Fingerprinting

Client-side fingerprinting uses JavaScript to detect browser and device attributes, such as screen resolution, browser engine and app version. These attributes can be harder to spoof.

Server-Side Fingerprinting

Server-side fingerprinting uses HTTP request header data, such as the User-Agent, GPC and DNT headers. These attributes can be easily spoofed, but spoofing them may cause some websites to break, which is why most users avoid doing so.

Experiment Recap

I ran a privacy research experiment that explored browser fingerprinting through nojs (JavaScript-free) server-side connection data, client-side fingerprinting techniques and a similarity algorithm.

Experiment Inner-workings

This project used Go for its API, SQLite for the database, HTML templates for the frontend and JavaScript for client-side fingerprinting. I chose Go and SQLite because they're lightweight and performant (for this type of project, at least) and HTML templates because they render in browsers that block JavaScript, which the nojs tests required.

Server-Side Fingerprinting

On connection, the web server collected the following headers:

  • User-Agent
  • Accept
  • Accept-Language
  • Accept-Encoding
  • Upgrade-Insecure-Requests
  • Sec-Gpc
  • Sec-Fetch-User
  • Sec-Fetch-Site
  • Sec-Fetch-Mode
  • Sec-Fetch-Dest
  • Dnt (for all non-Firefox browsers)

These header values were then concatenated and hashed to create a fingerprint identifier.

I specifically chose these headers because, in my limited testing, they were consistent across both normal and private browsing sessions. Other headers, like cookie, app-pragma, cache-control and referrer, were unreliable and varied unpredictably between sessions.

Client-Side Fingerprinting

After the page's initial load, a JavaScript file was loaded to collect the following data:

  • Installed fonts
  • Timezone
  • Timezone Spoof Status
  • Time Spoof Status
  • Screen Height
  • Screen Width
  • Screen Color Depth
  • Navigator App Version
  • navigator.brave

The timezone and time spoof statuses were determined using web workers. Because web workers run in a separate context from the webpage, extensions such as Windscribe's browser extension could not spoof the time inside them. If the time data read inside the web worker differed from the data read on the page, it was safe to conclude that these properties had been spoofed.

Extensions like Windscribe's also offer user agent spoofing, which makes the client appear to be a different browser. However, Brave Browser exposes navigator.brave, so Brave users could easily be identified through this navigator property even when their user agent was spoofed.

Experiment Results Analysis

At the time of writing, version 6 of the experiment had collected 1757 server-side fingerprints, 1699 client-side fingerprints and 5814 similarity tests.

To defeat fingerprinting, you must either blend in with the crowd or frequently randomize your client attributes. Blending in with the crowd means looking as similar as possible to other visitors, making it much harder to track you individually. Let's explore this technique by analyzing the collected data.

Operating Systems

Operating system | Count | Percentage
Windows | 404 | 22.99%
MacOS/iPadOS | 156 | 8.88%
Linux | 170 | 9.68%
iOS | 89 | 5.07%
Android | 737 | 41.95%
Other | 201 | 11.44%
Pie chart displaying the different operating systems detected server-side

These results are pretty interesting. Android was the most fingerprinted operating system overall, and therefore also the most fingerprinted mobile one, while Windows was the most fingerprinted desktop operating system. The most popular desktop operating system is Windows 10. This spread across operating systems makes the OS a useful fingerprinting signal, but it needs to be combined with other attributes to single anyone out.

Browsers

Browser | Count | Percentage
Firefox | 543 | 30.90%
Chrome | 745 | 42.40%
Safari | 119 | 6.77%
WebView | 117 | 6.66%
Other | 233 | 13.26%
Pie chart displaying the different browsers detected server-side

Chrome is the most popular browser, both in the real world and in this experiment. Firefox came in second place, Other third, Safari fourth and WebView last.

Timezones

Timezone | Count | Percentage
UTC | 323 | 19.01%
America/New_York | 163 | 9.59%
Europe/Berlin | 94 | 5.53%
America/Chicago | 78 | 4.59%
Europe/London | 71 | 4.18%
America/Los_Angeles | 69 | 4.06%
America/Toronto | 47 | 2.77%
Asia/Calcutta | 45 | 2.65%
Europe/Amsterdam | 43 | 2.53%
Europe/Paris | 42 | 2.47%
Other | 684 | 41.23%
Pie chart displaying the different timezones detected client-side

The timezones dataset was pretty diverse. The "Other" category is the largest slice of this dataset (over 40% of visitors fell outside the ten most common timezones), which makes timezone a great device attribute to base fingerprints on for cross-tracking.

Excluding the "Other" category, UTC takes the largest portion of this pie chart. This is likely due to Firefox's resistFingerprinting feature, which attempts to make the browser look as bland as possible and, as a result, less uniquely identifiable. Reporting UTC as the timezone is one of its defences.

Screen Resolutions

Resolution | Count | Percentage
1920x1080 | 234 | 14.13%
412x915 | 64 | 3.86%
390x844 | 52 | 3.14%
2560x1440 | 46 | 2.78%
375x667 | 44 | 2.66%
1440x900 | 43 | 2.60%
412x892 | 41 | 2.48%
1366x768 | 40 | 2.42%
375x812 | 37 | 2.23%
414x896 | 34 | 2.05%
Other | 984 | 60.78%
Pie chart displaying the different screen resolutions detected client-side

Devices come in all shapes and sizes, and the roughly 60% "Other" category shows that clearly. As with the timezones dataset, a majority of visitors fell outside the ten most common screen resolutions, so resolution narrows the crowd considerably and makes a good fingerprinting attribute.

Screen Color Depths

Color depth (bits) | Count | Percentage
24 | 1358 | 82.00%
32 | 244 | 14.73%
30 | 53 | 3.20%
16 | 1 | 0.06%
Pie chart displaying the different screen color depths detected client-side

Screen color depth does not seem like a useful fingerprinting signal on its own. 82% of visitors had a depth of 24, so most visitors share the same value.

Experiment Efficacy

This experiment explored browser fingerprinting through the use of server-side connection data and other client-side techniques. But how well did it perform?

User Reviews

u/tosonana helped me promote my experiment on privacy-related subreddits.

u/redonbills sharing their positive experience
u/shab-re sharing their mixed, but overall positive, experience
u/Golferhamster sharing their frustrating, but overall positive, experience
u/SirLumpyFrog sharing their negative experience

Let's break these down.

Why did it work for most people, but not others?

Browser fingerprinting is all about computing identities based on unique attributes. If you look too similar to other users, then it simply fails.
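
The experiment's actual similarity algorithm isn't described here, so the following is only a hypothetical stand-in that shows why looking like everyone else defeats matching. It scores a visitor against stored profiles by the fraction of matching attributes (the `Fingerprint` type, `Similarity` and `BestMatch` are illustrative names) and reports whether the best match is unique. When several stored profiles tie for the top score, the tracker cannot tell which one the visitor is.

```go
package main

import "fmt"

// Fingerprint maps attribute names (e.g. "timezone", "screen") to values.
type Fingerprint map[string]string

// Similarity returns the fraction of a's attributes whose values match b.
func Similarity(a, b Fingerprint) float64 {
	if len(a) == 0 {
		return 0
	}
	matches := 0
	for k, v := range a {
		if bv, ok := b[k]; ok && bv == v {
			matches++
		}
	}
	return float64(matches) / float64(len(a))
}

// BestMatch returns the index of the most similar stored profile and whether
// that match is unique. A tie means the visitor blends in with the crowd.
func BestMatch(visitor Fingerprint, stored []Fingerprint) (int, bool) {
	best, bestScore, ties := -1, -1.0, 0
	for i, fp := range stored {
		s := Similarity(visitor, fp)
		switch {
		case s > bestScore:
			best, bestScore, ties = i, s, 1
		case s == bestScore:
			ties++
		}
	}
	return best, ties == 1
}

func main() {
	stored := []Fingerprint{
		{"timezone": "UTC", "screen": "1920x1080", "depth": "24"},
		{"timezone": "UTC", "screen": "1920x1080", "depth": "24"},
		{"timezone": "Europe/Berlin", "screen": "2560x1440", "depth": "30"},
	}
	// A common configuration ties with another profile: not uniquely trackable.
	fmt.Println(BestMatch(Fingerprint{"timezone": "UTC", "screen": "1920x1080", "depth": "24"}, stored)) // 0 false
	// A distinctive configuration matches exactly one profile.
	fmt.Println(BestMatch(Fingerprint{"timezone": "Europe/Berlin", "screen": "2560x1440", "depth": "30"}, stored)) // 2 true
}
```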

Users who reported that the experiment tracked them successfully had more distinctive browser and device configurations. The distinguishing factors could have included extensions (such as anti-fingerprinting, anti-tracking or spoofing extensions, which the experiment could detect), timezones, fonts or operating systems.

Users who reported that the experiment failed had more common configurations. For example, Firefox's resistFingerprinting feature, discussed above, makes browsers look like everyone else's, and it helped quite a few people evade detection by my experiment. Some users reported that the experiment still tracked them with resistFingerprinting enabled; this is most likely because an extension made them stand out.

In addition to Firefox, Safari on iPhone did a pretty good job overall at protecting users. Users with default Safari settings (including the default WebKit experimental feature settings) on the same iOS version produced the same fingerprint, reducing the accuracy of the experiment.

Potential Solutions

The end goal of this experiment was to gain a better understanding of fingerprinting. Thanks to all of the participants, I was able to achieve this. Based on my data analysis, I have compiled a short list of potential fixes that browser and extension developers could consider to, hopefully, build a more private Internet.

Randomize Headers

Headers were the main focus of this experiment's nojs portion. As explained above, the experiment took specific, consistent request headers from browsers and used those values to create a fingerprint. Headers, like GPC, Dnt and user agent, can vary greatly across browser configurations, which is why they should be standardized or randomized.

Standardization could mean, for example, sending the same GPC header value for every user, and likewise for the rest of the collected headers.

Randomization can be done by unpredictably adding and removing specific headers, making it impractical to fingerprint users server-side through this method. For example, one website might receive GPC and Dnt headers, while another has those headers stripped out or filled with garbage.

Brave: Don't expose your users

Brave exposes its users on the client side through the navigator.brave property, which undermines user agent spoofing and other browser spoofing techniques. As long as that navigator property is accessible, a website can safely assume that the user is indeed on Brave and not on the browser they are spoofing.

Special Thanks

  • u/tosonana - Promoting my experiment on privacy subreddits
  • Surveillance Report (The New Oil) - Promoting my experiment on their podcast
  • ente - Promoting my experiment on their Twitter
  • z0ccc - Providing inspiration
  • Everyone who participated - Contributing anonymized browser data for analysis