If you are interested in how this list is put together and then used in Firefox, this post is for you.
Safe Browsing lists
There are many possible ways to download URL lists to the browser and check against that list before loading anything. One of those is already implemented as part of our malware and phishing protection. It uses the Safe Browsing v2.2 protocol.
In a nutshell, each URL on the block list is hashed (using SHA-256), and that list of hashes is then downloaded by Firefox and stored in a data structure on disk.
The sbdbdump script can be used to extract the hashes contained in these on-disk files and will output something like this:
$ ~/sbdbdump/dump.py -v .
- Reading sbstore: mozstd-track-digest256
[mozstd-track-digest256] magic 1231AF3B Version 3
NumAddChunk: 1 NumSubChunk: 0
NumAddPrefix: 0 NumSubPrefix: 0
NumAddComplete: 1696 NumSubComplete: 0
[mozstd-track-digest256] AddChunks: 1445465225
[mozstd-track-digest256] SubChunks:
...
[mozstd-track-digest256] addComplete[chunk:1445465225] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
...
[mozstd-track-digest256] MD5: 81a8becb0903de19351427b24921a772
The name of the blocklist being dumped here (mozstd-track-digest256) is set in the urlclassifier.trackingTable preference, which you can find in about:config. The most important part of the output shown above is the addComplete line, which contains a hash that we will see again in a later section.
Once it's time to load a resource, Firefox hashes the URL, as well as a few variations of it, and then looks for those hashes in the local lists.
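To make "a few variations" more concrete, here is a minimal Python sketch of Safe Browsing v2-style lookup expressions: the exact host plus parent domains (at most five host strings, never the bare top-level domain), combined with the exact path, the path without its query string, and up to four directory prefixes starting at `/`. It assumes the URL is already canonicalized and skips special cases such as IP addresses; the function names are hypothetical and this is an illustration, not Firefox's own implementation.

```python
import hashlib
from urllib.parse import urlsplit


def host_suffixes(host: str) -> list[str]:
    """Exact host, then up to four suffixes taken from the last five
    components, never reducing to the bare top-level domain."""
    parts = host.split(".")
    suffixes = [host]
    tail = parts[-5:]
    for i in range(len(tail) - 1):  # stop before the lone TLD
        candidate = ".".join(tail[i:])
        if candidate != host:
            suffixes.append(candidate)
    return suffixes


def path_prefixes(path: str, query: str) -> list[str]:
    """Exact path with query, exact path without query, then up to four
    directory prefixes built from the root."""
    candidates = [f"{path}?{query}"] if query else []
    candidates.append(path)
    prefix = "/"
    directories = [prefix]
    # Directory components sit between the leading "/" and the last segment.
    for segment in path.split("/")[1:-1]:
        if len(directories) == 4:
            break
        prefix += segment + "/"
        directories.append(prefix)
    for directory in directories:
        if directory not in candidates:
            candidates.append(directory)
    return candidates


def lookup_expressions(url: str) -> list[str]:
    """Every host/path combination that would be hashed and looked up."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    return [h + p for h in host_suffixes(host) for p in path_prefixes(path, parts.query)]


def lookup_hashes(url: str) -> dict[str, str]:
    """Map each lookup expression to its SHA-256 hex digest."""
    return {e: hashlib.sha256(e.encode("utf-8")).hexdigest() for e in lookup_expressions(url)}


for expression, digest in lookup_hashes("https://abs.twimg.com/tracker.js").items():
    print(digest, expression)
```

For https://abs.twimg.com/tracker.js this yields abs.twimg.com/tracker.js, abs.twimg.com/, twimg.com/tracker.js and twimg.com/; the last one is the canonicalized form in which twimg.com appears in the list-creation output later in this post.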
If there's no match, then the load proceeds. If there's a match, then we do an additional check against a pairwise allowlist.
The pairwise allowlist (hardcoded in the Disconnect entity list) is designed to encode what we call "entity relationships". The list groups related domains together for the purpose of checking whether a load is first or third party (e.g. twitter.com and twimg.com both belong to the same entity).
Entries on this list (named mozstd-trackwhite-digest256) look like this:
twitter.com/?resource=twimg.com
which translates to "if you're on the twitter.com site, then don't block resources from twimg.com".
If there's a match on the second list, we don't block the load. It's only when we get a match on the first list and not the second one that we go ahead and cancel the network load.
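The two-list decision can be sketched in a few lines of Python. This is a simplified illustration, not Firefox's code: it only considers host-level entries (a host followed by "/"), assumes allowlist entries take the twitter.com/?resource=twimg.com form used by the entity list, uses the resource host exactly as given, and models both lists as in-memory sets of SHA-256 hex digests. The helper names are hypothetical.

```python
import hashlib


def digest(expression: str) -> str:
    """SHA-256 hex digest of a canonicalized lookup expression."""
    return hashlib.sha256(expression.encode("utf-8")).hexdigest()


def parent_domains(host: str) -> list[str]:
    """The host and its parent domains, excluding the bare TLD."""
    parts = host.split(".")
    return [".".join(parts[i:]) for i in range(max(len(parts) - 1, 1))]


def should_block(page_host: str, resource_host: str,
                 tracking: set[str], allowlist: set[str]) -> bool:
    # First list: is the resource's host (or a parent domain) a known tracker?
    if not any(digest(h + "/") in tracking for h in parent_domains(resource_host)):
        return False  # no match, so the load proceeds
    # Second list: is this tracker allowed on this first-party site?
    allowed = any(
        digest(f"{h}/?resource={resource_host}") in allowlist
        for h in parent_domains(page_host)
    )
    # Cancel the load only on a tracking match without an allowlist match.
    return not allowed


tracking = {digest("twimg.com/"), digest("trackertest.org/")}
allowlist = {digest("twitter.com/?resource=twimg.com")}

print(should_block("twitter.com", "twimg.com", tracking, allowlist))  # False
print(should_block("example.com", "twimg.com", tracking, allowlist))  # True
```

Walking the page host's parent domains means that, under these assumptions, a page on mobile.twitter.com would also be covered by the twitter.com entry.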
If you visit our test page, you will see tracking protection in action with a shield icon in the URL bar. Opening the developer tools console will expose the URL of the resource that was blocked:
The resource at "https://trackertest.org/tracker.js" was blocked because tracking protection is enabled.
Creating the lists
The Disconnect list is published on their GitHub page, but the copy used in Firefox is the one in our own repository. Similarly, the Disconnect entity list comes from Disconnect, but our copy is also kept in our repository. Should you wish to be notified of any changes to the lists, you can simply subscribe to our repository's Atom feed.
To convert this JSON-formatted list into the binary format needed by the Safe Browsing code, we run a custom list generation script whenever the list changes on GitHub.
If you run that script locally using the same configuration as our server stack, you can see the conversion from the original list to the binary hashes.
Here's a sample entry from the mozstd-track-digest256 list:
[m] twimg.com >> twimg.com/ [canonicalized] twimg.com/ [hash] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
and one from the mozstd-trackwhite-digest256 list:
[entity] Twitter >> (canonicalized) twitter.com/?resource=twimg.com, hash a8e9e3456f46dbe49551c7da3860f64393d8f9d96f42b5ae86927722467577df
This, in combination with the sbdbdump script mentioned earlier, allows you to audit the contents of the local lists.
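As an illustration of the conversion step, the Python sketch below turns a tiny, made-up excerpt in the general shape of the Disconnect JSON files into the two kinds of entries shown above: a canonicalized host (twimg.com/) for the tracking list and a page/resource pair (twitter.com/?resource=twimg.com) for the entity list, each hashed with SHA-256. The JSON layout, the simplified canonicalization (lowercasing, trimming dots, appending "/") and the function names are assumptions for this example; the real list generation script may canonicalize more thoroughly.

```python
import hashlib
import json

# Made-up excerpts in the assumed shape of the Disconnect JSON files.
SERVICES_JSON = """
{"categories": {"Social": [
  {"Twitter": {"https://twitter.com/": ["twimg.com", "twitter.com"]}}
]}}
"""
ENTITIES_JSON = """
{"entities": {"Twitter": {"properties": ["twitter.com"],
                          "resources": ["twimg.com", "twitter.com"]}}}
"""


def canonicalize(domain: str) -> str:
    """Simplified canonicalization: lowercase, trim dots, add the root path."""
    return domain.strip().strip(".").lower() + "/"


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def tracking_entries(services: dict) -> dict[str, str]:
    """Map each canonicalized tracker domain to its hash."""
    entries = {}
    for companies in services["categories"].values():
        for company in companies:
            for urls in company.values():
                for domains in urls.values():
                    if not isinstance(domains, list):
                        continue  # skip non-list metadata values
                    for domain in domains:
                        canon = canonicalize(domain)
                        entries[canon] = sha256_hex(canon)
    return entries


def entity_entries(entities: dict) -> dict[str, str]:
    """Map each 'property/?resource=resource' pair to its hash."""
    entries = {}
    for entity in entities["entities"].values():
        for prop in entity["properties"]:
            for resource in entity["resources"]:
                pair = f"{canonicalize(prop)}?resource={resource.lower()}"
                entries[pair] = sha256_hex(pair)
    return entries


for text, digest in tracking_entries(json.loads(SERVICES_JSON)).items():
    print("[m]", text, digest)
for text, digest in entity_entries(json.loads(ENTITIES_JSON)).items():
    print("[entity]", text, digest)
```

Only the hashes are shipped to Firefox, which is why the plain-text side of the script output is useful when auditing what a given digest stands for.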
Serving the lists
The binary lists are served to Firefox by shavar, a custom server component written by Mozilla.
Every hour, Firefox requests updates from shavar.services.mozilla.com. If new data is available, the whole list is downloaded again. Otherwise, all it receives in return is an empty response.
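For reference, a Safe Browsing v2.2 update request is a small text body that names each list together with the chunk numbers the client already has, compressed into ranges. The Python sketch below builds such a body; the line format (LISTNAME;a:RANGES with an optional :s:RANGES part for sub chunks) reflects our reading of the v2.2 protocol and should be treated as an assumption, as should the helper names.

```python
def chunk_ranges(chunks: list[int]) -> str:
    """Compress chunk numbers into range syntax, e.g. [1, 2, 3, 5] -> '1-3,5'."""
    ranges = []
    start = prev = None
    for n in sorted(set(chunks)):
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n  # still inside a consecutive run
        else:
            ranges.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = n
    if start is not None:
        ranges.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(ranges)


def update_request_body(lists: dict[str, tuple[list[int], list[int]]]) -> str:
    """One line per list: 'name;' then the known add (a:) and sub (s:) chunks."""
    lines = []
    for name, (add_chunks, sub_chunks) in lists.items():
        parts = []
        if add_chunks:
            parts.append("a:" + chunk_ranges(add_chunks))
        if sub_chunks:
            parts.append("s:" + chunk_ranges(sub_chunks))
        lines.append(f"{name};{':'.join(parts)}")
    return "\n".join(lines) + "\n"


print(update_request_body({"mozstd-track-digest256": ([1445465225], [])}), end="")
```

Under these assumptions, a client already holding chunk 1445465225 (the chunk in the dump above) would send mozstd-track-digest256;a:1445465225, while a fresh profile would send just the list name followed by a semicolon.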
To replicate how Firefox downloads the list, you can use the download-list.py script to ask the server for a copy of the full tracking protection (TP) list:
$ ./download-list.py
n:3600
i:mozstd-track-digest256
u:tracking-protection.cdn.mozilla.net/mozstd-track-digest256/1445465225
and then follow the URL redirection to get the actual list payload from the CDN:
$ wget https://tracking-protection.cdn.mozilla.net/mozstd-track-digest256/1445465225
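The download-list.py output above is the server's update response: n: gives the number of seconds to wait before polling again (3600, matching the hourly schedule), i: names the list that the following lines apply to, and u: is the redirect URL from which the chunk data is fetched. A minimal Python parser for just these three line types might look like this; other v2.2 directives are deliberately ignored, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UpdateResponse:
    next_poll_seconds: Optional[int] = None
    redirects: dict[str, list[str]] = field(default_factory=dict)


def parse_update_response(body: str) -> UpdateResponse:
    """Parse the n:, i: and u: lines of a Safe Browsing v2.2 update response."""
    response = UpdateResponse()
    current_list = None
    for line in body.splitlines():
        # Split on the first colon only, so URLs containing ':' stay intact.
        keyword, _, value = line.partition(":")
        if keyword == "n":
            response.next_poll_seconds = int(value)
        elif keyword == "i":
            current_list = value
            response.redirects.setdefault(current_list, [])
        elif keyword == "u" and current_list is not None:
            response.redirects[current_list].append(value)
        # Any other directive is ignored in this sketch.
    return response


body = (
    "n:3600\n"
    "i:mozstd-track-digest256\n"
    "u:tracking-protection.cdn.mozilla.net/mozstd-track-digest256/1445465225\n"
)
print(parse_update_response(body))
```

The redirect value has no scheme, so a client fetching it (as the wget command does) needs to prepend https://.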
Once you've downloaded that binary file, you can examine its contents using the redirect-response-extractor.py script:
$ ./redirect-response-extractor.py 1445465225
Parsing a 54294-byte response file
Processing control line...
Add chunk 1445465225 contains 54272 bytes of 32-byte hashes
Found 1696 prefixes in 54272 bytes
and dump all of the hashes it contains using the --verbose option:
$ ./redirect-response-extractor.py --verbose 1445465225
Parsing a 54294-byte response file
Processing control line...
Add chunk 1445465225 contains 54272 bytes of 32-byte hashes
35e032660edb921c0c0ce59bfa289dc5a84c71b99584b359d74d6b03d00de66f
532239bcc9edf7681023070798bee5ec5e4a6bc7c0bb68e1e8e9099e45fdff94
52c058e95fc8d0e51bb9dd4b72f1364aa471157475a8435daa71e8e1c9533615
...
e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
...
8a565d247c08ff7fd0950d8a1f37bf2da29eae4a0dd65126d87a0db7cab4b400
ca705fed923ab66d6d8bfe0f65359a4b872981be5bcc1364e29aac69375af323
7fc983ea552f7c8d153fc308d621eb4f52e84aa63ecccf3a735698a11a2a4a8d
Found 1696 prefixes in 54272 bytes
which, as you can see, contains the twimg.com hash we saw earlier.
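The extractor's numbers are consistent with a simple layout: a text control line followed by raw 32-byte SHA-256 digests. 1696 hashes × 32 bytes = 54272 bytes, and the remaining 22 bytes of the 54294-byte file are exactly the length of a control line of the form a:1445465225:32:54272 plus a newline (add chunk number, hash length, payload length). The Python sketch below parses that assumed layout for a single add chunk; the function name is hypothetical and it is not a replacement for the extractor script.

```python
import hashlib
import re

# Assumed control line: a:<chunk number>:<hash length>:<payload length>\n
CONTROL_LINE = re.compile(rb"^a:(\d+):(\d+):(\d+)\n")


def parse_add_chunk(payload: bytes) -> tuple[int, list[str]]:
    """Return (chunk number, hex digests) for a single add chunk."""
    match = CONTROL_LINE.match(payload)
    if match is None:
        raise ValueError("missing 'a:CHUNK:HASHLEN:LENGTH' control line")
    chunk, hash_len, length = (int(g) for g in match.groups())
    data = payload[match.end():match.end() + length]
    if hash_len == 0 or len(data) != length or length % hash_len:
        raise ValueError("truncated chunk or length not a multiple of the hash size")
    # Slice the payload into fixed-size digests and render them as hex.
    hashes = [data[i:i + hash_len].hex() for i in range(0, length, hash_len)]
    return chunk, hashes


# Build a small synthetic payload in the same layout and parse it back.
digests = [hashlib.sha256(e.encode()).digest() for e in ("twimg.com/", "trackertest.org/")]
data = b"".join(digests)
payload = f"a:1445465225:32:{len(data)}\n".encode() + data
chunk, hashes = parse_add_chunk(payload)
print(f"Add chunk {chunk}: {len(hashes)} hashes in {len(data)} bytes")
```

If the layout assumption holds, passing the downloaded file's bytes (read in binary mode) to parse_add_chunk should report the same 1696 hashes as the extractor.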
Should you want to play with the server backend, you can run your own instance of shavar and then go into about:config to change these preferences to point to it:
Note that on Firefox 43 and later, these prefs have been renamed to:
Thanks to Tanvi Vyas for reviewing a draft of this post.