I’ve noticed a rise in people sharing links to YouTube, Instagram, Twitter, TikTok, and reddit that include tracking parameters in the URL.
It might largely be harmless for now, but it’s not good to let companies build a web of links between users of this site, and to link the usernames of users on this site to their off-site accounts, which may include sensitive info.
SM | URL Part | Appearance in URL | Filtration technique |
---|---|---|---|
YouTube | Query | ?si=* | Remove query string |
Instagram | Query | ?igshid=* | Remove query string |
Twitter | Query | ?t= | Remove query string |
TikTok | Subdomain and path | (vm/vt).tiktok.com/(random_string) | Block |
Reddit | Path | /(sub_name)/s/(random_string) | Block |
This site should only allow canonical links to the content to limit the information exposed.
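As a rough illustration of how the table above could be applied, here is a minimal Python sketch. The host names, the `filter_link` function and the `TRACKING_PARAMS` layout are my own assumptions, not anything this site actually runs. One deliberate difference from the table: instead of dropping the whole query string, it drops only the listed tracking parameter, since a link like `youtube.com/watch?v=...` carries the video ID in the query.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Tracking parameters from the table, keyed by host.
# The exact host names are an assumption made for this sketch.
TRACKING_PARAMS = {
    "youtube.com": {"si"},
    "youtu.be": {"si"},
    "instagram.com": {"igshid"},
    "twitter.com": {"t"},
}

# Short-link hosts and paths that cannot be cleaned, only blocked.
BLOCKED_HOSTS = {"vm.tiktok.com", "vt.tiktok.com"}
REDDIT_SHARE_PATH = re.compile(r"^/r/[^/]+/s/[^/]+/?$")


def filter_link(url):
    """Return a cleaned URL, or None if the link should be blocked."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host in BLOCKED_HOSTS:
        return None
    is_reddit = host == "reddit.com" or host.endswith(".reddit.com")
    if is_reddit and REDDIT_SHARE_PATH.match(parts.path):
        return None

    tracked = TRACKING_PARAMS.get(host)
    if not tracked:
        return url  # no rule for this host, leave the link alone

    # Keep every query parameter except the tracking ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in tracked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With these rules, `filter_link("https://youtu.be/abc123?si=XYZ")` (a made-up video ID) gives back `https://youtu.be/abc123`, while a `vm.tiktok.com` link or a reddit `/s/` link gives back `None`.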
can someone explain what this means as they would to a small child? what is a canonical link?
There's a URL, say `peepee.com`. So far this is the routing portion of the URL that says how to find the web server, basically saying "ask `.com` how to find `peepee`", and that gives us the IP address of the server.
Everything that comes after that is information for the server itself. So to navigate to a resource, say `poopoo`, that lives on the server, they would navigate to `peepee.com/poopoo`.
But sometimes you want to navigate to that resource and also communicate some bit of information to the server, say a login token so the server knows who is accessing that resource. This is communicated via a URL parameter, and looks like `?userid=abcd1234`, or in the full URL: `peepee.com/poopoo?userid=abcd1234`. So the user is still accessing the same resource, but has provided additional metadata to the server.
These parameters can be abused to identify who knows who and who communicates with who by attaching a tracking ID parameter to the URL. When you share a link it includes that tracking parameter, and when anyone clicks on it the server knows that the originator of the tracking ID (well, the first person to be assigned it) shared it with this other person. This can be combined with other collected info to build a map and social graph of actual people, e.g. we know dave is at this IP, and jane is at this other IP, and we put a tracking parameter in dave's URL and we saw jane use that same tracking parameter in her URL, so we know that dave shared this URL with jane.
So to answer your question, a canonical link is a link to a resource without the unneeded url parameters.
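If it helps to see the pieces, here is a small Python sketch that splits the example URL into the parts described above and then drops the parameters. The `https://` scheme and the `strip_query` helper name are assumptions added for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Example URL from the explanation; the https:// scheme is assumed.
url = "https://peepee.com/poopoo?userid=abcd1234"
parts = urlsplit(url)

host = parts.hostname  # "peepee.com": how to find the server
path = parts.path      # "/poopoo": the resource on that server
query = parts.query    # "userid=abcd1234": extra info sent to the server


def strip_query(link):
    """Return the link with its query string and fragment removed."""
    p = urlsplit(link)
    return urlunsplit((p.scheme, p.netloc, p.path, "", ""))


stripped = strip_query(url)  # "https://peepee.com/poopoo"
```

This blunt version removes every parameter. Some sites put the content ID itself in the query, so a real filter would probably need to keep those and drop only the tracking ones.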
gotcha thank you, i loved how you explained it!
There are often many ways to represent a webpage link in a URL format. For example, a random reddit post has several forms of links, even without any tracking:
https://www.reddit.com/r/me_irl/comments/18xheeg/me_irl/
https://redd.it/18xheeg
Both go to the same reddit post. However, if I were to use the new reddit redesign or reddit mobile to share this link, it would look something like https://www.reddit.com/r/me_irl/s/stxMlEtK5H (not a real link). If you press on that, it might go to the more expanded form https://www.reddit.com/r/me_irl/comments/18xheeg/me_irl?share_id=5168327, which carries a share_id parameter. Both clicking the link with the /s/stxMlEtK5H and landing on the page that has ?share_id=5168327 will register on reddit’s servers as some user following some other user’s link, and of course they know who both these users are. They can then correlate it and form a graph (a structure that represents a network) that links these users because they interacted by sharing this link, even though they might have shared it on a second medium like WhatsApp or Hexbear, and never interacted directly on reddit itself.
Canonical links are just the most normal links to the content. Without ?share_id stuff, and without pointless random letters. When Google finds reddit pages to show on their end they only show the full form, which is https://www.reddit.com/r/me_irl/comments/18xheeg/me_irl/. This is the canonical link form for reddit.
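To make that concrete, here is a hedged Python sketch that turns an expanded reddit link into the canonical form above. The `canonical_reddit_link` name and the path pattern are my own guesses based only on the example links in this comment. Share links with the random letters return `None`, because the real post can't be worked out from the URL alone.

```python
import re
from urllib.parse import urlsplit

# Path shape of a post link, inferred from the examples above:
# /r/<subreddit>/comments/<post_id>/<slug>/
POST_PATH = re.compile(r"^/r/([^/]+)/comments/([a-z0-9]+)/([^/]+)/?$")


def canonical_reddit_link(url):
    """Return the canonical post link, or None if it can't be derived."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ("reddit.com", "www.reddit.com"):
        return None
    match = POST_PATH.match(parts.path)
    if match is None:
        return None  # e.g. /r/<sub>/s/<random_string> share links
    sub, post_id, slug = match.groups()
    # Rebuild the link from its parts, which drops ?share_id and friends.
    return f"https://www.reddit.com/r/{sub}/comments/{post_id}/{slug}/"
```

For example, passing in the `?share_id=5168327` link from above gives back `https://www.reddit.com/r/me_irl/comments/18xheeg/me_irl/`.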
gotcha i see how this is a privacy issue now, thank you!