Generally, I’m opposed to any telemetry in any project, open source or not. However, there was recently a discussion about adding telemetry to the Go programming language’s tools, and Go is my primary language. My initial reaction was outrage: I don’t want Google spyware on my system. But reading the proposal for how it was actually going to work changed my opinion.
Here’s a link to the proposal: https://research.swtch.com/telemetry. It was designed for Go, but the author believes it can be applied to other open source projects as well.
It was originally going to be opt-out, which I disagree with, but the Go team has listened to feedback and it will be opt-in instead. This is the first telemetry proposal I’m not completely opposed to, and I might even enable it on some of my devices.
While Google has had a very bad track record with spyware, this proposal actually seems reasonable and carefully designed with privacy in mind. The system will only collect counters and stack traces. The counters are statistics like the number of times a Go tool has crashed or the number of times a feature was used. Each week, with a 10% probability, a report is sent, which works out to an average of about 5 reports per year. The reports will contain no identifying information, not even a randomly generated ID; they will be publicly viewable; and the decisions about what to collect will be made in an open, public process. All the code for this will be completely open source. It only applies to the Go tools themselves, not to programs compiled with the Go compiler, and all the collection logic is local: the metrics are stored in files you can inspect to see exactly what would be sent.
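The arithmetic behind the "about 5 reports per year" figure is just 10% × 52 weeks ≈ 5.2. Here's a minimal Go sketch of that sampling scheme as I understand it from the proposal (the names and structure are mine, not the actual telemetry code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Illustrative only: each week, a local coin flip decides whether
// this install's counter files are uploaded.
const weeklyUploadProb = 0.1

// shouldUpload reports whether this week's report gets sent.
func shouldUpload(r *rand.Rand) bool {
	return r.Float64() < weeklyUploadProb
}

// expectedReportsPerYear is the mean number of uploads over a
// 52-week year under this scheme.
func expectedReportsPerYear() float64 {
	return weeklyUploadProb * 52
}

func main() {
	r := rand.New(rand.NewSource(1))
	const years = 100000
	total := 0
	for y := 0; y < years; y++ {
		for week := 0; week < 52; week++ {
			if shouldUpload(r) {
				total++
			}
		}
	}
	fmt.Printf("expected: %.1f, simulated average: %.2f reports/year\n",
		expectedReportsPerYear(), float64(total)/float64(years))
}
```

The nice property of doing the coin flip client-side is that no server ever needs to know which installs exist in order to sample them.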
It seems like this proposal would preserve privacy while still giving the Go team the data they need to improve their tools. What are your thoughts on this?
Yeah, opt-in is fine enough. I’d quite like a broad acceptance of this sort of standard to take root in these kinds of projects; it’s exhausting to have the same discussion, project-by-project, as they go through the pipeline from “we need to make this better” -> “we need data” -> “people don’t like it” -> “opt-out will appease them” -> “ok, ok, we’ll make it opt-in”.
While explaining the change to opt-in, at one point the author (rsc) says something like: opt-in increases the privacy risk to every installation, since there are fewer installs to hide amongst, and data is needed from each of them more frequently to build a sample. But of course, that privacy risk is borne only by the installations that opted in.
Another criticism he levels at opt-in is that it biases the sample towards the kinds of people who opt in. Of course, opt-out biases the sample towards the kinds of people who don’t opt out.
Kudos for actually avoiding any kind of identifier though, that’s often a fraught point with proponents moving from “we must be able to know the individual person and link it to all this other data” -> “pseudonymisation is fine” -> “ok, an actually anonymous id so we can track one install over time is fine”. Getting to “no id at all” is vanishingly rare IME, and again, it would be nice if this somehow became standard for this kind of project.
More broadly, teams can and should do without this kind of telemetry altogether. It’s not essential, it’s just cheaper for them to do this kind of naturalistic study than other options. That argument is never going to win, sadly.
(I still won’t be opting in to any of these projects though. I don’t even opt in to popcon)
Yeah, I usually don’t ever opt in either, but the reason I might in this case is that the design genuinely seems privacy-preserving: no identifiers at all, publicly viewable reports, and local files I can inspect before anything is sent.
It’s useful for them that some people will, for sure ^^.
He speaks (very optimistically) of getting 10% opt-in rates. I think he’d be lucky to reach 1%, but even that’s more than sufficient for these kinds of studies to yield results.
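To see why even a 1% opt-in rate can be enough, note that the precision of an estimated proportion depends on the absolute number of reports, not the opt-in rate: the 95% confidence interval half-width is 1.96·√(p(1−p)/n). A quick back-of-the-envelope check in Go (the report counts and feature-usage rate below are made up for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// marginOfError returns the half-width of a 95% confidence
// interval for a proportion p estimated from n reports:
// 1.96 * sqrt(p*(1-p)/n).
func marginOfError(p, n float64) float64 {
	return 1.96 * math.Sqrt(p*(1-p)/n)
}

func main() {
	// Hypothetical: 10,000 reporting installs, 30% of which use some feature.
	fmt.Printf("10k reports: ±%.4f\n", marginOfError(0.3, 10000))
	// Even 1,000 reports keeps the estimate within about ±3 points.
	fmt.Printf("1k reports:  ±%.4f\n", marginOfError(0.3, 1000))
}
```

So for questions like "what fraction of users hit this crash", a few thousand reports answers them well, regardless of how small a fraction of the total install base that represents. The bigger threat to validity at low opt-in rates is the selection bias discussed above, not sample size.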