Discord says they "care a lot about privacy". Or do they really?

30th July 2023 from jc's blog

In its official privacy policy (last updated February 24, 2023), Discord makes two claims that I want to challenge here:

We care a lot about privacy.

and

We limit what information is required. We require the information that enables us to create your account, provide and maintain our services, meet our commitments to our users, and satisfy our legal requirements. The rest is optional.

How much of that is actually the case?

The data dump

To find out, I requested a GDPR data dump from Discord. It includes any messages you sent and a collection of account information, but the biggest folder will likely be the activity folder. According to the official documentation, this folder contains four subfolders. I only had two, reporting and tns (trust and safety), presumably because I opted out of any voluntary data collection. Let’s look into the files and see.

For me, each of the two folders contained one big events-2023-00000-of-00001.json file. Despite the 2023 in the name, these files contain more than just data from this year. The events in them are also not sorted by time. The following shell pipeline sorts them by timestamp:

jq -cr '"\(.timestamp[1:20] | strptime("%Y-%m-%dT%H:%M:%S") | mktime)\t\(.)"' < reporting/events-2023-00000-of-00001.json | sort -n | cut -f2- > sorted-reporting.json

Reported events

The reporting folder contains “data we use in order to operate our business (information such as messages sent, or your Nitro subscription, as an example.)”. On a chat app you would probably expect this to mean message contents, friend lists, guilds and so on, but note that those are stored in other folders outside of the activity folder.

What does this contain? For me, these are the most commonly stored events:

and some other favourites:

As a developer, I can understand the appeal of tracking this information to figure out how people use your app. But as mentioned above, I have revoked any consent to use my data to improve the app.
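If you want to build the same kind of tally from your own dump, counting the event_type field is enough. The following is a minimal Python sketch, not an official tool. It assumes a file with one JSON object per line, such as the sorted-reporting.json file produced earlier; the function names are my own.

```python
import json
from collections import Counter


def count_event_types(lines):
    """Tally the event_type field over an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Events without an event_type are grouped under a placeholder key.
        counts[event.get("event_type", "<missing>")] += 1
    return counts


def print_top_events(path, limit=20):
    """Print the most frequent event types found in the file at `path`."""
    with open(path, encoding="utf-8") as f:
        for name, n in count_event_types(f).most_common(limit):
            print(f"{n:8d}  {name}")
```

Calling print_top_events("sorted-reporting.json") lists the most common event types together with how often each one was recorded.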

Event metadata

The following metadata is stored for each event. I have redacted all of my own values; I encourage you to download your own dump and explore it:

{
  "event_type": "guild_viewed",
  "event_id": "XXXXXX",
  "event_source": "client",
  "user_id": "XXXXXX",
  "domain": "Reporting",
  "freight_hostname": "analytics-ingest-prd-XXXXXX",
  "ip": "X.X.X.X",
  "day": "XXXX",
  "detected_locale": "en-US",
  "user_is_authenticated": true,
  "accessibility_support_enabled": false,
  "accessibility_features": "256",
  "browser": "XXXX",
  "device": "XXXX",
  "device_vendor_id": "XXXXX-XXX-XXX-XXX-XXXXXXX",
  "os": "XXXXX",
  "os_version": "XXX",
  "system_locale": "xx-XX",
  "client_build_number": "XXXXX",
  "release_channel": "stable",
  "client_version": "XXXXX",
  "client_performance_cpu": XXXXXXX,
  "client_performance_memory": "XXXXXX",
  "city": "XXXXXXXXXXXXXXX",
  "country_code": "XXXXX",
  "region_code": "XXXXX",
  "time_zone": "XXXXXXX",
  "isp": "XXXXXXXXXXXXXXXXXX",
  "cpu_core_count": "XXX",
  "channel_id": "XXXXXXXXXXXXXXX",
  "channel_type": "X",
  "channel_size_total": "X",
  "channel_member_perms": "XXXXXXXXXXXXXXX",
  "channel_hidden": false,
  "guild_id": "XXXXXXXXXXXX",
  "guild_size_total": "XXX",
  "guild_member_num_roles": "XX",
  "guild_member_perms": "XXXXXXXXX",
  "guild_num_channels": "XX",
  "guild_num_text_channels": "XX",
  "guild_num_voice_channels": "XX",
  "guild_num_roles": "XXX",
  "guild_is_vip": XXXX,
  "is_member": XXXX,
  "num_voice_channels_active": "XXX",
  "postable_channels": "XXXXX",
  "_source_job_id": "batch-streaming-ingest-v1-XXXXXXXX",
  "_ingest_ts": "XXXXXXXX",
  "rendered_locale": "en-US",
  "accepted_languages": [
    "XXXX"
  ],
  "accepted_languages_weighted": [
    "XXXXX"
  ],
  "primary_accepted_language": "xx-XX",
  "_processor_source": "primary",
  "viewing_all_channels": true,
  "client_heartbeat_session_id": "XXXXXXXXXXXXX",
  "design_id": "0",
  "_hour_pt": "XXXXXXXXXXXXXXXXX",
  "_hour_utc": "XXXXXXXXXXXXXXXXX",
  "_day_pt": "XXXXXXXXXXXXXXXXX",
  "_day_utc": "XXXXXXXXXXXXXXXXX",
  "client_send_timestamp": "\"XXXXXXXXXXXXXXXXX\"",
  "client_track_timestamp": "\"XXXXXXXXXXXXXXXXX\"",
  "timestamp": "\"XXXXXXXXXXXXXXXXX\""
}

So each time any of the events above arrives, Discord feels they need to store the city you’re currently in, your internet provider, device performance, device vendor ID (what I presume is a unique identifier for the device), and more. Why? Why does a chat application need to know where I am? And how can somebody, with a straight face, claim that this is only the required information?
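To see how much of this ended up in your own dump, you can collect the distinct values recorded for the location and device fields. This is a rough Python sketch under the same assumption of one JSON object per line. The field names are taken from the sample above; the function names are hypothetical.

```python
import json
from collections import defaultdict

# Field names as they appear in the metadata sample above.
IDENTIFYING_FIELDS = (
    "ip", "city", "region_code", "country_code",
    "time_zone", "isp", "device_vendor_id",
)


def distinct_values(lines, fields=IDENTIFYING_FIELDS):
    """Map each field to the set of distinct values seen across all events."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        for field in fields:
            if field in event:
                # str() keeps the value hashable even if it is not a string.
                seen[field].add(str(event[field]))
    return dict(seen)


def print_summary(path):
    """Print how many distinct values each field has in the file at `path`."""
    with open(path, encoding="utf-8") as f:
        for field, values in sorted(distinct_values(f).items()):
            print(f"{field}: {len(values)} distinct value(s)")
```

Calling print_summary("sorted-reporting.json") prints only the counts, so the output stays safe to share. Print the sets themselves if you want to see every city or ISP that was recorded.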

Worse, all of this information is retained all the way back to when you first created your account, even if you opted out of the data collection, presumably because, as Discord claims, they need it to operate their business.
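You can check how far back your own dump goes by looking for the earliest and latest event timestamps. The Python sketch below assumes one JSON object per line and the timestamp layout that the jq pipeline relies on: a value wrapped in an extra pair of quotes that starts with YYYY-MM-DDTHH:MM:SS. The function names are my own.

```python
import json
from datetime import datetime


def parse_timestamp(raw):
    """Parse a dump timestamp, ignoring the extra quotes and anything after the seconds."""
    return datetime.strptime(raw.strip('"')[:19], "%Y-%m-%dT%H:%M:%S")


def time_range(lines):
    """Return (earliest, latest) event time, or (None, None) if no timestamps exist."""
    first = last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw = json.loads(line).get("timestamp")
        if not raw:
            continue  # event without a timestamp
        ts = parse_timestamp(raw)
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    return first, last
```

Opening sorted-reporting.json and passing the file object to time_range returns the two datetime values. Compare the first one with the date you created your account.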

I will leave it to you to decide whether the information above is needed to operate a chat application.

Benefit of the doubt

I asked two friends of mine to review their own data dumps. One had an empty reporting folder. The other saw exactly what I did.

Over a month ago now, I sent a polite email asking why this information is needed. I have received no response. My friend did the same, and received no response either.

Again: Discord claims that they “care a lot about privacy”. If they really did, why don’t they respond to us, and why do they track this information?

Other privacy issues

Deleting your Discord account does not delete your messages. It will remove your profile picture, set your name to something like “Deleted User XXXXXX”, and that’s it. As far as I’m aware they are not obligated under GDPR to delete your messages (only to anonymize them), but depending on the personal information you have given on Discord, it will be easy to associate your pseudonymous “deleted user” account with you.

Where I go from here

I used to actively work on new features for nostrum, a Discord API wrapper, which I started maintaining together with my friend Joe a while ago. A lot of time went into maintenance of that library, and even more time went into implementing fancy features such as multi-node support and whatnot.

But over the course of exploring the data dump, I figured that we should build on top of protocols, not platforms. My next “API wrapper” will be built around the specification from an RFC, not some company’s REST API.

The following links talk about similar topics:
