FUNDAMENTALS

1.4 Anonymized data that turns back into personal data

If someone is processing personal data, data protection law applies. The converse also holds: if no personal data is processed, data protection law and its legal requirements do not apply. Anonymously collected data and anonymized data are not (or are no longer) personal data, provided the reference to the individual has been effectively and permanently removed. So is anonymization the easiest way to free oneself from legal obligations?

In principle, yes. But simply removing last names, first names and dates of birth isn’t sufficient to achieve anonymization, because this does not effectively remove all references to the individual. As long as the remaining information can still be connected to the affected person, the information is still considered personal data.

To eliminate the reference to the person, it must be impossible to identify the individual from the totality of the available information, including all available context information. A statement made in 1982 about “the wife of the teacher from Lauwil…” is personal data because only one teacher taught in Lauwil (a small municipality near Basel) in 1982 and he had only one wife.

Geographical information in particular frequently carries a great deal of identifying information: the residential address alone can be used to identify a single individual in some cases. With additional information about age, number of children, size of apartment, etc., a person can be identified even if he or she lives in an apartment building. A mobile telephone movement profile without a name can also be identifying: how many people repeatedly spend both their days and nights in the same location as you, both working and living in the same place you do?

So how can the reference to the person be removed?

In general, by removing all information that makes it possible to determine who the data refers to.

Statistical data is often anonymous, but not always. Information about the average salary of employees of the City of Basel in 2019 is not personal data. The same is true of data on the average earnings of primary school teachers. The salary of the President of the Executive Council of the Canton of Basel-Stadt, however, has an obvious personal reference: there was only one President of the Executive Council in 2019.

The more data is combined, the more likely it is that the affected individuals can be identified. Here’s an example: we want to know how well the members of Basel’s parliament, the Grand Council, represent the population of Basel. The individual statistics (percentages by gender, age group, residential district, professional category, taxable income and assets, or dog ownership, for example that 26% of Grand Council members have a dog but only 16% of the population does) do not allow any individual to be identified, so they are not personal data. However, if the entire dataset about all 100 Grand Council members were published, it would certainly be possible to identify individual members from a combination of attributes. This turns the seemingly anonymized data back into personal data.
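
The effect described in the Grand Council example can be made visible by counting how many records share the same combination of attributes. The following is a minimal sketch in Python, not a complete re-identification test; the records, the column layout and the function names `smallest_group` and `unique_records` are invented for illustration and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical toy records: (gender, age group, district, owns a dog).
# The values are invented for illustration only.
records = [
    ("f", "40-59", "Gundeldingen", True),
    ("f", "40-59", "Gundeldingen", False),
    ("m", "40-59", "Gundeldingen", False),
    ("m", "20-39", "St. Johann", False),
    ("m", "20-39", "St. Johann", False),
    ("f", "60+", "Riehen", True),
]


def group_sizes(rows, columns):
    """Count how many rows share each combination of values in `columns`."""
    return Counter(tuple(row[i] for i in columns) for row in rows)


def smallest_group(rows, columns):
    """Size of the smallest group of rows that are indistinguishable on `columns`."""
    return min(group_sizes(rows, columns).values())


def unique_records(rows, columns):
    """Number of rows whose combination of values in `columns` occurs only once."""
    return sum(1 for n in group_sizes(rows, columns).values() if n == 1)


# Looking at gender alone, every record hides in a group of three.
print(smallest_group(records, [0]))
# Combining all four attributes singles out several records.
print(smallest_group(records, [0, 1, 2, 3]))
print(unique_records(records, [0, 1, 2, 3]))
```

Each attribute on its own leaves every person in a larger group, while the combination of all attributes leaves some records alone in their group. Those records are the ones that could be traced back to a person by anyone who knows the corresponding facts about them.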

How can data be anonymized?

There is no general answer. Different methods, alone or in combination, can be used to achieve this goal:

Omit certain details that would allow identification.
Summarize data from multiple people (“on average…”).
Group information that may allow identification into categories instead of giving it precisely: rather than listing the exact age or income, use age categories (under 20, 20–39, 40–59, 60 and over) or income ranges (up to CHF 40,000, CHF 40,001–60,000, CHF 60,001–80,000, more than CHF 80,000), etc.
Summarize percentages that could lead to identifiability (0% = no one, 100% = all) in broader bands (less than 30%, 30–40%, 40–50%, … more than 70%).
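
Grouping exact values into categories, as described in the list above, can be sketched in a few lines of Python. This is only an illustration: the function names `age_category` and `income_range` are hypothetical, and the handling of boundary values (for example, that a 40-year-old falls into the upper band and that exactly CHF 40,000 falls into the lowest range) is an assumption made here so that every value belongs to exactly one category.

```python
def age_category(age: int) -> str:
    """Map an exact age to a coarse age band (boundary handling is an assumption)."""
    if age < 20:
        return "under 20"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60 and over"


def income_range(income_chf: int) -> str:
    """Map an exact income in CHF to a coarse range (boundary handling is an assumption)."""
    if income_chf <= 40_000:
        return "up to CHF 40,000"
    if income_chf <= 60_000:
        return "CHF 40,001-60,000"
    if income_chf <= 80_000:
        return "CHF 60,001-80,000"
    return "more than CHF 80,000"


# A record with exact values is replaced by one that only carries the categories.
exact = {"age": 47, "income_chf": 72_500}
generalized = {
    "age": age_category(exact["age"]),
    "income": income_range(exact["income_chf"]),
}
print(generalized)
```

Categorization alone does not guarantee anonymity. Whether the result is anonymous still depends on how many people end up in each category.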

What always matters is the number of individuals to whom the available information applies. According to a well-known rule of thumb, it is safest if the data applies to a group of at least 20 individuals (at least 50 for special personal data). This can be achieved:

by choosing the reference unit appropriately: for special personal data (e.g. health data) or data subject to professional confidentiality (e.g. tax data), data might be shown by district or city block (the entire rectangular area between four streets), whereas for “regular” personal data (e.g. age of residents), it might be shown by side of the block (the side of a street along a city block).
by disallowing the display of data where the group of relevant individuals is too small (e.g., if there are fewer than the required minimum number of tax returns in a particular city block).
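
The second measure, suppressing groups that are too small, might look as follows in Python. This is a minimal sketch that applies the rule-of-thumb thresholds of 20 and 50 mentioned above; the function name `publishable_counts`, the block labels and the use of `None` to mark a suppressed value are assumptions made for illustration.

```python
from collections import Counter

MIN_GROUP_SIZE = 20          # rule of thumb for "regular" personal data
MIN_GROUP_SIZE_SPECIAL = 50  # rule of thumb for special personal data


def publishable_counts(area_labels, special=False):
    """Count records per area and suppress (None) every area below the minimum group size."""
    threshold = MIN_GROUP_SIZE_SPECIAL if special else MIN_GROUP_SIZE
    counts = Counter(area_labels)
    return {area: (n if n >= threshold else None) for area, n in counts.items()}


# Hypothetical input: one label per record, naming the city block it belongs to.
labels = ["block A"] * 25 + ["block B"] * 7

print(publishable_counts(labels))                # block B is suppressed
print(publishable_counts(labels, special=True))  # both blocks are suppressed
```

If too many areas end up suppressed, a coarser reference unit (for example the district instead of the city block) brings more groups above the threshold.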

Finally, modern technologies (such as “big data”) are making it increasingly easy to re-identify previously anonymized data by combining it with other data. Because such possibilities cannot be ruled out in the future, the risk of de-anonymization or re-identification must be taken into account whenever data is anonymized.

License

University of Basel