Mixed datasets, which combine personal and non-personal data, are increasingly common in both science and business. This raises the question of how to handle such datasets from a security and privacy perspective. Here is my take on the subject.
Non-personal data are defined a contrario, as data that do not fall into the category of personal data, which is admittedly a very broad definition. Weather forecasts are a simple example: there is no risk at all that anyone could trace a natural person from the prediction that “tomorrow is going to be sunny in Brussels”.
Other examples can be trickier, though. You would assume that the performance data of industrial machinery also qualify as non-personal. But if we could identify the worker operating that particular machine at a given time, lo and behold, the data would be deemed personal instead, since they would then relate to the work performed by that natural person.
What’s in a name?
Mixed datasets come in various forms. Take, for example, a company’s tax record: it contains figures relating to a legal person, such as revenues, profits, taxable amounts and amounts due (non-personal data), together with the name and contact details of the company’s managing director (personal data). This dataset mixes a large majority of non-personal data with a small amount of personal data.
In other mixed datasets, the two components are present in more balanced proportions. Think of a research institute that stores aggregated statistical data together with the raw personal data of the individual respondents to its survey. Or consider a customer service dataset that records each of a company’s interventions, detailing the nature of the issue, the timing of the intervention, the solution found and the customer’s satisfaction, together with the customer’s identity.
The legal perspective
We know that personal data are subject to strong guarantees and obligations, while non-personal data are exempt from all of them. What is more, the legislator actively encourages non-personal data to become more available to European industry across borders, in order to boost competition and the digitalisation of every sector.
What about mixed datasets, then? They seem torn between these two tendencies: on the one hand, there is the added value of sharing non-personal data (see the Free Flow of Non-Personal Data Regulation); on the other hand, there is the need to protect the persons behind personal data (e.g. the GDPR). According to the European Commission, mixed datasets are subject to the personal data rules in so far as personal data are involved, even if only in a minimal amount. The need to protect personal data therefore seems to prevail. The Commission also notes that personal and non-personal data are often inextricably linked. “Inextricability” is not a very clear concept, however. In practice, it can mean “impossible to separate”, “technically inseparable” or “economically inefficient to separate” (as assessed by the data controller).
From theory to practice
Returning to the examples described above, we arrive at the following scenarios. In the first case, whoever wants to process the company’s tax records has to comply with all the GDPR obligations, because the mixed dataset includes personal data, even if only a small amount. Indeed, the law requires the tax record to be associated with the identifiers of the manager in charge, as the responsible signatory. In other words, a legal requirement prevents the separation of personal from non-personal data. The two are therefore inextricably, and insurmountably, linked, even though it would be technically easy to separate the personal data from the rest.
Conversely, in the case of the research institute, the original mixed dataset could easily be turned into a non-personal dataset by simply erasing the individual answers and continuing the research on the aggregated, anonymised data instead. In doing so, however, we should not overlook the fact that erasure itself counts as processing of personal data (the GDPR’s definition of processing expressly includes erasure). The mixed dataset therefore becomes a non-personal dataset only once the erasure has actually taken place; until then, the obligations and guarantees of the GDPR continue to apply.
Concluding advice
Mixing personal and non-personal data is common practice these days, as it helps unlock the full potential of the European data economy. The law nevertheless requires such mixed datasets to receive the higher level of protection that applies to personal data.
To keep compliance agile and less cumbersome, data controllers should strip the personal component out of mixed datasets wherever possible and aim to process only non-personal data, which leave more room for manoeuvre. Unfortunately, as the tax record example shows, that option is not always available. In practice, it depends on how inextricably the personal and non-personal data are linked.