For years, data protection was viewed as an annoying chore that companies paid lip service to but often overlooked or underfunded. With the advent of the EU General Data Protection Regulation (GDPR), all that changed. GDPR is one of the most wide-ranging pieces of EU legislation to date and carries such significant penalties that no company can afford to ignore it.
Building a privacy-preserving data analytics stack
In this paper we have shown you how to build a modern data analytics stack that embeds data privacy at its heart. We started by giving a conceptual model that sought to tease out the core functionality and elements needed in any analytics stack.
The core elements are the Data Loader, the Data Warehouse and the Data Consumer. We saw how each of these elements requires certain functionality ranging from Ingestion and ETL to Analysis and Presentation. The idea here is to give you a better understanding of how all the parts of the analytics stack fit together.
We then looked in detail at the data security and data privacy functions. These are the only functions that are present in every element of the analytics stack. We explained what data security and privacy are and showed why they are the key functions of any analytics stack.
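The conceptual model summarized above can be sketched as a small data structure. This is only an illustration: the assignment of functions to elements (for example, "storage" for the Data Warehouse) is an assumption made here and is not taken from the paper. The one property the sketch tries to capture is that security and privacy are the only functions shared by every element.

```python
from dataclasses import dataclass

# Functions the paper says are present in every element of the stack.
CROSS_CUTTING = frozenset({"security", "privacy"})


@dataclass(frozen=True)
class StackElement:
    """One element of the analytics stack and its element-specific functions."""
    name: str
    functions: frozenset

    def all_functions(self):
        # Every element carries the cross-cutting functions as well.
        return self.functions | CROSS_CUTTING


# Assumption: this mapping of functions to elements is illustrative only.
STACK = (
    StackElement("Data Loader", frozenset({"ingestion", "etl"})),
    StackElement("Data Warehouse", frozenset({"storage"})),
    StackElement("Data Consumer", frozenset({"analysis", "presentation"})),
)


def shared_functions(stack):
    """Return the functions that appear in every element of the stack."""
    sets = [element.all_functions() for element in stack]
    return frozenset.intersection(*sets)
```

Calling `shared_functions(STACK)` returns only the security and privacy functions, which mirrors the claim that they are the only functions present throughout the stack.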
Later in the paper we explored how GDPR affects your data analyses. We saw that it affects anyone who handles the personal data of EU residents and explained the stringent penalties for companies that breach the regulation. We then showed how anonymization provides a straightforward way to take data out of the scope of GDPR: if data is properly anonymized, it is no longer covered by the regulation and so can be used freely. To this end we looked at some of the concepts behind anonymization and compared them with pseudonymization.
Finally, we introduced you to Aircloak Insights, our turnkey solution that allows you to upgrade any existing or future analytics stack and make it fully GDPR-compliant. Aircloak Insights is the first technology that applies the Diffix anonymization approach. Diffix works by applying controlled amounts of pseudo-random noise to the query results. A key benefit of this approach is that it avoids the query-budget problem that affects many other dynamic privacy mechanisms. As a result, the system can perform dynamic anonymization on data queries and pass the results straight to the analyst for further processing and visualization.
In conclusion, privacy regulations have often been viewed negatively by companies. But as data-driven business models emerge in more and more industries that deal with highly sensitive data, such as healthcare and finance, modern technology and the right planning mean the regulations needn’t be a burden, and may even allow you to build and retain the trust of an ever more wary public.
Building a privacy-preserving analytics stack: understand how to comply with the requirements imposed by GDPR while still leveraging data analysis.
Before explaining how to choose the best analytics tool stack, we first need to create an abstract model for the stack. This allows us to discuss the required functionality without being wedded to preconceived ideas about the capabilities and limitations of specific tools such as Postgres.
One of the main focuses of this paper is data privacy. However, if you are collecting personal data then you can’t achieve data privacy without data security. In this section we will explore the Data Security and Privacy function in detail.
GDPR is one of the most far-reaching data protection laws anywhere in the world, and as such it has had a huge impact globally. This is because, unlike many national data protection laws, GDPR applies to any company that handles the personal data of EU residents, wherever that company is based. In this section we will look at the specific impact GDPR has had on analytics.
Since GDPR only relates to personal data, any data that is not personal is not covered by the regulation. This means that if you are able to completely remove any personal identifiers from the data, that data is no longer subject to the rules. This is where anonymization comes in.
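To make the distinction concrete, here is a minimal sketch contrasting pseudonymization with a simple aggregated output. The records, field names, key, and the `min_group_size` threshold are all hypothetical and chosen only for illustration. Note that dropping small groups is a simplistic stand-in: on its own it is not claimed to make data properly anonymized in the sense of the regulation.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical toy records; the field names are assumptions for illustration.
records = [
    {"name": "Alice", "city": "Berlin"},
    {"name": "Bob", "city": "Berlin"},
    {"name": "Carol", "city": "Berlin"},
    {"name": "Dave", "city": "Munich"},
]

SECRET_KEY = b"example-key"  # assumption: a key held by the data controller


def pseudonymize(record, key=SECRET_KEY):
    """Replace the direct identifier with a keyed hash.

    The same person always maps to the same token, so rows stay linkable
    to an individual. This is pseudonymization, not anonymization.
    """
    digest = hmac.new(key, record["name"].encode("utf-8"), hashlib.sha256)
    return {"user_token": digest.hexdigest()[:12], "city": record["city"]}


def aggregate_counts(rows, min_group_size=2):
    """Return per-city counts, dropping groups smaller than min_group_size."""
    counts = Counter(row["city"] for row in rows)
    return {city: n for city, n in counts.items() if n >= min_group_size}


pseudonymized = [pseudonymize(r) for r in records]
aggregated = aggregate_counts(records)
```

In the pseudonymized rows each person is still represented individually, just under a token, whereas the aggregated result contains no per-person rows at all and omits the single-person Munich group.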
In this section we give you some advice on how to select tools for your stack. As with most things, the right tool for one setting won’t necessarily be right for another, so this advice covers the things you should consider when you select a particular tool.
Aircloak Insights is the first solution to offer real-time database anonymization that allows analysts to query anonymized data exactly as if it were the original raw data. In this section we will explain how Aircloak’s technology works and show why it is the first GDPR-compliant tool for database anonymization.
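As a rough illustration of the idea of adding pseudo-random noise to query results, the sketch below perturbs a count with noise whose seed is derived from the query itself. This is not Aircloak's implementation or the actual Diffix algorithm; the function name `noisy_count`, the salt, the seeding scheme, and the Gaussian noise with standard deviation `sd` are all assumptions made here for illustration.

```python
import hashlib
import random


def noisy_count(values, predicate, query_text, salt="example-salt", sd=1.0):
    """Count matching values and add noise seeded from the query text.

    Assumption: seeding the noise from the query means that repeating the
    same query returns the same answer, so the noise cannot simply be
    averaged away by asking again.
    """
    true_count = sum(1 for v in values if predicate(v))
    seed_material = f"{salt}|{query_text}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    rng = random.Random(seed)  # local generator; global state is untouched
    noise = rng.gauss(0.0, sd)
    # Round to a whole number and never report a negative count.
    return max(0, round(true_count + noise))


ages = [23, 31, 35, 47, 29]
query = "SELECT count(*) FROM users WHERE age >= 30"
first = noisy_count(ages, lambda a: a >= 30, query)
second = noisy_count(ages, lambda a: a >= 30, query)
```

Here `first` and `second` are equal because the noise is derived from the same query text. A design like this is one way to see why a per-analyst query budget might be unnecessary, though the paper itself should be consulted for how Diffix actually achieves this.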