Data Science at Home
EU Regulations and the rise of Data Hijackers: A New Podcast Episode
A recent publication titled “EU regulations on algorithmic decision-making and a right to explanation” discusses EU regulations that will soon apply to machine learning algorithms and data science. Here is what we think at Data Science at Home.
Enjoy the show!
Podcast transcript
The subject of this episode is a recent regulation under consideration by the EU that deserves the attention of experts in data analytics and machine learning.
Data scientists are cool! And data science, like machine learning, is an awesome field that, especially in the last few years, has been achieving goals that seem more magical than real.
The improvements data science has brought to several domains are simply impressive.
We’re not far from the first reliable self-driving vehicle, be it a car, that can deal with realistic urban scenarios. Then there will be self-driving cabs, buses, trains, and who knows, maybe planes and boats.
We can already predict quite accurately whether we might be interested in purchasing this or that book, or whether connecting to this or that person would lead to amazing conversations because they share our personal interests.
We can already support the decisions of medical doctors in predicting certain diagnoses (and in some cases algorithms can be scarily accurate; think of fMRI or cancer imaging). I was personally involved in Non Invasive Prenatal Testing, which detects chromosomal aberrations of the fetus from DNA data analysis. For specific genetic disorders and under specific conditions, the accuracy can be higher than 97%. This is what I mean by scarily accurate!
Maybe the chances that you are protecting the interests of a financial institution are not that high, but wouldn’t you like to know in advance whether a family can pay back its loan, in order to minimize the risk for both parties?
All of this could become illegal in less than two years.
According to an EU regulation that will take effect in 2018, algorithms that make decisions based on user-level predictors which “significantly affect” users must be regulated. We will go into the details of what “significantly affect” means.
The regulation also introduces a “right to explanation”: users can ask for an explanation of an algorithmic decision that was made about them.
The first part of the regulation effectively outlaws many important algorithms that are currently in use, e.g. recommendation systems, algorithms for credit and insurance risk assessment, computational advertising, and social networks.
The second part is nearly impossible to guarantee: explaining a decision usually amounts to fully understanding the algorithm that made it, which is possible only in a few cases.
What about deep learning and neural networks? How can we explain what is notoriously known to be a black box? And what about probabilistic approaches, whose decisions are only probabilistically stable/valid? Think of methods that rely on sampling, or MCMC methods, in which there is an important random component.
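To make that last point concrete, here is a minimal sketch (the data and model are made up for illustration, not anything from the regulation) of how a method with a random component can score the very same applicant differently depending on the seed:

```python
# A toy demonstration: the same "applicant" scored by a bootstrap-based
# model trained with different random seeds.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for applicant data.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
applicant = X[:1]

for seed in range(3):
    # Each seed changes the bootstrap samples the forest is built on.
    model = RandomForestClassifier(n_estimators=10, random_state=seed).fit(X, y)
    proba = model.predict_proba(applicant)[0, 1]
    print(f"seed={seed} decision={model.predict(applicant)[0]} p(positive)={proba:.2f}")
```

The predicted probability shifts with the seed; for an applicant near the decision threshold, the decision itself can flip. Which run is “the” decision we are supposed to explain?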
Maybe the only algorithms that can be easily explained are linear regression and decision trees, to name a couple, which are heavily based on correlation and trend analysis. Even so, such an explanation still does not establish causality. We will get back to this concept later.
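By contrast, here is a rough sketch (the feature names and data are hypothetical) of the kind of explanation a linear model can offer: the decision decomposes into one signed contribution per feature, which can be read back to the user directly.

```python
# A toy "right to explanation" for a logistic regression model: each
# feature's signed contribution to the log-odds behind one decision.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

feature_names = ["income", "debt", "age", "loan_amount"]  # hypothetical
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)
applicant = X[0]

# The decision is sign(intercept + sum of contributions), so listing the
# per-feature terms is, in effect, the explanation.
contributions = model.coef_[0] * applicant
for name, value in sorted(zip(feature_names, contributions),
                          key=lambda item: -abs(item[1])):
    print(f"{name}: {value:+.3f}")
print(f"intercept: {model.intercept_[0]:+.3f}")
```

Note that even this clean breakdown only says which features pushed the score up or down, not *why* in any causal sense.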
Here is an excerpt from the General Data Protection Regulation (GDPR).
I read:
Article 11. Automated individual decision-making
* Member States shall provide for a decision based solely on automated processing, including profiling, which produces an adverse legal effect concerning the data subject or significantly affects him or her, to be prohibited unless authorised by Union or Member State law to which the controller is subject and which provides appropriate safeguards for the...