Learning from Data

AI cannot exist without data, and the exponential rise in the amount of data available has been a key driver in the widespread adoption of AI.  

By 2020, our accumulated digital data is forecast to grow to around 44,000 exabytes (44 zettabytes), or 44 trillion gigabytes. Only a fraction of this is ever analysed.

Figure 17. Data - growing exponentially[i]

[i] https://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf

The starting point for any AI project is data, and a solid data quality foundation is critical to its success. Data has to be consistent, accurate, and complete, and a data quality strategy and plan is a key part of any successful implementation.[i]
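As a purely illustrative sketch of what such checks can look like in practice, the Python snippet below runs basic completeness, consistency, and accuracy tests over a customer table. The file name and column names ('customers.csv', 'customer_id', 'age') are hypothetical assumptions, not part of any particular project.

import pandas as pd

# Illustrative sketch only: the file and column names below are hypothetical.
df = pd.read_csv("customers.csv")

# Completeness: what proportion of each column is populated?
completeness = 1 - df.isna().mean()
print(completeness)

# Consistency: flag records that appear more than once.
duplicates = df[df.duplicated(subset=["customer_id"], keep=False)]
print(f"{len(duplicates)} duplicated rows")

# Accuracy: simple range checks where valid values are known in advance.
implausible_age = df[(df["age"] < 0) | (df["age"] > 120)]
print(f"{len(implausible_age)} rows with implausible ages")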

[i] https://towardsdatascience.com/data-quality-in-the-era-of-a-i-d8e398a91bef

Data Protection

In many cases the most useful data to apply machine learning to is personal data. Being able to identify the names and contact details of those people most likely to buy something is clearly a big benefit to marketing departments.

In Western countries, and particularly in Europe, strict data protection rules mean that people’s privacy must be protected, necessitating a ‘privacy first’ design principle for any system that uses personal data.

Of course, if data does not contain personal details – truly anonymised data, for example – it falls outside data protection rules.
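To illustrate the ‘privacy first’ idea, the sketch below pseudonymises direct identifiers before analysis by replacing them with salted one-way hashes and dropping fields the model does not need. The file name, column names, and salt are hypothetical; note too that under GDPR pseudonymised data, unlike fully anonymised data, still counts as personal data.

import hashlib
import pandas as pd

# Illustrative sketch only: file, column names, and salt are hypothetical.
SALT = "project-specific-secret"

def pseudonymise(value: str) -> str:
    # Salted one-way hash: the original identifier cannot be read back directly.
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

df = pd.read_csv("customers.csv")

# Replace the direct identifier with a pseudonym and drop fields the analysis does not need.
df["customer_id"] = df["customer_id"].astype(str).map(pseudonymise)
df = df.drop(columns=["name", "email", "phone"])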

In the European Union, the General Data Protection Regulation (GDPR) obliges organisations to respect the privacy rights of citizens, so it’s important to ‘bake’ GDPR requirements into an AI solution, along with security principles, at the start of the solution design process. In other words, AI systems should be private and secure by design. AI solution design should start with a Data Protection Impact Assessment.

In the UK the Information Commissioner’s Office (ICO) is the regulator responsible for enforcing the GDPR[i]. The ICO’s job is to ensure that personal data is processed fairly, lawfully, and transparently, and that the underlying algorithms are transparent.

Whilst the algorithms used in AI solutions can be very complex, and the workings of neural networks can be difficult to explain, it is incumbent on the AI solution operator to ensure that they can explain how their solution produces its outcomes. If necessary, the regulator will look inside an AI solution to determine whether or not it is compliant.

[i] The Information Commissioner’s Office - https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/rights-related-to-automated-decision-making-including-profiling/

Be Prepared to Show Your Workings

Algorithms are increasingly being used to make decisions that affect us:

·      In France a centralised algorithm matches teachers to jobs across the country. 

·      Algorithmic decisions increasingly help decide which candidates to hire.

·      Lenders use algorithms to decide which applicants to accept and what interest rates to set.

·      A number of states in the USA have used algorithms to make Criminal Justice decisions on sentencing, bail and recidivism. 

·      Decisions about medical treatments are increasingly being informed by algorithms.

But people don’t trust AI[i]. Nearly three-quarters of consumers want to know more about how their data is used (CBI). Whilst GDPR requires organisations to provide meaningful information about decision-making logic, is GDPR alone enough to win back trust?

To help people gain trust in AI and algorithm-based decisions, it should be possible to check:

1. The basis for the algorithm

2. Its past performance

3. The reasoning behind its current claim

4. Any uncertainty behind its current claim

5. Whether its explanations are accessible at different levels of expertise

A good case study here is the NHS’s ‘Predict’, a system for women choosing treatment for breast cancer. The workings behind Predict – including the maths behind the algorithms – are made clear at four different levels, and the full implementation is available on GitHub.

Predict is an example of Explainable AI (XAI), which aims to enable human users to understand, appropriately trust, and effectively manage artificial intelligence.

New XAI methods aim to produce more explainable models, while maintaining a high level of learning performance.
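As a flavour of what such methods look like in code – this is a generic illustration, not the Predict algorithm itself – the sketch below trains a simple classifier on scikit-learn’s public breast-cancer dataset and uses permutation importance, one common explainability technique, to show which input features the model actually relies on.

from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train a simple, inspectable model on a public dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Permutation importance: shuffle each feature in turn and measure how much accuracy drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, importance in sorted(zip(X.columns, result.importances_mean),
                               key=lambda pair: -pair[1])[:5]:
    print(f"{name}: {importance:.3f}")

Listing the features a model depends on, together with how much each one matters, is a small step towards the kind of layered explanation that Predict provides.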

The challenge ahead is to make AI and algorithmic decision-making simple enough for non-experts to understand. 

As an example of how the principle of showing your workings is being applied, New York City now mandates that any algorithm used in making decisions that affect citizens must be made public[ii].

Acknowledgements - Alan Turing Institute Lecture: “Be prepared to show your workings!”

[i] https://www.scientificamerican.com/article/people-dont-trust-ai-heres-how-we-can-change-that

[ii] https://www.nytimes.com/2017/08/24/nyregion/showing-the-algorithms-behind-new-york-city-services.html
