Types of Data

There are many types of data to consider, but a useful place to start is where the data for an AI system comes from: inside or outside the organisation.

External Data

A key task in any organisation using AI is to balance the seemingly conflicting goals of marketing and ensuring privacy, and this is an especially important consideration when using data from outside the organisation.

Generally, the goal of using externally sourced data is to grow revenue by winning more revenue-generating customers. But where does this data come from?

Much of the data that is flowing into AI systems comes from mobile phones. As we post pictures on Facebook (350 million pictures a day), allow our location to be known to Google Maps, or browse for purchases on Amazon, we leave behind us a trail of data known as our ‘digital, or data, exhaust’.

Everything we do, both on and offline, leaves digital traces. Every purchase we make with our credit cards, every search we type into Google, every movement we make when our mobile phone is in our pocket, every “like” – all of this is stored. Multinational corporations have made “farming data” a core business strategy – they monitor everything we do, what we buy, who we talk to, and where we go on holiday.

Figure 20. Our digital exhaust feeds the Big Data and AI systems that advertise to us

Companies such as Facebook and Google make their money through data-driven advertising and by enabling companies to learn more about their target audiences, including information about geography, demographics and purchasing behaviour.

Specialist data companies such as Acxiom, Webhose and Nielsen sell Big Data sets across a range of consumer areas, from retail products to credit. In addition, a vast number of pre-labelled public datasets are freely available.

A key trend to watch is the growing availability not just of datasets and labelled data, but of pre-trained models for purchase.

Internal Data

If an organisation has good customer data, customers have consented to its use, and the organisation can show clearly how it is being used, then AI can be used to predict purchase and upsell opportunities.

But internally sourced data can also be used to make organisational processes more efficient and effective.

Typically, organisations use internally sourced data to:

· Improve performance

· Make better management decisions

· Increase accountability

· Manage resources more effectively

· Drive administrative efficiencies

Traditionally, data-driven management has been about analysis, planning and intervention. Data would typically be ‘mined’ and then passed through analytics and reporting services. Here, every detail in the process is structured and engineered.

However, AI can work in a much less structured way. It can work on unlabelled data, using unsupervised machine learning methods to find patterns, and draw its own conclusions.
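
As a minimal sketch of what finding patterns in unlabelled data can look like, the Python example below groups customers with k-means clustering, one common unsupervised method. The customer figures, the choice of two groups and the function names are assumptions made purely for illustration; a real system would more likely rely on an established machine learning library.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to the point."""
    distances = [
        (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids
    ]
    return distances.index(min(distances))


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters without using any labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (
                    sum(x for x, _ in cluster) / len(cluster),
                    sum(y for _, y in cluster) / len(cluster),
                )
    return centroids, clusters


# Hypothetical, unlabelled customer data: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (9, 95), (10, 100), (8, 90)]

centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(f"group centred near {centroid}: {members}")
```

Nobody tells the algorithm which customers are occasional and which are frequent; the two groups emerge from the data alone.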

Data Lake

A data lake is a single store of all enterprise data, including raw copies of source system data and processed data. Data lakes serve data for reporting, business intelligence (BI), dashboards, analytics and machine learning.

Machine Learning Data

In machine learning, data is usually split into three sets. The proportions below are a common rule of thumb rather than a fixed requirement.

Training Set (around 60% of the original data set): This is used to build our prediction algorithms. In this phase we usually create several candidate algorithms so that their performance can be compared.

Validation Set (around 20% of the original data set): This is used to compare the performance of the prediction algorithms built from the training set. We choose the algorithm that performs best.

Test Set (around 20% of the original data set): We apply the chosen prediction algorithm to the test set to estimate how it will perform on previously unseen data.
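
The three-way split can be sketched in a few lines of Python. This is an illustrative sketch only: the `split_dataset` function name, the fixed random seed and the stand-in list of 100 records are assumptions, and the 60/20/20 proportions are simply the figures quoted in this lesson.

```python
import random


def split_dataset(records, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the records and cut them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 20% here
    return train, validation, test


# Stand-in data: 100 record IDs in place of real customer records.
records = list(range(100))
train_set, validation_set, test_set = split_dataset(records)
print(len(train_set), len(validation_set), len(test_set))  # prints: 60 20 20
```

Shuffling before slicing matters because records are often stored in some order (by date or by customer, for instance), and an unshuffled split could give the three sets quite different characteristics.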
