Natural Language Processing

In 2011, NLP made history. In the first show of a series of Jeopardy! episodes featuring IBM Watson, contestants received the following clue:

 

“A piece of a tree, or to puncture with something pointed”.

 

Five seconds later Watson replied with the correct answer - “stick”.

 

Watson’s top three answers were displayed on screen: it gave “stick” a confidence score of 96%, while the next two answers, “lumber” and “bark”, received only 20% and 10% respectively.

 

Computers don’t yet truly understand English in the way that humans do, but as Watson clearly showed, they can already do a lot.

 

Natural Language Processing, or NLP, is the sub-field of AI that is focused on enabling computers to understand and process human languages.

 

Using machine learning, NLP helps uncover insights and relationships in unstructured data. Chatbots can combine NLP with personality analysis, so that customers feel “this interface understands me”.

 

But how does NLP work? Take the following example text, from which we want the computer to extract meaning:

 

“London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, London has been a major settlement for two millennia. It was founded by the Romans, who named it Londinium”.

 

To extract the main facts from this text we need to build a pipeline – a chain of machine learning models that feed into each other.

Figure 46. How natural language is processed

Each part of the pipeline does the following:

 

·      Sentence Segmentation - Breaks the text apart into separate sentences

·      Tokenization - Breaks each sentence into separate words, or tokens

·      Parts-of-Speech Tagging - Works out the role of each word in the sentence - noun, verb, adjective etc.

·      Lemmatization - Uses a look-up table to add the root form, or lemma, of each word

·      Stop Words - Finds words that can be filtered out (e.g. ‘the’, ‘is’, ‘and’) before statistical analysis

·      Dependency Parsing - Works out how the words in the sentence relate to each other

·      Noun Phrases - Groups together the words that represent a single idea or thing

·      Named Entity Recognition - Labels nouns with the real-world concepts that they represent

·      Coreference Resolution - Tracks pronouns across sentences and works out which nouns they refer to
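
To make the first few stages concrete, here is a minimal sketch in plain Python of sentence segmentation, tokenization, lemmatization and stop-word filtering. The regular expressions, the tiny `STOP_WORDS` set and the `LEMMA_TABLE` look-up are illustrative assumptions and are far cruder than what a real NLP library does.

```python
import re

# Illustrative assumptions: real libraries ship far larger resources.
STOP_WORDS = {"the", "is", "and", "of", "a", "it", "by", "who", "was"}
LEMMA_TABLE = {"founded": "found", "named": "name", "romans": "roman"}


def split_sentences(text: str) -> list[str]:
    """Sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Tokenization: runs of word characters, plus each punctuation mark on its own."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def lemmatize(token: str) -> str:
    """Lemmatization: lower-case the token and look up its root form, if known."""
    word = token.lower()
    return LEMMA_TABLE.get(word, word)


def content_lemmas(sentence: str) -> list[str]:
    """Drop punctuation and stop words, then lemmatize what remains."""
    return [
        lemmatize(tok)
        for tok in tokenize(sentence)
        if tok.isalpha() and tok.lower() not in STOP_WORDS
    ]


sample = "London is the capital of England. It was founded by the Romans, who named it Londinium."
for sentence in split_sentences(sample):
    print(tokenize(sentence), "->", content_lemmas(sentence))
```

The later stages (parts-of-speech tagging, dependency parsing, named entity recognition) rely on trained statistical models, so they cannot be reduced to a few rules in the same way.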

 

Fortunately, much of this pipeline is already implemented in Python libraries such as spaCy and textacy, so coding can be kept to a minimum. For example:
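
As a hedged sketch of what such code might look like (it is not the course's original listing), the function below runs a pretrained spaCy pipeline over the London text. It assumes spaCy is installed and that the small English model `en_core_web_sm` has been downloaded with `python -m spacy download en_core_web_sm`. The function name `analyse` and the shape of its return value are illustrative choices.

```python
text = (
    "London is the capital and most populous city of England and the "
    "United Kingdom. Standing on the River Thames in the south east of the "
    "island of Great Britain, London has been a major settlement for two "
    "millennia. It was founded by the Romans, who named it Londinium."
)


def analyse(text: str, model_name: str = "en_core_web_sm") -> dict:
    """Run a pretrained spaCy pipeline and collect the results of each stage."""
    import spacy  # local import: only needed when the analysis actually runs

    nlp = spacy.load(model_name)  # assumes the model has been downloaded
    doc = nlp(text)
    return {
        # Sentence segmentation
        "sentences": [sent.text for sent in doc.sents],
        # Tokenization, lemmatization, POS tagging, dependency parsing, stop words
        "tokens": [(t.text, t.lemma_, t.pos_, t.dep_, t.is_stop) for t in doc],
        # Noun phrases
        "noun_phrases": [chunk.text for chunk in doc.noun_chunks],
        # Named entity recognition
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }
```

Calling `analyse(text)` returns the sentences, per-token details, noun phrases and named entities found in the text. This sketch does not cover coreference resolution.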

The output from this would be:

Running the same code against a longer piece of text, such as the Wikipedia page for London, would yield a much longer output.

 

Natural Language Processing is one of the leading applications of AI. Use cases include:

·      Voice recognition systems in smartphones 

·      Voice-controlled devices, such as the Amazon Echo

·      Real-time language translation 

·      Sentiment analysis of text

 

A key application area for NLP is law. NLP’s ability to process language in documents, synthesise knowledge and automate reasoning has broad application in the legal services and compliance sector. With junior lawyers spending a high proportion of their time accessing and collating information, the scope for augmentation and automation is considerable.

 

Key AI use cases include identifying relevant case law, processing documents for discovery and due diligence, and informing litigation strategy. In October 2018, Harvard Law School Library advanced its ‘Caselaw Access Project’ by releasing over 40 million pages of digitised legal information, including every reported state and federal legal case in the United States from the 1600s to the summer of 2017, providing extensive further data with which to train AI systems.

 

For example, AI can source and rank relevant case law and identify key documents in due diligence and discovery processes. With a merger and acquisition data room containing an average of 34,000 pages for review (according to legal AI firm Luminance), AI can speed up transactions and reduce costs.

Your Task.

Change the text in the following lines of code

 

text =

 

and run an analysis.
