TUTORIAL

Natural Language Processing and sentiment analysis with TextBlob: a Python NLP library

What is NLP(Natural Language Processing)?

Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data. (Wikepedia)

Dependency to install:

$ pip install textblob

Some features of TextBlob:

Noun Phrase Extraction
Sentiment Analysis
Tokenization
Words Lemmatization
Spell Check
Translation

# NLP_and_Sentiment_Analysis_With_TextBlob

  from textblob import TextBlob
  import nltk

Install “brown” and “wordnet” using NLTK:

  nltk.download("wordnet")
  nltk.download("brown")

  [nltk_data] Downloading package wordnet to /Users/frankdu/nltk_data...
  [nltk_data]   Package wordnet is already up-to-date!
  [nltk_data] Downloading package brown to /Users/frankdu/nltk_data...
  [nltk_data]   Package brown is already up-to-date!





  True

### sample text data: zen of Python

  text1 = '''
  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.
  Flat is better than nested.
  Sparse is better than dense.
  Readability counts.
  Errors should never pass silently.'''

## 1. Create a TextBlob object

  blob = TextBlob(text1)

## 2. noun_phrases

  blob.noun_phrases

  WordList(['beautiful', 'explicit', 'simple', 'complex', 'flat', 'sparse', 'readability', 'errors'])

## 3. Sentiment Analysis

Access the polarity and subjectivity of a TextBlob object via returned named tuple.

  polarity = blob.sentiment[0]

  subjectivity = blob.sentiment[1]

  print(polarity, subjectivity)

  0.14464285714285716 0.5272959183673469

Or the following will also work:

  polarity = blob.polarity
  subjectivity = blob.subjectivity

  print(polarity, subjectivity)

  0.14464285714285716 0.5272959183673469

## 4. Tokenization

Tokenization is a NLP technique to split text data into list of single words or sentences.

  blob.words

  WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit'])

  blob.sentences

  [Sentence("Beautiful is better than ugly."),
   Sentence("Explicit is better than implicit.")]

## 5. Words Lemmatization

lemmatization is a NLP technique to reduce inflectional or derivational forms of a word to be the base form so that they can be analyzed as one single word

Examples of lemmatization:

** went > go gone > go organizes > organize organized > organize **


  # import textblob Word object
  from textblob import Word


  word = "organizes"

  word1 = "went"


  # Convert a Python string to a textblob Word()
  blob_word = Word(word)

  print(blob_word.lemmatize("v"))
  print(Word(word1).lemmatize("v"))

  organize
  go

## 6. Spell Check

  blob = TextBlob("My spellin is aways corect")

Unfortunately only 50% accuracy. You will need a ML model to train for a better accuracy

  blob.correct()

  TextBlob("By spelling is away correct")

## 7. Translation

  blob = TextBlob("Beautiful is better than ugly. Explicit is better than implicit.")


  # Translate to German from English
  blob.translate(to="de")

  TextBlob("Schön ist besser als hässlich. Explicit ist besser als implizit.")