Natural Language Processing with Python: Analyze and Understand Text Using NLP


Part 6: Natural Language Processing (NLP) with Python



What Is Natural Language Processing (NLP)?

NLP is a branch of AI that enables computers to understand, interpret, and generate human language. NLP powers:

  • Search engines like Google
  • Translation apps like Google Translate
  • Chatbots and voice assistants
  • Text summarization and sentiment analysis

Learning NLP is crucial for building real-world applications in AI, chatbots, customer service automation, and content analysis.



Tools We'll Use

  • NLTK - Comprehensive NLP toolkit for research and teaching
  • TextBlob - Simplified text analysis and sentiment detection
  • spaCy - Industrial-strength NLP library for speed and scalability
  • Pandas & Scikit-learn - For dataset handling and modeling

Install required libraries:


pip install nltk textblob spacy pandas scikit-learn
python -m textblob.download_corpora
python -m nltk.downloader punkt punkt_tab wordnet  # punkt_tab is needed by word_tokenize in recent NLTK releases
python -m spacy download en_core_web_sm


Step 1: Basic Text Analysis with TextBlob


from textblob import TextBlob

text = "Python is a powerful language for machine learning."
blob = TextBlob(text)

print("Words:", blob.words)
print("Sentences:", blob.sentences)
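
To make the idea of tokenization concrete, here is a rough standard-library sketch of the two splits above. It is only an approximation: the regular expressions and the helper names simple_words and simple_sentences are illustrative assumptions, not how TextBlob tokenizes internally.

```python
import re

def simple_words(text):
    # Keep runs of letters, digits, and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)

def simple_sentences(text):
    # Split after ., !, or ? when it is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "Python is powerful. It is also easy to read!"
print(simple_words(sample))
print(simple_sentences(sample))
```

A splitter this naive breaks on abbreviations such as "Dr." or "e.g.", which is one reason to rely on a library tokenizer for real text.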


Step 2: Sentiment Analysis


text = "I love working with Python, but debugging can be frustrating."
blob = TextBlob(text)
print(blob.sentiment)

Output Explanation: blob.sentiment returns a named tuple with two fields:

  • Polarity: Ranges from -1 (negative) to +1 (positive)
  • Subjectivity: Ranges from 0 (objective) to 1 (subjective)
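
Lexicon-based analyzers broadly work by combining per-word scores into one number. The toy sketch below illustrates that idea only: the four-word lexicon, its values, and the function toy_polarity are invented for this example and are not TextBlob's actual lexicon or algorithm.

```python
# Hypothetical mini-lexicon: word -> polarity in [-1, 1]. Values are made up.
TOY_LEXICON = {"love": 0.5, "great": 0.8, "frustrating": -0.4, "hate": -0.8}

def toy_polarity(text):
    # Lowercase each word, strip basic punctuation, and look it up in the lexicon.
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    # No known words means no evidence either way, so return a neutral 0.0.
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("I love Python, but debugging can be frustrating."))
print(toy_polarity("The sky is blue."))
```

In the first sentence the positive and negative words partly cancel, which is why mixed statements tend to score close to zero.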


Step 3: Tokenization and Lemmatization with NLTK


import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in recent NLTK releases
nltk.download('wordnet')

text = "Cats are running faster than the dogs."
tokens = word_tokenize(text)
lemmatizer = WordNetLemmatizer()

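# lemmatize() treats every token as a noun unless a pos argument is given,
# so "cats" becomes "cat" but "running" stays unchanged; pass pos="v" for verbs.
lemmas = [lemmatizer.lemmatize(token.lower()) for token in tokens]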
print("Lemmatized Tokens:", lemmas)


Step 4: Named Entity Recognition with spaCy


import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a startup in the UK for $1 billion.")

for entity in doc.ents:
    print(entity.text, "-", entity.label_)
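
Once entities are extracted, it is often handy to group them by label. A minimal sketch, assuming the entities are passed in as (text, label) pairs; the helper group_entities and the hand-written pairs are illustrative, and the labels a real model produces depend on the model and its version.

```python
from collections import defaultdict

def group_entities(pairs):
    """Group (text, label) pairs into {label: [texts]}, keeping order and dropping duplicates."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

# Hand-written pairs standing in for [(ent.text, ent.label_) for ent in doc.ents].
example_pairs = [("Apple", "ORG"), ("UK", "GPE"), ("$1 billion", "MONEY"), ("Apple", "ORG")]
print(group_entities(example_pairs))
```

With spaCy, the same function can be called as group_entities((ent.text, ent.label_) for ent in doc.ents).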


Step 5: Mini Project - Sentiment Classifier


from textblob import TextBlob

def get_sentiment(text):
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    if polarity > 0:
        return "Positive"
    elif polarity < 0:
        return "Negative"
    else:
        return "Neutral"

print(get_sentiment("I love AI and machine learning!"))  # Positive
print(get_sentiment("I hate bugs in my code."))          # Negative
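
A common refinement is to treat scores close to zero as neutral instead of requiring exactly 0. The sketch below works on the polarity number directly; the function name label_from_polarity and the 0.1 threshold are assumptions to tune on your own data.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label, treating small magnitudes as neutral."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

for score in (0.6, 0.05, -0.3):
    print(score, "->", label_from_polarity(score))
```

To use it with TextBlob, pass in TextBlob(text).sentiment.polarity.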


Step 6: Advanced NLP with Real-World Dataset

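Let's evaluate the classifier on a real labeled dataset. The file loaded below is a Twitter dataset (as its URL shows) rather than the IMDB movie reviews, so check its columns and label meanings before interpreting the scores: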


import pandas as pd
from sklearn.model_selection import train_test_split
from textblob import TextBlob

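# Load dataset (a labeled Twitter dataset hosted on GitHub)
df = pd.read_csv("https://raw.githubusercontent.com/dD2405/Twitter_Sentiment_Analysis/master/train.csv")
print(df.columns.tolist())  # check the real column names before selecting any
# Assumption: the tweet text is stored in a 'tweet' column; rename it so the code below can use 'text'
df = df.rename(columns={'tweet': 'text'})
df = df[['text', 'label']].dropna()  # keep the text and label columns, drop empty rows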

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)

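Now, apply the TextBlob-based classifier. Its 0/1 output must match the dataset's label convention. Assumption to verify with df['label'].value_counts() and a few sample rows: in this dataset 1 marks tweets flagged as offensive and 0 marks all others, so negative polarity maps to 1:


def classify_sentiment(text):
    polarity = TextBlob(text).sentiment.polarity
    # Assumed convention: 1 = negative/offensive tweet, 0 = everything else.
    # If your labels use 1 for positive text, change this to polarity > 0.
    return 1 if polarity < 0 else 0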

y_pred = X_test.apply(classify_sentiment)

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
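
Accuracy alone can mislead when one class dominates, so it helps to compare against a majority-class baseline and to look at precision and recall. A small standard-library sketch of both; the helper names and the ten demo labels are invented for illustration.

```python
from collections import Counter

def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(y_true)
    return counts.most_common(1)[0][1] / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented labels: 8 of 10 examples belong to class 0.
y_true_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

print("Baseline accuracy:", majority_baseline_accuracy(y_true_demo))
print("Precision, recall:", precision_recall(y_true_demo, y_pred_demo))
```

If a classifier's accuracy is not clearly above the baseline, the per-class numbers in the classification report are the more informative ones to read.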


Step 7: Practice Challenges

  • Ask the user for input and return sentiment dynamically
  • Build a chatbot that responds based on detected sentiment
  • Extract named entities from paragraphs using spaCy
  • Try using TF-IDF features and train a logistic regression classifier
  • Experiment with more advanced models like LSTM or BERT for text classification
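
For the TF-IDF challenge, a minimal sketch with scikit-learn might look like this. The eight training sentences are invented for illustration and far too few for a real model; in practice you would fit on a proper training split such as X_train and y_train.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set: 1 = positive, 0 = negative.
train_texts = [
    "I love this movie", "What a great film",
    "Great acting and a great story", "I love the soundtrack",
    "I hate this movie", "What a terrible film",
    "Terrible acting and a boring story", "I hate the ending",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Pipeline: convert text to TF-IDF vectors, then fit a linear classifier on them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(["I love this great film", "I hate this terrible film"]))
```

Unlike the lexicon approach, this model learns which words matter from the labeled examples, so its quality depends directly on the size and quality of the training data.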


Step 8: NLP Cheat Sheet


# Text Preprocessing
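word_tokenize(text)                  # NLTK: split text into word tokens
sent_tokenize(text)                  # NLTK: split text into sentences
WordNetLemmatizer().lemmatize(word)  # NLTK: convert to base form (noun by default)
[t for t in tokens if t.lower() not in stopwords.words('english')]  # NLTK: filter out common words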

# Libraries
TextBlob - Sentiment analysis & text statistics
NLTK - Tokenization, stemming, lemmatization, corpora
spaCy - Industrial-strength NLP, Named Entity Recognition

# Sentiment Analysis
polarity = TextBlob(text).sentiment.polarity  # -1 to 1
subjectivity = TextBlob(text).sentiment.subjectivity  # 0 to 1

# Evaluation
accuracy_score(y_true, y_pred)
confusion_matrix(y_true, y_pred)
classification_report(y_true, y_pred)

# Advanced
TF-IDF features
Bag-of-Words model
Word embeddings (Word2Vec, GloVe)
RNN / LSTM / BERT for deep NLP
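
As a quick illustration of the Bag-of-Words entry, here is a standard-library sketch that counts words after removing stop words. The six-word stop list and the helper bag_of_words are assumptions made for this example; real stop-word lists are much longer.

```python
from collections import Counter

# Tiny illustrative stop-word set, not a complete list.
STOP_WORDS = {"the", "a", "is", "are", "and", "than"}

def bag_of_words(text):
    """Count non-stop-word tokens, ignoring case and basic punctuation."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(t for t in tokens if t and t not in STOP_WORDS)

bow = bag_of_words("The cats and the dogs are running. The dogs are faster than the cats.")
print(bow)
```

Word order is discarded entirely, which is the main limitation that embeddings and sequence models address.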


Step 9: FAQs

1. Can I use NLP without Python?

Yes, NLP frameworks exist for Java, R, and other languages, but Python offers the most extensive ecosystem.

2. How do I improve sentiment accuracy?

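Train a supervised model on labeled data instead of relying on a fixed lexicon. TF-IDF features with a linear classifier make a strong baseline; word embeddings or deep learning models like LSTM/BERT add context-aware classification.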

3. Can I process non-English languages?

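Yes. spaCy provides pretrained pipelines for many languages, so download the model for your language (for example, de_core_news_sm for German). TextBlob's built-in models target English; other languages need extension packages such as textblob-de or textblob-fr.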

4. How do I handle large datasets?

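Process text in batches instead of one document at a time (for example, spaCy's nlp.pipe or the chunksize option of pandas read_csv), prefer vectorized operations over Python loops, and use GPU-accelerated frameworks for deep learning models.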
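
The batching idea can be sketched with the standard library alone. The helper iter_batches is a hypothetical name, and the generator of fake reviews stands in for a large stream of texts read from disk.

```python
from itertools import islice

def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items without loading everything at once."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

reviews = (f"review {i}" for i in range(5))  # stand-in for a large stream of texts
for batch in iter_batches(reviews, 2):
    print(batch)
```

Each batch can then be handed to the analysis step, so memory use stays bounded by the batch size instead of the dataset size.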

5. Which NLP library is best for production?

spaCy is industrial-grade, fast, and widely used in production NLP pipelines.



🎓 What You’ve Learned

  • Understanding NLP and its applications
  • Text preprocessing: tokenization, lemmatization
  • Sentiment analysis using TextBlob and real datasets
  • Named Entity Recognition with spaCy
  • Building mini NLP projects with real-world datasets
  • Evaluating models with accuracy, confusion matrix, and classification report





🧭 What’s Next?

In Part 7, we’ll tackle Computer Vision using OpenCV and Deep Learning. You’ll learn how to analyze, process, and classify images using Convolutional Neural Networks (CNNs).