Text Summarizer in Python.
About the project: This is a Text Summarizer project in Python.
It is a standalone script that summarizes a block of text using a simple, frequency-based algorithm.
The program takes a text input, counts how often each word occurs (ignoring common stop words), scores each sentence by the frequencies of the words it contains, and selects the highest-scoring sentences to create a summary.
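To make the scoring idea concrete, here is a minimal sketch on a toy input. The sample text and the three-word stop list are assumptions chosen for illustration, and the sentence splitter is simplified compared with the one in the full script.

```python
import re
from collections import Counter

# Hypothetical toy input; the stop-word set is deliberately tiny.
text = "Cats sleep a lot. Cats chase mice and cats climb. Dogs bark."
stop_words = {"a", "the", "and"}

# Count every non-stop word in the lowercased text.
words = re.findall(r"[a-z]+", text.lower())
word_freq = Counter(w for w in words if w not in stop_words)

# Split after sentence-ending punctuation (simplified splitter).
sentences = re.split(r"(?<=[.?!])\s+", text)

# Score each sentence by summing the frequencies of its words.
# Counter returns 0 for stop words, which were never counted.
scores = {
    s: sum(word_freq[w] for w in re.findall(r"[a-z]+", s.lower()))
    for s in sentences
}

for sentence, score in scores.items():
    print(score, sentence)
# prints:
# 5 Cats sleep a lot.
# 9 Cats chase mice and cats climb.
# 2 Dogs bark.
```

Because a score is a plain sum, longer sentences that repeat frequent words tend to rank higher, which is worth keeping in mind when reading the summaries.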
How to use this program:
To use this Text Summarizer, save the code as a Python file and run it from your terminal:
- Save the code: Save the code as a Python file (e.g., text_summarizer.py).
- Run the script: Open your terminal, navigate to the directory where you saved the file, and run python text_summarizer.py.
- The program will prompt you to enter your text. When you are done, type q or quit on a new line and press Enter.
- You will then be asked for the desired length of the summary in sentences.
- The program will then display the summarized text.
- This summarizer is a basic implementation and works best on well-structured text.
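The "type q or quit to finish" step is a sentinel-terminated input loop. The sketch below shows the idea in isolation; `read_until_sentinel` is a hypothetical helper, not a function from the script, and it reads from a list so it can be run without typing anything.

```python
def read_until_sentinel(lines, sentinels=("q", "quit")):
    """Collect lines until one equals a sentinel (case-insensitive)."""
    collected = []
    for line in lines:
        # Surrounding whitespace is ignored when checking for the sentinel.
        if line.strip().lower() in sentinels:
            break
        collected.append(line)
    # Join the collected lines with spaces into one block of text.
    return " ".join(collected)


# Simulated terminal input; in the script each line comes from input().
typed = ["First sentence here.", "Second sentence here.", "quit", "ignored"]
print(read_until_sentinel(typed))
# prints: First sentence here. Second sentence here.
```

One consequence of this design is that a line consisting only of q or quit can never be part of the text to summarize.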
Project Level: Intermediate
You can copy the code below, paste it into any Python editor, and run it.
Steps: Follow these steps.
Step 1: Copy the code below.
Step 2: Paste the code into your chosen editor.
Step 3: Save the file with a name of your choice and the .py extension.
Step 4: Run it (press F5 if using Python IDLE).
# text_summarizer.py
import re
from collections import defaultdict
from heapq import nlargest


def summarize_text(text, num_sentences=3):
    """
    Summarizes a block of text using a simple frequency-based method.

    Args:
        text (str): The full text to be summarized.
        num_sentences (int): The number of sentences to include in the summary.
            Defaults to 3.

    Returns:
        str: The summarized text.
    """
    if not text:
        return "Text to summarize cannot be empty."

    # Pre-process the text: replace every non-letter character with a space
    # and convert to lowercase
    formatted_text = re.sub('[^a-zA-Z]', ' ', text).lower()

    # Get a set of "stop words" (common words to ignore)
    stop_words = {
        "i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your",
        "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she",
        "her", "hers", "herself", "it", "its", "itself", "they", "them", "their",
        "theirs", "themselves", "what", "which", "who", "whom", "this", "that",
        "these", "those", "am", "is", "are", "was", "were", "be", "been", "being",
        "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an",
        "the", "and", "but", "if", "or", "because", "as", "until", "while", "of",
        "at", "by", "for", "with", "about", "against", "between", "into", "through",
        "during", "before", "after", "above", "below", "to", "from", "up", "down",
        "in", "out", "on", "off", "over", "under", "again", "further", "then",
        "once", "here", "there", "when", "where", "why", "how", "all", "any",
        "both", "each", "few", "more", "most", "other", "some", "such", "no",
        "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s",
        "t", "can", "will", "just", "don", "should", "now"
    }

    # Tokenize the formatted text into words
    words = formatted_text.split()

    # Calculate word frequency, ignoring stop words
    word_freq = defaultdict(int)
    for word in words:
        if word not in stop_words:
            word_freq[word] += 1
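
    # Split the original text into sentences (after '.', '?', or '!'),
    # avoiding splits inside abbreviations such as "e.g." or "Mr."
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=[.?!])\s', text)

    # Calculate sentence scores based on word frequency.
    # Words are extracted with a regex so that attached punctuation
    # (as in "text," or "summary.") does not prevent a match in word_freq.
    sentence_scores = defaultdict(int)
    for sentence in sentences:
        for word in re.findall(r'[a-zA-Z]+', sentence.lower()):
            if word in word_freq:
                sentence_scores[sentence] += word_freq[word]

    # Get the top N sentences with the highest scores
    summary_sentences = nlargest(num_sentences, sentence_scores, key=sentence_scores.get)
    return " ".join(summary_sentences)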


def main():
    """
    Main function to run the Text Summarizer app.
    """
    print("--- Python Text Summarizer ---")
    print("Enter a block of text to summarize.")
    print("Type 'q' or 'quit' on a new line to finish your input.")

    # Read lines until the user types the sentinel (ignoring stray whitespace)
    text_input_lines = []
    while True:
        line = input()
        if line.strip().lower() in ['q', 'quit']:
            break
        text_input_lines.append(line)

    full_text = " ".join(text_input_lines).strip()
    if not full_text:
        print("No text was entered.")
        return

    # Ask for the summary length, falling back to 3 on bad input
    try:
        num_sentences = int(input("How many sentences should the summary be? (e.g., 3): ").strip())
        if num_sentences <= 0:
            print("Number of sentences must be greater than zero. Using default of 3.")
            num_sentences = 3
    except ValueError:
        print("Invalid input. Using default of 3 sentences for the summary.")
        num_sentences = 3

    summary = summarize_text(full_text, num_sentences)
    print("\n--- Summary ---")
    print(summary)
    print("---------------")


# This ensures that main() is called only when the script is executed directly.
if __name__ == "__main__":
    main()
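
The script above joins the selected sentences in descending score order, so a summary can read out of sequence compared with the source text. One possible refinement, sketched here as an assumption rather than as part of the original program, is to pick the top-scoring sentences and then restore their original order. The helper name `pick_in_original_order` and the sample scores are hypothetical.

```python
from heapq import nlargest


def pick_in_original_order(sentences, scores, num_sentences=3):
    """Return the top-scoring sentences, arranged as they appeared in the text.

    `sentences` is the list of sentences in document order and `scores`
    holds one numeric score per sentence (same length, same order).
    """
    # Rank sentence positions by score, then re-sort the winners by position.
    top_positions = nlargest(
        num_sentences, range(len(sentences)), key=lambda i: scores[i]
    )
    return " ".join(sentences[i] for i in sorted(top_positions))


# Hypothetical sentences and scores for illustration.
sentences = ["Intro.", "Key point one.", "Aside.", "Key point two."]
scores = [1, 5, 2, 9]
print(pick_in_original_order(sentences, scores, num_sentences=2))
# prints: Key point one. Key point two.
```

Ranking positions instead of sentence strings also keeps repeated sentences distinct, whereas a dictionary keyed by sentence text merges them into one entry.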
For more complex summarization tasks, you would typically use machine learning models and natural language processing (NLP) libraries.