Text Summarization Using NLP


Introduction To Text Summarization Using NLP

Apart from research scientists, most people look for a shortcut to the information they need, so almost every document benefits from a summary. For example, when a patient is admitted to a hospital, the health insurance company needs a summary of the patient's case to process the claim. Writing that summary by hand is time-consuming, so we will use text summarization with NLP to produce it automatically.

With our busy schedules, we prefer to read a summary of an article before deciding whether to read the entire thing. A summary helps us identify whether the topic interests us and gives brief context for the story.


Definition: Text summarization condenses a large body of text or data into a shorter version that keeps the most important information. For example, the abstract of a research paper summarizes the full paper.

How Text Summarization Using NLP Works

Before we start, let's briefly discuss the types of summarization. Summarization methods are broadly classified into two groups:

  1. Extractive Summarization
  2. Abstractive Summarization

Extractive Summarization

These methods work by extracting parts of a text, such as phrases and sentences, and stacking them together to create a summary. Therefore, identifying the right sentences for the summary is of utmost importance in an extractive method. The approach is based on the weights of the most important words in each sentence. Various algorithms assign a specific weight to each word and then rank the sentences by importance.
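To make the word-weighting idea concrete, here is a minimal sketch of one simple weighting scheme. It is not the method used later in this article, which ranks sentences with TextRank. In this scheme, each word's weight is its frequency divided by the highest word frequency in the text, and each sentence's score is the sum of its word weights. The function name `frequency_summary` and the tiny hard-coded stopword set are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption; a real system would use a full list such as NLTK's)
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "it", "this"}


def frequency_summary(text, top_n=1):
    """Hypothetical extractive summarizer that scores sentences by normalized word frequency."""
    # Split into sentences on full stops and drop empty fragments
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Count lower-cased alphabetic words, excluding stopwords
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    if not freq:
        return ""
    # Weight of a word = its frequency divided by the highest frequency, so weights lie in (0, 1]
    max_freq = max(freq.values())
    weights = {w: c / max_freq for w, c in freq.items()}
    # Score of a sentence = sum of the weights of its words (stopwords weigh 0)
    scores = [
        sum(weights.get(w, 0.0) for w in re.findall(r"[a-z]+", s.lower()))
        for s in sentences
    ]
    # Take the top_n highest-scoring sentences, then put them back in document order
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return ". ".join(sentences[i] for i in sorted(best)) + "."


print(frequency_summary("Cats are great pets. Cats sleep. Dogs bark.", top_n=2))
# Cats are great pets. Cats sleep.
```

Here "cats" appears twice, so it gets weight 1.0 and every other content word gets 0.5. This lifts the two sentences that mention cats above "Dogs bark". Restoring document order at the end is a design choice that keeps the summary readable.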

Abstractive Summarization

These methods use advanced NLP techniques to generate an entirely new summary. They work at the level of meaning (semantics) and choose words that express the content rather than copying sentences. As a result, some words in the summary may not appear in the original document at all. The main aim is to convey the important material in new wording.

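In this article we use an extractive method: it scores the existing sentences and returns the highest-ranked ones. To rank sentences, we need a measure of how similar two sentences are, and for that we use cosine similarity. Cosine similarity is a measure of similarity between two non-zero vectors: it is the cosine of the angle between them. Since we represent each sentence as a vector of word counts, we can use it to find the similarity among sentences. The angle is 0, and the cosine is 1, when two sentence vectors point in the same direction (the sentences use the same words in the same proportions). The cosine falls to 0 when the sentences share no words.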
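In formula form, for two count vectors A and B, cos θ = (A · B) / (‖A‖ ‖B‖). The short NumPy sketch below computes this directly. The helper name `cosine_similarity` is our own and is not a library function. NLTK's `cosine_distance`, used later, returns 1 minus this value.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Word counts over the vocabulary ["cats", "sleep", "eat"]
s1 = [1, 1, 0]  # "cats sleep"
s2 = [1, 0, 1]  # "cats eat"
s3 = [2, 2, 0]  # "cats sleep cats sleep": same direction as s1

print(round(cosine_similarity(s1, s2), 3))  # 0.5
print(round(cosine_similarity(s1, s3), 3))  # 1.0
```

Note that s3 repeats every word of s1 twice yet still scores 1.0. Cosine similarity compares the direction of the vectors, not their length, so sentence length alone does not inflate the score.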

Import the required libraries
import networkx as nx
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance

# download NLTK's stopword list (only needed the first time)
nltk.download('stopwords')
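Clean the raw text
import re

def read_article(file_name):
    # read the whole file and join its lines into a single string
    with open(file_name, 'r', encoding='utf-8') as file:
        text = " ".join(file.read().splitlines())
    # split the text into sentences on full stops
    article = text.split('.')
    sentences = []
    for sentence in article:
        # replace every non-letter character with a space, then split into words
        words = re.sub(r"[^a-zA-Z]", " ", sentence).split()
        # skip empty fragments, such as the one after the final full stop
        if words:
            sentences.append(words)
    return sentences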

Sentence Similarity

def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence, ignoring stopwords
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence, ignoring stopwords
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1
    # a sentence made only of stopwords has an all-zero vector and an undefined cosine; treat it as dissimilar
    if not any(vector1) or not any(vector2):
        return 0.0
    # cosine similarity = 1 - cosine distance
    return 1 - cosine_distance(vector1, vector2)
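To see what `sentence_similarity` is meant to compare, the sketch below builds the same kind of count vectors of non-stopword words for two short sentences. `bag_of_words_vectors` is a hypothetical helper written for illustration. It drops stopwords from the vocabulary instead of keeping them as zero entries, which does not change the cosine similarity. It also sorts the vocabulary so the output is deterministic.

```python
def bag_of_words_vectors(sent1, sent2, stopwords=()):
    """Build aligned word-count vectors for two tokenized sentences, skipping stopwords."""
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    # A sorted vocabulary makes the output deterministic; cosine similarity
    # does not depend on the order of the vocabulary
    vocab = sorted(set(sent1 + sent2) - set(stopwords))
    vector1 = [sent1.count(w) for w in vocab]
    vector2 = [sent2.count(w) for w in vocab]
    return vocab, vector1, vector2


vocab, v1, v2 = bag_of_words_vectors(
    ["The", "cat", "sat"], ["The", "cat", "ran"], stopwords={"the"}
)
print(vocab)  # ['cat', 'ran', 'sat']
print(v1)     # [1, 0, 1]
print(v2)     # [1, 1, 0]
```

The two vectors share one of their two words, so their cosine similarity is 1 / (√2 · √2) = 0.5.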

Cosine similarity matrix

def cosine_similarity_matrix(sentences, stop_words):
    # create an empty similarity matrix
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            # skip the diagonal: a sentence is not compared with itself
            if idx1 == idx2:
                continue
            similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2], stop_words)
    return similarity_matrix
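The double loop calls `sentence_similarity` once for every pair of sentences, so it runs a Python-level comparison n × n times. As a design alternative, the sketch below builds one count matrix, scales each row to unit length and gets every cosine similarity from a single matrix product. `similarity_matrix_vectorized` is a hypothetical name and is not part of NLTK or NetworkX. The sketch assumes the same tokenized-sentence input as `cosine_similarity_matrix`.

```python
import numpy as np


def similarity_matrix_vectorized(sentences, stop_words=()):
    """Hypothetical vectorized alternative: all pairwise cosine similarities at once."""
    stop = set(stop_words)
    tokenized = [[w.lower() for w in s if w.lower() not in stop] for s in sentences]
    vocab = sorted({w for s in tokenized for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # One row of word counts per sentence
    counts = np.zeros((len(tokenized), len(vocab)))
    for row, words in enumerate(tokenized):
        for w in words:
            counts[row, index[w]] += 1
    # Scale each non-empty row to unit length; rows of only stopwords stay all-zero
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    unit = np.divide(counts, norms, out=np.zeros_like(counts), where=norms > 0)
    # Dot products of unit vectors are cosine similarities
    sim = unit @ unit.T
    # Match cosine_similarity_matrix: no self-similarity on the diagonal
    np.fill_diagonal(sim, 0.0)
    return sim


m = similarity_matrix_vectorized(
    [["The", "cat", "sat"], ["The", "cat", "ran"], ["Dogs", "bark"]], stop_words={"the"}
)
print(np.round(m, 2))
```

The result is symmetric with a zero diagonal. The first two sentences score 0.5 against each other and 0 against "Dogs bark".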

Produce the summary

Now combine these functions into a summarization pipeline.

def generate_summary(file_name, top_n=5):
    stop_words = stopwords.words('english')
    summarize_text = []
    # Step 1 - Read the text and split it into tokenized sentences
    sentences = read_article(file_name)
    # Step 2 - Build the sentence similarity matrix
    sentence_similarity_matrix = cosine_similarity_matrix(sentences, stop_words)

    # Step 3 - Rank the sentences in the similarity matrix with PageRank
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)

    # Step 4 - Sort the sentences by score and pick the top ones
    ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
    print("Top-ranked sentences (score, words): ", ranked_sentence)

    # never ask for more sentences than the text contains
    for i in range(min(top_n, len(ranked_sentence))):
        summarize_text.append(" ".join(ranked_sentence[i][1]))

    # Step 5 - Output the summarized text
    print("Summarize Text: \n", ". ".join(summarize_text))

# let's begin
generate_summary("sample.txt", 2)


Compare the sample text with the summarized text to see how the summary is built. The output is shown below.

Summarize Text: 

  This program also included developer-focused AI school that provided a bunch of assets to help build AI skills.  Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services

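It is important to understand that we have used TextRank to rank the sentences. TextRank is a general-purpose, graph-based ranking algorithm for NLP. Each sentence becomes a node, and each edge is weighted by the similarity of the two sentences it connects. PageRank then scores the nodes, so sentences that are similar to many other sentences rank highest. TextRank does not rely on any previous training data and can work with any arbitrary piece of text.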
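To show what `nx.pagerank` does with the similarity graph, here is a minimal power-iteration sketch of weighted PageRank. It illustrates the idea and is not NetworkX's implementation. The function name `textrank_scores` and the stopping tolerance are our assumptions. The damping factor 0.85 matches NetworkX's default `alpha`.

```python
import numpy as np


def textrank_scores(similarity, damping=0.85, tol=1e-10, max_iter=500):
    """Minimal weighted PageRank by power iteration (illustrative, not NetworkX's code)."""
    sim = np.asarray(similarity, dtype=float)
    n = sim.shape[0]
    out_weight = sim.sum(axis=1, keepdims=True)
    # Each sentence splits its vote among its neighbours in proportion to edge weight;
    # a sentence with no similar neighbours spreads its vote evenly over all sentences
    safe_weight = np.where(out_weight > 0, out_weight, 1.0)
    transition = np.where(out_weight > 0, sim / safe_weight, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Random jump with probability (1 - damping), otherwise follow a weighted edge
        new_scores = (1 - damping) / n + damping * (transition.T @ scores)
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores


# Sentence 0 is similar to sentences 1 and 2, which are not similar to each other
scores = textrank_scores([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(np.round(scores, 3))  # [0.486 0.257 0.257]
```

The scores sum to 1. The "hub" sentence that resembles both others gets the highest score, which is exactly the property the summarizer relies on when it keeps the top-ranked sentences.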

Much more advanced techniques are available for text summarization.


In this article, we learned a basic technique for text summarization that ranks sentences with TextRank and compares them with cosine similarity.

