Introduction To Text Summarization Using NLP
Apart from research scientists, most people prefer a shortcut when reading, so summaries are needed for almost everything we read. For example, when a patient is admitted to hospital, the health insurance company wants a summary of the patient's case to process the claim, and preparing it by hand is time-consuming. Text summarization using NLP helps automate this.
With our busy schedules, we prefer to read the summary of an article before deciding whether to read the whole thing. Reading a summary helps us identify whether the topic interests us and gives brief context for the story.
Definition: Text summarization condenses a large body of text into a shorter version that keeps its most important information. For example, the abstract of a research paper summarizes the paper.
How Text Summarization Using NLP Works
Before going further, let's briefly discuss the types of summarization. They are broadly classified into two groups:
- Extractive Summarization
- Abstractive Summarization
Extractive Summarization: These methods rely on extracting several parts of a text, such as phrases and sentences, and stacking them together to create a summary. Identifying the right sentences is therefore of utmost importance in an extractive method. The method is based on the weights of the most important words in each sentence: various algorithms assign weights to the words and then rank the sentences by importance.
Abstractive Summarization: These methods use advanced NLP techniques to generate an entirely new summary. They work at the semantic level, choosing words by their meaning, so the summary may contain words that do not appear in the original document. The main aim is to convey the important material in new wording.
Here we use an extractive method: the code below scores each sentence and copies the highest-ranked ones into the summary. To compare sentences it uses cosine similarity, a measure of similarity between two non-zero vectors. Since we represent each sentence as a vector of word counts, we can use it to find the similarity between sentences. It measures the cosine of the angle between the vectors: the angle is 0 (cosine 1) when two sentences have the same word distribution, and 90 degrees (cosine 0) when they share no words.
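As a rough illustration of word weighting, the sketch below scores sentences by word frequency. It is a minimal sketch assuming frequency is the weight (a common baseline, not the ranking method used later in this article); the function name frequency_summary and the small stopword set are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "for", "it"}


def frequency_summary(text, top_n=1):
    """Score each sentence by the normalised frequency of its words
    and return the top_n sentences in their original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    if not freq:
        return []
    max_freq = max(freq.values())
    # Weight of each word = its count divided by the highest count
    weights = {w: c / max_freq for w, c in freq.items()}

    scores = []
    for idx, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Stopwords are not in weights, so they contribute 0
        score = sum(weights.get(t, 0.0) for t in tokens)
        scores.append((score, idx))

    # Pick the highest-scoring sentences, then restore document order
    best = sorted(sorted(scores, reverse=True)[:top_n], key=lambda p: p[1])
    return [sentences[idx] for _, idx in best]


text = "Cats chase mice. Dogs chase cats and cats run. Birds sing."
print(frequency_summary(text, top_n=1))
# ['Dogs chase cats and cats run']
```

Dividing by the highest count keeps every weight between 0 and 1. Longer sentences naturally collect higher totals under this scheme, which is one reason more refined weighting methods exist.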
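To see the measure in numbers, here is a minimal from-scratch sketch using only the standard library. It assumes sentences have already been turned into word-count vectors over a shared vocabulary; the three-word vocabulary in the example is made up for illustration. NLTK's cosine_distance, used in this article's code, returns 1 minus this value.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


# Word counts over a hypothetical vocabulary ["cat", "sat", "mat"]
print(cosine_similarity([1, 1, 0], [1, 1, 0]))  # same words: approximately 1.0
print(cosine_similarity([1, 1, 0], [1, 0, 1]))  # one shared word: approximately 0.5
print(cosine_similarity([1, 1, 0], [0, 0, 1]))  # no shared words: 0.0
```

For the middle pair, the dot product is 1 and each vector has length sqrt(2), so the similarity is 1 / 2 = 0.5, an angle of 60 degrees.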
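Import Required Library

```python
import re

import networkx as nx
import numpy as np
from nltk.corpus import stopwords  # requires nltk.download("stopwords") once
from nltk.cluster.util import cosine_distance
```

Clean Raw Text

```python
def read_article(file_name):
    # Read the whole file and split it into sentences on full stops
    with open(file_name, "r") as file:
        file_data = file.read()
    article = file_data.split(".")
    sentences = []
    for sentence in article:
        print(sentence)
        # Turn every non-letter character into a space (re.sub, because
        # str.replace treats its argument literally), then split into words
        words = re.sub("[^a-zA-Z]", " ", sentence).split()
        if words:  # skip empty fragments such as the one after the final full stop
            sentences.append(words)
    return sentences
```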
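Because read_article reads from a file, a quick way to check the cleaning step is to run the same logic on a string. The helper name split_into_sentences is hypothetical, and the sketch assumes sentences end with a full stop, as the article's code does.

```python
import re


def split_into_sentences(text):
    """Split text on full stops and turn each sentence into a list of words,
    replacing every non-letter character with a space first."""
    sentences = []
    for sentence in text.split("."):
        words = re.sub("[^a-zA-Z]", " ", sentence).split()
        if words:  # skip empty fragments, e.g. after the final full stop
            sentences.append(words)
    return sentences


print(split_into_sentences("AI-based tools help. Around 100 institutions joined."))
# [['AI', 'based', 'tools', 'help'], ['Around', 'institutions', 'joined']]
```

Note that digits are dropped along with punctuation, so "100" disappears and "AI-based" becomes two words.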
```python
def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []

    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]

    all_words = list(set(sent1 + sent2))

    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)

    # build the vector for the first sentence
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1

    # build the vector for the second sentence
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1

    # A sentence made only of stopwords gives a zero vector, where cosine is undefined
    if not any(vector1) or not any(vector2):
        return 0.0

    # cosine_distance returns 1 - cosine similarity, so convert it back
    return 1 - cosine_distance(vector1, vector2)
```
```python
def cosine_similarity_matrix(sentences, stop_words):
    # Create an empty similarity matrix
    similarity_matrix = np.zeros((len(sentences), len(sentences)))

    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1 == idx2:  # ignore a sentence's similarity with itself
                continue
            similarity_matrix[idx1][idx2] = sentence_similarity(
                sentences[idx1], sentences[idx2], stop_words
            )
    return similarity_matrix
```
Produce the Summary
Now, with the help of these functions, create a summarization pipeline.

```python
def generate_summary(file_name, top_n=5):
    stop_words = stopwords.words("english")
    summarize_text = []

    # Step 1 - Read the text and split it into tokenized sentences
    sentences = read_article(file_name)

    # Step 2 - Build the similarity matrix across sentences
    sentence_similarity_matrix = cosine_similarity_matrix(sentences, stop_words)

    # Step 3 - Rank sentences in the similarity matrix
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)

    # Step 4 - Sort by rank and pick the top sentences
    ranked_sentence = sorted(
        ((scores[i], s) for i, s in enumerate(sentences)), reverse=True
    )
    print("Top-ranked sentences (score, words): ", ranked_sentence)

    for i in range(min(top_n, len(ranked_sentence))):
        # Each entry is (score, list_of_words); join the words back into a sentence
        summarize_text.append(" ".join(ranked_sentence[i][1]))

    # Step 5 - Output the summarized text
    print("Summarize Text: \n", ". ".join(summarize_text))


# Let's begin
generate_summary("sample.txt", 2)
```
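The pipeline prints the selected sentences in rank order, so the summary may jump around the article. One common variant, shown here as a hypothetical helper pick_top_sentences that is not part of the original code, keeps the top-ranked sentences but restores their original order. It assumes scores is the dict returned by nx.pagerank, keyed by sentence index.

```python
def pick_top_sentences(scores, sentences, top_n):
    """Return the top_n highest-scoring sentences in original document order.

    scores: dict mapping sentence index -> score, as returned by nx.pagerank
    sentences: list of sentences, each given as a list of words
    """
    top_n = min(top_n, len(sentences))
    # Indices of the best-scoring sentences, highest score first
    best = sorted(scores, key=scores.get, reverse=True)[:top_n]
    # Re-sort the chosen indices by position so the summary follows the article
    return [" ".join(sentences[i]) for i in sorted(best)]


scores = {0: 0.3, 1: 0.2, 2: 0.5}
sentences = [["First", "point"], ["Key", "finding"], ["Supporting", "detail"]]
print(". ".join(pick_top_sentences(scores, sentences, 2)))
# First point. Supporting detail
```

In rank order the same two sentences would read "Supporting detail. First point"; keeping document order usually makes an extractive summary easier to follow.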
Comparing the sample text with the generated summary shows how the summary is built from the highest-ranked sentences. Check the output below.
This program also included developer-focused AI school that provided a bunch of assets to help build AI skills. Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services
It is important to understand that we have used TextRank to rank the sentences. TextRank does not rely on any previous training data and can work with any arbitrary piece of text. It is a general-purpose, graph-based ranking algorithm for NLP built on the same idea as PageRank, which is why the code calls nx.pagerank.
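TextRank treats each sentence as a node and each similarity score as an edge weight, then runs PageRank on that graph. The article's code delegates this step to nx.pagerank; the sketch below is a simplified standard-library re-implementation for illustration only, assuming the same defaults as networkx (damping factor 0.85, up to 100 iterations, uniform teleport and dangling-node distribution). Its scores should closely match, but it is not the library's code.

```python
def pagerank(weights, damping=0.85, max_iter=100, tol=1e-6):
    """Weighted PageRank by power iteration.

    weights[i][j] is the (symmetric) similarity between sentences i and j;
    the diagonal is ignored, as in cosine_similarity_matrix.
    Returns a list of scores that sums to 1.
    """
    n = len(weights)
    # Total edge weight leaving each node, used to split its score among neighbours
    out_sum = [sum(w for j, w in enumerate(row) if j != i) for i, row in enumerate(weights)]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by nodes with no edges is spread evenly over all nodes
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        new_scores = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n)
                if j != i and out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        # Stop once the total change between iterations is small enough
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < n * tol
        scores = new_scores
        if converged:
            break
    return scores


# Sentence 0 is similar to both others; sentences 1 and 2 share nothing
similarity = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
print([round(s, 3) for s in pagerank(similarity)])
# [0.486, 0.257, 0.257]
```

The sentence that is similar to many others collects the highest score, which is exactly the property TextRank uses to pick summary sentences.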
Many more advanced techniques are available for text summarization.
The ranking and cosine-similarity approach shown here is a good starting point for learning them.