Introduction to Named Entity Recognition in NLP
The goal of Named Entity Recognition (NER) in NLP is to detect the nouns and noun phrases in a text and label them with the real-world concepts that they represent. But NER systems aren’t just doing a simple dictionary lookup. Instead, they use the context in which a word appears in the sentence, together with a statistical model, to guess which type of entity the word represents. Named entities are definite noun phrases that refer to specific types of individuals, such as organizations, persons, dates and times, product names, amounts of money, and so on.
Named Entity Recognition in NLP has tons of uses since it makes it so easy to grab structured data out of text. It’s one of the easiest ways to quickly get value out of an NLP pipeline.
|NE Type|Examples|
|---|---|
|ORGANIZATION|Georgia-Pacific Corp., WHO|
|TIME|5.00 AM IST|
|DATE|October 2, 2020|
|FACILITY|Washington Monument, Stonehenge|
|GPE|southeast Asia, Midlothian|
Named Entity Recognition
The goal of a named entity recognition (NER) system is to identify all textual mentions of named entities. This can be broken down into two sub-tasks: identifying the boundaries of the named entity, and identifying its type. While named entity recognition is frequently a prelude to identifying relations in Information Extraction, it can also contribute to other tasks. For example, in Question Answering (QA), we try to improve the precision of Information Retrieval by recovering not whole pages, but just those parts which contain an answer to the user’s question. Most QA systems take the documents returned by standard Information Retrieval, and then attempt to isolate the minimal text snippet in the document containing the answer. Now suppose the question was “Who was the first President of India?” and one of the retrieved documents contained the following passage:
The prime minister of India (IAST: Bharat Ke Pradhanamantri) is the leader of the executive of the Government of India. The prime minister is the chief adviser to the president of India and the head of the Union Council of Ministers. It was built in honor of India, who led the country to independence and then became its first President.
Analysis of the question leads us to expect that an answer should be of the form “X was the first President of India”, where X is not only a noun phrase but also refers to a named entity of type PERSON. This should allow us to ignore the first sentence in the passage. While it contains two occurrences of India, named entity recognition should tell us that neither of them has the correct type.
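The answer-type filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real QA system: the entity annotations are written by hand (in practice an NER model would produce them), the two sentences are chosen for the example, and the function name `sentences_with_type` is hypothetical.

```python
# Hand-annotated sentences: (text, [(entity_text, entity_type), ...]).
# Assumption: in a real pipeline these annotations would come from an NER model.
annotated_sentences = [
    (
        "The prime minister of India is the leader of the executive "
        "of the Government of India.",
        [("India", "GPE"), ("Government of India", "ORGANIZATION")],
    ),
    (
        "Rajendra Prasad became the first President of India in 1950.",
        [("Rajendra Prasad", "PERSON"), ("India", "GPE"), ("1950", "DATE")],
    ),
]


def sentences_with_type(sentences, wanted_type):
    """Keep only sentences that mention at least one entity of wanted_type."""
    return [
        text
        for text, entities in sentences
        if any(entity_type == wanted_type for _, entity_type in entities)
    ]


# A "Who ...?" question expects an answer of type PERSON,
# so sentences without a PERSON entity can be discarded.
candidates = sentences_with_type(annotated_sentences, "PERSON")
for text in candidates:
    print(text)
```

Only the second sentence survives the filter, because the first one mentions India as a GPE and an ORGANIZATION but contains no entity of type PERSON.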
A major source of difficulty is that many named entity terms are ambiguous. Thus May and North are likely to be parts of named entities for DATE and LOCATION, respectively, but could both be part of a PERSON; conversely, Christian Dior looks like a PERSON but is more likely to be of type ORGANIZATION. A term like Yankee will be an ordinary modifier in some contexts but will be marked as an entity of type ORGANIZATION in the phrase Yankee infielders.
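To see why ambiguity defeats a plain dictionary lookup, here is a minimal sketch. The gazetteer, the example sentences, and the function name `lookup_tag` are all made up for illustration; the point is only that a lookup without context assigns the same label to a word wherever it appears.

```python
# A tiny, made-up gazetteer mapping single words to entity types.
GAZETTEER = {
    "May": "DATE",
    "North": "LOCATION",
}


def lookup_tag(tokens):
    """Tag each token by dictionary lookup alone, ignoring context."""
    return [(token, GAZETTEER.get(token, "O")) for token in tokens]


# "May" is part of a date here, so the lookup happens to be right.
date_sentence = "The meeting is on May 5".split()
# Here "May" and "North" are surnames, but the lookup cannot tell.
person_sentence = "Alice May met Peter North".split()

print(lookup_tag(date_sentence))
print(lookup_tag(person_sentence))
```

In the second sentence the lookup still labels May as DATE and North as LOCATION, even though both are parts of person names. A context-aware statistical model is what lets an NER system avoid this kind of mistake.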
Further challenges are posed by multi-word names like Stanford University, and by names that contain other names such as Cecil H. Green Library and Escondido Village Conference Service Center. In named entity recognition, therefore, we need to be able to identify the beginning and end of multi-token sequences.
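One common way to mark the beginning and end of multi-token names is a BIO tagging scheme, where B- starts an entity, I- continues it, and O marks tokens outside any entity. The article does not name a particular scheme, so treat this as an assumption; the tags below are written by hand and the helper `bio_to_spans` is a hypothetical name used for this sketch.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity in progress first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continue the entity in progress.
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) ends any entity in progress;
            # the token itself is not added to any span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["She", "works", "at", "Stanford", "University", "in", "California"]
tags = ["O", "O", "O", "B-ORGANIZATION", "I-ORGANIZATION", "O", "B-GPE"]
print(bio_to_spans(tokens, tags))
```

Here Stanford and University are merged into a single ORGANIZATION span, which covers both sub-tasks at once: the B-/I- prefixes give the boundaries, and the suffix gives the type.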
Named entity recognition is well suited to the kind of classifier-based approach that is used for noun phrase chunking.
NLTK provides a classifier that has already been trained to recognize named entities, accessed with the function nltk.ne_chunk(). If we set the parameter binary=True, then named entities are just tagged as NE; otherwise, the classifier adds category labels such as PERSON, ORGANIZATION, and GPE.

    import nltk
    # ne_chunk() expects one POS-tagged sentence, so pick a single sentence from the corpus.
    sent = nltk.corpus.treebank.tagged_sents()[22]
    print(nltk.ne_chunk(sent, binary=True))

Output:

    (S The/DT (NE U.S./NNP) is/VBZ one/CD ... according/VBG to/TO (NE Brooke/NNP T./NNP Mossman/NNP) ...)
In this article, we looked at what named entities are and why recognizing them matters. The same word can belong to different entity types depending on where it appears, and named entity recognition helps resolve that ambiguity by using the context of the sentence.