Introduction: Grammar Check Using NLTK
In this tutorial, we are going to learn how to write Python code that checks the English grammar of a given sentence using NLTK. NLTK (the Natural Language Toolkit) is a Python library built for natural language processing.
Any sentence that belongs to a context-free language can be derived with a derivation tree, using either the bottom-up approach or the top-down approach. Now let’s see how we can develop the program described above.
First, we need to write a proper context-free grammar for English that doesn’t have any ambiguity.
Here is a context-free grammar created for English. Note that it doesn’t cover all the grammar rules of English, but it gives the basic idea of how to create such a context-free grammar for a natural language.
Tutorial: Let’s create the file in .cfg format and save it as english_grammer.cfg, the file name that the code in this tutorial loads. Each rule goes on its own line.
S -> NP_Sg VP_Sg | NP_Pl VP_Pl
NP -> NP_Pl | NP_Sg
NP_Sg -> N_Sg | Det_Sg N_Sg | Det_Both N_Sg | Adj N_Sg | Det_Sg Adj N_Sg | Det_Both Adj N_Sg | PropN_Sg
NP_Pl -> N_Pl | Det_Pl N_Pl | Det_Both N_Pl | Adj N_Pl | Det_Pl Adj N_Pl | Det_Both Adj N_Pl | PropN_Pl
VP_Sg -> IV_Pres_Sg | IV_Past | TV_Pres_Sg | TV_Past | TV_Pres_Sg NP | TV_Past NP | Adv IV_Pres_Sg | Adv IV_Past | Adv TV_Pres_Sg NP | Adv TV_Past NP
VP_Pl -> IV_Pres_Pl | IV_Past | TV_Pres_Pl | TV_Past | TV_Pres_Pl NP | TV_Past NP | Adv IV_Pres_Pl | Adv IV_Past | Adv TV_Pres_Pl NP | Adv TV_Past NP
N_Pl -> 'girls' | 'boys' | 'children' | 'cars' | 'apples' | 'dogs'
Adj -> 'good' | 'bad' | 'beautiful' | 'innocent'
Adv -> 'happily' | 'sadly' | 'nicely'
N_Sg -> 'dog' | 'girl' | 'car' | 'child' | 'apple' | 'elephant'
PropN_Sg -> 'rashmi' | 'piyumika'
PropN_Pl -> 'they' | 'i'
Det_Sg -> 'this' | 'every' | 'a' | 'an'
Det_Pl -> 'these' | 'all'
Det_Both -> 'some' | 'the' | 'several'
IV_Pres_Sg -> 'disappears' | 'walks'
TV_Pres_Sg -> 'sees' | 'likes' | 'eats'
IV_Pres_Pl -> 'disappear' | 'walk'
TV_Pres_Pl -> 'see' | 'like'
IV_Past -> 'disappeared' | 'walked'
TV_Past -> 'saw' | 'liked' | 'ate' | 'shot'
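Before parsing any sentences, it can help to confirm that a grammar loads cleanly. The following is a minimal sketch, assuming a reduced subset of the grammar above held in a string (the name GRAMMAR_SUBSET is made up for illustration). It uses nltk.CFG.fromstring, which builds the same kind of grammar object from text, and prints the start symbol and the individual productions.

```python
import nltk

# Hypothetical reduced subset of the tutorial grammar, for illustration only.
GRAMMAR_SUBSET = """
S -> NP_Sg VP_Sg | NP_Pl VP_Pl
NP_Sg -> Det_Sg N_Sg
NP_Pl -> Det_Pl N_Pl
VP_Sg -> IV_Pres_Sg | IV_Past
VP_Pl -> IV_Pres_Pl | IV_Past
Det_Sg -> 'this' | 'every'
Det_Pl -> 'these' | 'all'
N_Sg -> 'dog' | 'girl'
N_Pl -> 'dogs' | 'girls'
IV_Pres_Sg -> 'walks'
IV_Pres_Pl -> 'walk'
IV_Past -> 'walked'
"""

grammar = nltk.CFG.fromstring(GRAMMAR_SUBSET)

# The left-hand side of the first rule becomes the start symbol.
print(grammar.start())

# Each alternative separated by '|' is stored as its own production.
for production in grammar.productions():
    print(production)
```

Note how number agreement is encoded in the rule names: S only pairs NP_Sg with VP_Sg and NP_Pl with VP_Pl, so a singular subject can never combine with a plural verb form.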
Next, write the code that runs a sentence through our context-free grammar. If the sentence is grammatically correct, it should output ‘Correct Grammar’ together with the parse tree. If the sentence is grammatically incorrect, it should output ‘Wrong Grammar’.
Create a .py or .ipynb file.
# Import the required libraries
import nltk
import codecs

# Load the grammar file and create the parser once
load_grammar = nltk.data.load('file:english_grammer.cfg')
rd_parser = nltk.RecursiveDescentParser(load_grammar)

# Read the input file, one sentence per line
file_input = codecs.open('english_input.txt', 'r', encoding='utf-8')
for sent in file_input:
    sent_split = sent.split()
    if not sent_split:
        continue  # skip blank lines
    print("\n\n" + sent)
    try:
        trees = list(rd_parser.parse(sent_split))
    except ValueError:
        # parse() raises ValueError if a word is not covered by the grammar
        trees = []
    # Append the result for this sentence to the output file
    with open("demoEnglish.txt", "a") as f:
        if trees:
            print("Correct Grammar!!!")
            f.write("Correct Grammar!!!\n")
            for tree_struc in trees:
                print(str(tree_struc))
                f.write(str(tree_struc) + "\n")
        else:
            print("Wrong Grammar!!!")
            f.write("Wrong Grammar!!!\n")
file_input.close()
In the above code, I have imported the nltk library. We need nltk to load our grammar and check whether the input sentence can be derived from that grammar. codecs is a standard-library module that provides encoders and decoders for converting text between different representations. The RecursiveDescentParser used in the above code is a kind of top-down parser: it starts from the start symbol S and expands rules until they match the words of the sentence.
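To make the top-down idea concrete, here is a minimal pure-Python sketch of recursive descent with backtracking. It is not NLTK's implementation; the toy grammar and the names TOY_GRAMMAR, expand, and recognizes are assumptions made for illustration. Starting from S, it expands the leftmost symbol, tries each alternative in turn, and abandons a branch as soon as a word does not match.

```python
# Toy grammar: nonterminal -> list of alternatives (each a list of symbols).
# Any symbol that is not a key of the dict is treated as a terminal word.
TOY_GRAMMAR = {
    "S": [["NP_Sg", "VP_Sg"], ["NP_Pl", "VP_Pl"]],
    "NP_Sg": [["Det_Sg", "N_Sg"]],
    "NP_Pl": [["Det_Pl", "N_Pl"]],
    "VP_Sg": [["walks"], ["walked"]],
    "VP_Pl": [["walk"], ["walked"]],
    "Det_Sg": [["this"], ["a"]],
    "Det_Pl": [["these"], ["all"]],
    "N_Sg": [["dog"], ["girl"]],
    "N_Pl": [["dogs"], ["girls"]],
}


def expand(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in TOY_GRAMMAR:
        # Nonterminal: try each alternative in order (top-down expansion).
        for alternative in TOY_GRAMMAR[first]:
            yield from expand(alternative + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal: it must match the next input word.
        yield from expand(rest, tokens, pos + 1)


def recognizes(sentence):
    """Return True if the whole sentence can be derived from S."""
    tokens = sentence.split()
    return any(end == len(tokens) for end in expand(["S"], tokens, 0))


print(recognizes("this dog walks"))   # True
print(recognizes("these dogs walk"))  # True
print(recognizes("this dog walk"))    # False: singular subject, plural verb
```

This approach only terminates because the grammar has no left-recursive rules (a rule such as NP -> NP Adj would make the expansion loop forever), which is also a known limitation of recursive descent parsing in general.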
Next, we need to create an input file with English sentences (english_input.txt in the code above), one sentence per line. Include both correct and incorrect sentences so that you can see how the code behaves in both situations.
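As a starting point, the input file can be generated with a few lines of Python. This is only a sketch: the file name english_input.txt comes from the code above, while the sample sentences are suggestions built from the grammar's vocabulary. They are written in lowercase and without punctuation, because the script splits each line on whitespace and the grammar lists lowercase words only.

```python
from pathlib import Path

# Sample sentences, one per line. The first two follow the grammar;
# the last two break number agreement and should be rejected.
sentences = [
    "a good child happily walked",
    "these girls like the apples",
    "this dogs walks",
    "the girls likes apples",
]

input_path = Path("english_input.txt")
input_path.write_text("\n".join(sentences) + "\n", encoding="utf-8")

# Read the file back to confirm there is one sentence per line.
for line in input_path.read_text(encoding="utf-8").splitlines():
    print(line.split())
```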
Here are examples of the output. For a correct sentence, the parse tree is printed as well.
Sentence: a good child happily walked
Output: correct grammar!!!
Sentence: I walk
Output: wrong grammar.
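Both results can be reproduced with a small self-contained sketch. It assumes a reduced subset of the grammar, inlined with nltk.CFG.fromstring instead of loaded from the .cfg file, and a hypothetical helper named check. Note that NLTK raises ValueError when a sentence contains a word the grammar does not list. That is the case for the capitalized "I", since the grammar only lists lowercase 'i', so the sketch treats that error as wrong grammar.

```python
import nltk

# Reduced subset of the tutorial grammar, just enough for these examples.
grammar = nltk.CFG.fromstring("""
S -> NP_Sg VP_Sg | NP_Pl VP_Pl
NP_Sg -> Det_Sg Adj N_Sg
NP_Pl -> PropN_Pl
VP_Sg -> Adv IV_Past | IV_Pres_Sg
VP_Pl -> IV_Pres_Pl
Det_Sg -> 'a'
Adj -> 'good'
Adv -> 'happily'
N_Sg -> 'child'
PropN_Pl -> 'they' | 'i'
IV_Pres_Sg -> 'walks'
IV_Pres_Pl -> 'walk'
IV_Past -> 'walked'
""")
parser = nltk.RecursiveDescentParser(grammar)


def check(sentence):
    """Hypothetical helper: return the list of parse trees (empty if none)."""
    try:
        return list(parser.parse(sentence.split()))
    except ValueError:
        # Raised when a word is not covered by the grammar.
        return []


for sentence in ["a good child happily walked", "I walk", "a good child walk"]:
    trees = check(sentence)
    print(sentence, "->", "correct grammar" if trees else "wrong grammar")
    for tree in trees:
        print(tree)
```

The third sentence uses only known words but pairs a singular subject with a plural verb, so it is rejected by the grammar rules themselves rather than by the vocabulary check.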
In this article, you learned how to check whether the grammar of a sentence is right or wrong. We used the NLTK library to parse each sentence against a context-free grammar and report whether it is correct, using a top-down parsing method called the Recursive Descent Parser.