Recognizing Parts of Speech in given Sentence using Apache OpenNLP

July 28, 2020


In this post, we will discuss recognizing the parts of speech in a given sentence using the Apache OpenNLP library. This post is part of a series on learning Natural Language Processing with Apache OpenNLP.

As in the earlier posts, where we detected sentences, found names, and tokenized a given string, we will again use one of the pre-trained models provided by the Apache OpenNLP library. Those topics are covered in the earlier posts of this series: Finding Sentences, Finding Names using NamedEntityRecognition and Tokenization of the Strings.

We will use the en-pos-maxent.bin model, which detects the part of speech of each token and tags it with a short name. The table below lists some of these short names and their meanings. Be sure to use this model; with a different model, the tokens may be tagged with a different set of keywords.

Tag (short name)	Meaning
NN	Noun, singular or mass
DT	Determiner
VB	Verb, base form
VBD	Verb, past tense
VBZ	Verb, third person singular present
IN	Preposition or subordinating conjunction
NNP	Proper noun, singular
TO	to
JJ	Adjective
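To make the table easier to use from code, here is a minimal sketch of a lookup from short name to meaning. The class name PosTagGlossary and the method describe are hypothetical helpers written for this post; they are not part of OpenNLP, and the map only contains the tags listed in the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of OpenNLP): looks up the tags from the table above. */
final class PosTagGlossary {
    private static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBZ", "Verb, third person singular present");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("TO", "to");
        MEANINGS.put("JJ", "Adjective");
    }

    /** Returns the meaning of a tag, or a fallback text for tags the table does not list. */
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "not listed in the table");
    }

    public static void main(String[] args) {
        System.out.println("VBD -> " + describe("VBD"));
        System.out.println("UH -> " + describe("UH"));
    }
}
```

The fallback matters because the tagger can emit tags beyond the nine shown in the table.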

As with the other models, we follow the same three steps:

Load the model.
Tokenize the given string.
Tag the tokens with the relevant part-of-speech short name.

Let's go through the code for each step:

Load the Model

We will use a FileInputStream to read the model file and create a POSModel object from it.

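import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;

public POSModel loadModel() {
    // Forward slashes work on Windows as well as on Linux and macOS.
    String path = "src/main/resources/models/en-pos-maxent.bin";
    // try-with-resources closes the stream even if loading fails.
    try (InputStream is = new FileInputStream(path)) {
        return new POSModel(is);
    } catch (IOException e) {
        // FileNotFoundException is a subclass of IOException, so one catch covers both.
        e.printStackTrace();
        return null;
    }
}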

Tokenize Given String

Using the POSModel object loaded above, we create a POSTaggerME object and then tokenize the given sentence with WhitespaceTokenizer.

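// Load the model and create the tagger from it.
POSModel model = loadModel();
POSTaggerME posObj = new POSTaggerME(model);
// Split the input sentence on whitespace; punctuation stays attached to the words.
WhitespaceTokenizer whitespaceTokenizer = WhitespaceTokenizer.INSTANCE;
String[] tokens = whitespaceTokenizer.tokenize(sentence);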
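To see what whitespace tokenization means for the input, here is a rough sketch using only the Java standard library. It is an approximation of the behaviour, not OpenNLP's actual implementation, and the class name WhitespaceSplitSketch is hypothetical. The point it illustrates is that punctuation is not separated from the neighbouring word.

```java
import java.util.Arrays;

/** Hypothetical stand-in for whitespace tokenization; not OpenNLP's implementation. */
final class WhitespaceSplitSketch {

    /** Splits on runs of whitespace only, so punctuation stays attached to words. */
    static String[] splitOnWhitespace(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "Welcome to DynamicallyBlunt, for our class of natural language processing.";
        // Prints the tokens as an array; note "DynamicallyBlunt," keeps its comma.
        System.out.println(Arrays.toString(splitOnWhitespace(input)));
    }
}
```

This is why a token such as "DynamicallyBlunt," reaches the tagger with its comma. If punctuation should be a separate token, a different tokenizer would be needed before tagging.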

Tagging Tokens

Next, the tokens are tagged and a POSSample object is created. The POSSample holds the tokens together with the part-of-speech short name assigned to each of them. For an input like this:

String input = "Welcome to DynamicallyBlunt, for our class of natural language processing.";

We get output like the following, where each token is tagged with a part-of-speech short name. Most of the tags appear in the table above; UH (interjection) and PRP$ (possessive pronoun) are not listed there.

Welcome_UH
to_TO
DynamicallyBlunt,_NNP
for_IN
our_PRP$
class_NN
of_IN
natural_JJ
language_NN
processing._NN
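The token_TAG format shown above is simply each token joined to its tag with an underscore. Here is a minimal sketch of that pairing using only the Java standard library. The class TaggedTokenFormatter and its pair method are hypothetical, and the tags are hard-coded from the output in this post. In the real program, the tags array would presumably come from calling posObj.tag(tokens) on the POSTaggerME object, and the pairing would be done by POSSample.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pairs tokens with tags in the token_TAG format. */
final class TaggedTokenFormatter {

    /** Joins the i-th token and the i-th tag with an underscore. */
    static List<String> pair(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> tagged = new ArrayList<>(tokens.length);
        for (int i = 0; i < tokens.length; i++) {
            tagged.add(tokens[i] + "_" + tags[i]);
        }
        return tagged;
    }

    public static void main(String[] args) {
        // Tokens and tags copied from the example in this post (not produced by a model here).
        String[] tokens = {"Welcome", "to", "DynamicallyBlunt,", "for", "our",
                           "class", "of", "natural", "language", "processing."};
        String[] tags = {"UH", "TO", "NNP", "IN", "PRP$", "NN", "IN", "JJ", "NN", "NN"};
        for (String taggedToken : pair(tokens, tags)) {
            System.out.println(taggedToken);
        }
    }
}
```

The length check reflects the fact that the tagger returns exactly one tag per token, so the two arrays always line up by index.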

We can also monitor performance using the PerformanceMonitor class.

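public void performanceMonitor(POSSample sample){
    // Report to System.err, counting in units labelled "uploaded".
    PerformanceMonitor monitor = new PerformanceMonitor(System.err, "uploaded");
    monitor.start();
    // Count one processed unit; call this once per item when measuring real work.
    monitor.incrementCounter();
    monitor.stopAndPrintFinalResult();
}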

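This prints the performance metrics below. Since the counter is incremented only once and no work runs between start and stop, the average and the runtime are reported as 0.0: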

Average: 0.0 uploaded/s
Total: 1 uploaded
Runtime: 0.0s
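To get more meaningful numbers, the counter has to be incremented around real work. The sketch below shows the general idea with the Java standard library only: count the processed units and divide by the elapsed time. It is not how PerformanceMonitor is implemented; the class ThroughputSketch and the method averagePerSecond are hypothetical, and splitting a sentence is just a stand-in for tagging.

```java
/** Hypothetical sketch of a throughput calculation; not OpenNLP's PerformanceMonitor. */
final class ThroughputSketch {

    /** Returns units per second, or 0.0 when no measurable time has elapsed. */
    static double averagePerSecond(long units, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return units / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        String[] sentences = {
            "Welcome to DynamicallyBlunt, for our class of natural language processing.",
            "This is a second sentence."
        };
        long processed = 0;
        int tokenCount = 0;
        long start = System.nanoTime();
        for (String sentence : sentences) {
            // Stand-in for the real work, for example tagging one sentence.
            tokenCount += sentence.trim().split("\\s+").length;
            processed++;
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("Tokens seen: " + tokenCount);
        System.out.println("Total: " + processed + " sentences");
        System.out.println("Average: " + averagePerSecond(processed, elapsed) + " sentences/s");
    }
}
```

With PerformanceMonitor itself, the equivalent would be to call incrementCounter() once per tagged sentence between start() and stopAndPrintFinalResult().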

You can find the entire code for this post, as well as for the others, in my GitHub repository.

This concludes the series on the Apache OpenNLP library. OpenNLP also provides a command line interface that can be launched directly from the command prompt/terminal to run specific commands, but we will conclude our discussion of the library here. Let me know your suggestions/feedback in the comments below.
