Find Names, Locations or Times in a Given String - Named Entity Recognition - Apache OpenNLP

In this series on learning Natural Language Processing with the Apache OpenNLP library in Java, we will discuss Named Entity Recognition, which identifies names, places, locations, etc. in a given string.

In Apache OpenNLP, Named Entity Recognition uses pre-trained models to recognize names, dates, places, locations, etc. in a given string. These models are:
  • en-ner-date.bin
  • en-ner-location.bin
  • en-ner-organization.bin
  • en-ner-person.bin
  • en-ner-time.bin
As in the previous posts on SentenceDetector and Tokenization, we first choose the model we want to use, then load it, and finally call the find method of the NameFinderME class to find the names, locations or times that the chosen model recognizes.
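Since every model in the list above follows the same en-ner-<type>.bin naming pattern, choosing a model can be reduced to choosing an entity type. The sketch below is a hypothetical helper (NerModels and modelPath are illustrative names, not part of OpenNLP), and it assumes the model files live in src/main/resources/models, the directory used for the person model in this post.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: maps an entity type to the
// file of the pre-trained model listed in this post (en-ner-<type>.bin).
class NerModels {
    // Entity types of the models listed in this post
    static final Set<String> TYPES = Set.of("date", "location", "organization", "person", "time");

    // Assumed model directory, relative to the project root
    static final Path MODEL_DIR = Paths.get("src", "main", "resources", "models");

    static Path modelPath(String type) {
        if (!TYPES.contains(type)) {
            throw new IllegalArgumentException("No pre-trained model listed for type: " + type);
        }
        return MODEL_DIR.resolve("en-ner-" + type + ".bin");
    }

    public static void main(String[] args) {
        // src/main/resources/models/en-ner-person.bin (with \ separators on Windows)
        System.out.println(modelPath("person"));
    }
}
```

Building the path with Paths.get keeps the code portable, so the same helper works on Windows and Unix-like systems without hard-coded separators.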

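Let's start by loading the person model:
// Load the pre-trained person model; forward slashes also work on Windows
InputStream is = new FileInputStream("src/main/resources/models/en-ner-person.bin");
TokenNameFinderModel model = new TokenNameFinderModel(is);
is.close(); // the constructor has already read the model, so the stream can be closed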

The model is loaded into a TokenNameFinderModel object (from the opennlp.tools.namefind package).
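The model constructor only needs the input stream while it reads the file, so the stream can be managed with try-with-resources, which closes it even when loading fails. The sketch below is a small generic helper; ModelLoader, load and IOFunction are hypothetical names, not OpenNLP API, and the OpenNLP usage is shown only as a comment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of OpenNLP.
class ModelLoader {
    // Like java.util.function.Function, but apply() may throw IOException,
    // as constructors that read a model from an InputStream do.
    @FunctionalInterface
    interface IOFunction<T, R> {
        R apply(T input) throws IOException;
    }

    // Opens the file, passes the stream to the constructor, and closes the
    // stream afterwards, even if the constructor throws.
    static <M> M load(Path path, IOFunction<InputStream, M> constructor) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return constructor.apply(in);
        }
    }

    // Usage with OpenNLP on the classpath:
    // TokenNameFinderModel model = ModelLoader.load(
    //     Paths.get("src/main/resources/models/en-ner-person.bin"),
    //     in -> new TokenNameFinderModel(in));
}
```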

Use this model to create a NameFinderME object. Its find method takes an array of tokens (String[]) as input and returns an array of Span objects, one for each name, place, location, etc. that the model recognizes.
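NameFinderME finder = new NameFinderME(model);
// Naive tokenization: split on single spaces (punctuation stays attached to words)
String[] words = text.split(" ");

// One Span per recognized entity, given as token indexes into words
Span[] spans = finder.find(words);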
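Splitting on " " keeps punctuation attached to words and produces empty tokens for repeated spaces, so a name followed by a period, such as "Michael.", may not look like the tokens the model expects. OpenNLP's own tokenizers, such as SimpleTokenizer.INSTANCE or a TokenizerME loaded from a tokenizer model, are the better choice. The stdlib-only sketch below only illustrates the difference; SimpleTokens is a hypothetical name and its regular expression is an assumption, not OpenNLP's tokenization rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a real tokenizer: emits runs of letters/digits
// and single punctuation characters, and drops whitespace.
class SimpleTokens {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "Options are Charles and  Michael.";
        // Options|are|Charles|and||Michael.
        System.out.println(String.join("|", text.split(" ")));
        // Options|are|Charles|and|Michael|.
        System.out.println(String.join("|", tokenize(text)));
    }
}
```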

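Each Span identifies a range of tokens: getStart() returns the index of the first matching token, and getEnd() returns the index one past the last matching token (the end is exclusive). getType() returns the entity type, such as person.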

For example, with this input:
NamedEntityRecognition ner = new NamedEntityRecognition();
String sentence = "Indian Names could either Abhay or Ankit and other options are Charles and Michael";
ner.nameFinder(sentence);

This will be the output:
[11..12) person---Charles
[13..14) person---Michael
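The indexes in this output are half-open token ranges counted from 0: [11..12) covers only token 11, "Charles". The stdlib-only sketch below reproduces both output lines from the example sentence; EntitySpan is a hypothetical stand-in for opennlp.tools.util.Span, and the span values are copied from the output above rather than computed by a model. With the real Span class, getStart(), getEnd() and getType() supply the same values, and the static Span.spansToStrings(spans, words) returns the covered text of each span.

```java
import java.util.Arrays;

class SpanPrinter {
    // Hypothetical stand-in for opennlp.tools.util.Span: a half-open token
    // range [start, end) plus an entity type.
    static final class EntitySpan {
        final int start;
        final int end; // exclusive, like Span.getEnd()
        final String type;

        EntitySpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by the span; end is exclusive.
    static String coveredText(EntitySpan span, String[] tokens) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
    }

    // Formats a span like the output in this post: "[start..end) type---text"
    static String format(EntitySpan span, String[] tokens) {
        return "[" + span.start + ".." + span.end + ") " + span.type + "---" + coveredText(span, tokens);
    }

    public static void main(String[] args) {
        String sentence = "Indian Names could either Abhay or Ankit and other options are Charles and Michael";
        String[] tokens = sentence.split(" ");
        // Span values copied from the reported output, not produced by a model here
        EntitySpan[] spans = { new EntitySpan(11, 12, "person"), new EntitySpan(13, 14, "person") };
        for (EntitySpan span : spans) {
            System.out.println(format(span, tokens));
        }
        // Prints:
        // [11..12) person---Charles
        // [13..14) person---Michael
    }
}
```

A multi-token name such as a first and last name would simply have end - start greater than 1, and coveredText joins those tokens back together.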

To find places, locations, dates or times instead, load the corresponding model (for example, en-ner-location.bin) in the same way and pass it to a new NameFinderME.
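To find several entity types at once, each model gets its own NameFinderME, and every finder's find method can be run on the same token array; calling clearAdaptiveData() on a finder after each document resets the information it has collected from earlier calls. The stdlib-only sketch below shows only the merging step. The word-list finders are hypothetical placeholders for NameFinderME instances (a plain dictionary lookup, not the statistical models), and Found is a stand-in for Span.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class MultiModelDemo {
    // Hypothetical stand-in for opennlp.tools.util.Span
    static final class Found {
        final int start;
        final int end; // exclusive
        final String type;

        Found(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // Toy placeholder for a NameFinderME: tags single tokens found in a fixed word list.
    static Function<String[], List<Found>> wordListFinder(String type, Set<String> words) {
        return tokens -> {
            List<Found> found = new ArrayList<>();
            for (int i = 0; i < tokens.length; i++) {
                if (words.contains(tokens[i])) {
                    found.add(new Found(i, i + 1, type));
                }
            }
            return found;
        };
    }

    // Runs every finder over the same tokens and merges the results in token order.
    static List<Found> findAll(String[] tokens, List<Function<String[], List<Found>>> finders) {
        List<Found> all = new ArrayList<>();
        for (Function<String[], List<Found>> finder : finders) {
            all.addAll(finder.apply(tokens));
        }
        all.sort(Comparator.comparingInt(f -> f.start));
        return all;
    }

    public static void main(String[] args) {
        String[] tokens = "Michael moved to London".split(" ");
        List<Function<String[], List<Found>>> finders = List.of(
                wordListFinder("location", Set.of("London")),
                wordListFinder("person", Set.of("Michael")));
        System.out.println(findAll(tokens, finders)); // [[0..1) person, [3..4) location]
    }
}
```

Sorting by start index makes the merged result read in sentence order regardless of which model found each entity.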

Limitations of These Models:

The person model (en-ner-person.bin) recognizes only a small subset of names. When I tried it with the Indian names in the input above (Abhay and Ankit), it recognized only the Western names, Charles and Michael, and even its coverage of Western names is limited.

For the entire codebase, you can visit my GitHub repository.
Also, let me know your feedback and suggestions in the comment section below.
