I have a string representing a sentence, like the one below, which I then tag using OpenNLP.
String sentence = "His plays remain highly popular, and are constantly studied.";
After tagging, I get the output below. My question is: how do I now apply a regular expression to it to filter out tags? What is throwing me off is the word prepended to each underscore. If it were just tags, I could do something like (VBP|VBN)+, for example, but the words in front vary.
His_PRP$ plays_NNS remain_VBP highly_RB popular,_JJ and_CC are_VBP constantly_RB studied._VBN
For example, how would I write a regular expression to keep all NN and CC tags? That is, given the tagged string shown above, how do I get plays_NNS and_CC?
You can use a regular expression to extract the substrings that match your pattern, then concatenate them to build the resulting string.
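import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "His_PRP$ plays_NNS remain_VBP highly_RB popular,_JJ and_CC are_VBP constantly_RB studied._VBN";
// One or more non-whitespace characters (the word), an underscore, then the tag NNS or CC
String pattern = "([^\\s]+_(NNS|CC))";
StringBuilder resultText = new StringBuilder();
// Compile the pattern once
Pattern r = Pattern.compile(pattern);
// Create a matcher over the tagged text
Matcher m = r.matcher(text);
// Append every word_TAG match, separated by spaces
while (m.find()) {
    resultText.append(m.group(0)).append(" ");
}
System.out.println("RESULT: " + resultText);
/*
#### OUTPUT #####
RESULT: plays_NNS and_CC
*/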
[^\s]+_(NNS|CC)
This regular expression extracts only the words tagged NNS or CC: [^\s]+ matches the word in front of the underscore, and _(NNS|CC) matches the tag. You can experiment with the regex here: https://regex101.com/r/x1VxL0/1
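The pattern [^\s]+_(NNS|CC) matches only the exact tags NNS and CC. If the goal is to keep every noun tag (NN, NNS, NNP, NNPS) as well as CC, one option is to match the NN prefix and require that the tag end at whitespace or at the end of the input. The following is a minimal sketch under the assumption that the input uses the word_TAG format from the question with Penn Treebank-style tags; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRegexFilter {
    // \S+              the word (any run of non-whitespace characters)
    // _                the separator between word and tag
    // (?:NN[A-Z]*|CC)  NN followed by optional uppercase letters (NN, NNS, NNP, NNPS), or CC
    // (?!\S)           the tag must be followed by whitespace or the end of the input
    private static final Pattern NOUN_OR_CC =
            Pattern.compile("\\S+_(?:NN[A-Z]*|CC)(?!\\S)");

    // Returns every word_TAG token whose tag is a noun tag or CC, in input order.
    static List<String> keepNounsAndConjunctions(String tagged) {
        List<String> kept = new ArrayList<>();
        Matcher m = NOUN_OR_CC.matcher(tagged);
        while (m.find()) {
            kept.add(m.group());
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "His_PRP$ plays_NNS remain_VBP highly_RB popular,_JJ and_CC are_VBP constantly_RB studied._VBN";
        // Prints: plays_NNS and_CC
        System.out.println(String.join(" ", keepNounsAndConjunctions(text)));
    }
}
```

Returning a list instead of a concatenated string avoids the trailing space and leaves the choice of separator to the caller.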
Non-regex solution using a filter method.
import java.util.ArrayList;

public class TagFilter {
    public static void main(String[] args) {
        String inputText = "His_PRP$ plays_NNS remain_VBP highly_RB popular,_JJ and_CC are_VBP constantly_RB studied._VBN";
        // "_NN" also matches longer noun tags such as _NNS, because contains() is a substring check
        String[] tags = {"_NN", "_CC"};
        String[] found = filter(inputText, tags);
        for (int i = 0; i < found.length; i++) {
            System.out.println(found[i]);
        }
    }

    private static String[] filter(String text, String[] tags) {
        String[] words = text.split(" "); // Split words by spaces
        ArrayList<String> results = new ArrayList<String>();
        // Save all words that match any of the provided tags
        for (String word : words) {
            for (String tag : tags) {
                if (word.contains(tag)) {
                    results.add(word);
                    break; // Stop after the first matching tag so each word is added once
                }
            }
        }
        return results.toArray(new String[0]); // Return results as a string array
    }
}
Prints to the console:
plays_NNS
and_CC
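One caveat with a substring check such as word.contains("_NN") is that it also matches when the word itself, rather than its tag, happens to contain that text. A possible variation is to split each token at its last underscore and compare only the tag part against a set of prefixes. The sketch below assumes the tag is always the text after the last underscore; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TagPrefixFilter {
    // Keeps tokens whose tag (the text after the last underscore)
    // starts with one of the given prefixes, e.g. "NN" keeps NN, NNS, NNP, NNPS.
    static List<String> filterByTagPrefix(String tagged, String... prefixes) {
        return Arrays.stream(tagged.trim().split("\\s+"))
                .filter(token -> {
                    int sep = token.lastIndexOf('_');
                    if (sep < 0) {
                        return false; // token carries no tag at all
                    }
                    String tag = token.substring(sep + 1);
                    return Arrays.stream(prefixes).anyMatch(tag::startsWith);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String inputText = "His_PRP$ plays_NNS remain_VBP highly_RB popular,_JJ and_CC are_VBP constantly_RB studied._VBN";
        // Prints plays_NNS and and_CC on separate lines
        filterByTagPrefix(inputText, "NN", "CC").forEach(System.out::println);
    }
}
```

With this approach a token such as snake_case_NN is kept because its tag is NN, while x_NNy_VB is dropped because its tag is VB.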