How to remove stopwords from a sentence?
I have a list of stopwords, and I want to remove every word in that list from a sentence. I'm currently using a regex. I have to convert the sentence to lower case to meet my requirements.
However, the problem is that stopwords still exist in the sentence.
// List of stopwords
List<String> stopwords = new ArrayList<>();
stopwords.add("is");
stopwords.add("a");
// the stopword list goes on ....
// Sentence
String sentence = "autism autism is a neurodevelopmental";
// Remove stop words in the sentence
String stopwordsRegex = stopwords.stream().collect(Collectors.joining("|", "\\b(", ")\\b\\s?"));
String removedSW = sentence.toLowerCase().replaceAll(stopwordsRegex, "");
System.out.println(removedSW);
String stopwordsRegex = stopwords.stream()
        .map(String::toLowerCase)
        .collect(Collectors.joining("|", "(?i)\\b(", ")\\b\\s?"));
String removedSW = sentence.replaceAll(stopwordsRegex, "");
Everything is fine; the only change is (?i), which adds an ignore-case flag, so the sentence can keep its upper case. The culprit might have been an upper-case stopword in the list, such as "I". Lower-casing the words in the stream was added too, but it is not necessary.
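As a self-contained illustration of the ignore-case approach, here is a minimal sketch. The class name StopwordRemover and the helper removeStopwords are hypothetical, not from the original answer. It also assumes you want each stopword treated as a literal, so it wraps each one in Pattern.quote, and it trims the result; neither step appears in the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StopwordRemover {

    // Hypothetical helper: removes every stopword from the sentence, ignoring case.
    static String removeStopwords(String sentence, List<String> stopwords) {
        // An empty list would produce an empty alternation that matches at every
        // word boundary, so return the sentence unchanged in that case.
        if (stopwords.isEmpty()) {
            return sentence;
        }
        String regex = stopwords.stream()
                .map(Pattern::quote) // assumption: treat each stopword as literal text
                .collect(Collectors.joining("|", "(?i)\\b(?:", ")\\b\\s?"));
        // trim() drops the space left behind when the last word is a stopword.
        return sentence.replaceAll(regex, "").trim();
    }

    public static void main(String[] args) {
        List<String> stopwords = Arrays.asList("is", "a", "I");
        // prints "Autism neurodevelopmental condition"
        System.out.println(removeStopwords("Autism is a neurodevelopmental condition", stopwords));
        // prints "think autism spectrum"
        System.out.println(removeStopwords("I think autism IS a spectrum", stopwords));
    }
}
```

Because of (?i), the upper-case entry "I" and the upper-case word "IS" are both matched without lower-casing either the list or the sentence.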
This works as well:
for (String stopword : stopwords) {
    sentence = sentence.replaceAll("\\b" + stopword + "\\b", "");
}
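Note that the loop removes only the word itself, so the spaces around it stay behind, and the match is case-sensitive. A possible way to tidy the result is sketched below; the class name StopwordLoop and the helper removeStopwords are hypothetical, and the Pattern.quote call and the whitespace clean-up are additions assumed for illustration, not part of the answer above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class StopwordLoop {

    // Hypothetical helper: loop-based removal followed by whitespace clean-up.
    static String removeStopwords(String sentence, List<String> stopwords) {
        String result = sentence;
        for (String stopword : stopwords) {
            // Pattern.quote keeps regex metacharacters in a stopword literal.
            result = result.replaceAll("\\b" + Pattern.quote(stopword) + "\\b", "");
        }
        // Collapse the runs of spaces left where stopwords were removed.
        return result.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        List<String> stopwords = Arrays.asList("is", "a");
        // prints "autism autism neurodevelopmental"
        System.out.println(removeStopwords("autism autism is a neurodevelopmental", stopwords));
    }
}
```

This variant compiles one pattern per stopword on every call, so for a long stopword list the single joined regex is likely the cheaper choice.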