
How to remove all stopwords from each line of a string using the weka.core.Stopwords Java class

I have a string in this format:

 String wordTyp = "i love to bake you a good sandwitch \\n" + "and i love biscuit and you? \\n"; 

How would I remove every stopword from each line of the string, using weka.core.Stopwords in Java?

public String removeStopWords(String word, int OriginCount) {
    Scanner scanner = new Scanner(word);
    StringBuilder wordDocNoStopWord = new StringBuilder();
    String lineOfText = "";
    int lineCount = 0;
    Stopwords checker = new Stopwords();
    while (scanner.hasNextLine() && lineCount < OriginCount) {
        lineOfText = scanner.nextLine() + " \\n";
        if (checker.is(lineOfText)) { /// confirms a stopword in here
            checker.clear(); ///and clears any stopwords in that line
        }
        lineCount++;
        wordDocNoStopWord.append(new StringBuilder(lineOfText));
        System.out.printf(lineOfText);
    }
    scanner.close();
    return wordDocNoStopWord.toString();
}

You could do something like this (I don't have access to a compiler, so it may need minor fixes):

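import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import weka.core.Stopwords;

// OriginCount is not used here; it is kept only to match the question's signature.
public String removeStopWords(String word, int OriginCount) {
    String delim = " ";
    // Copy into an ArrayList so that elements can be removed.
    List<String> list = new ArrayList<String>(Arrays.asList(word.split(delim)));

    Stopwords checker = new Stopwords();

    for (int i = 0; i < list.size(); i++) {
        String temp = list.get(i);

        // is() tests a single word against the stopword list.
        if (checker.is(temp)) {
            list.remove(i);
            i--; // stay on the same index after a removal
        }
    }

    // Rebuild the string from the words that are left.
    StringBuilder listAsString = new StringBuilder();

    for (String temp : list) {
        listAsString.append(temp).append(" ");
    }
    return listAsString.toString();
}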
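
The answer above splits the whole string on single spaces, so a word sitting right next to a line break stays glued to the newline and is never matched. A minimal sketch of a per-line variant follows. To keep it runnable without Weka on the classpath, it uses a small hypothetical `STOPWORDS` set as a stand-in for the library's list; the class name `LineStopwordFilter` and the set's contents are assumptions for illustration only. With Weka available, the `STOPWORDS.contains(...)` test could presumably be replaced by `checker.is(token)` on a `Stopwords` instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LineStopwordFilter {

    // Hypothetical stand-in for Weka's stopword list (assumption, not Weka's data).
    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("i", "to", "you", "a", "and"));

    /** Removes stopwords from each line while keeping the line structure. */
    public static String removeStopWords(String text) {
        List<String> cleanedLines = new ArrayList<>();
        // \R matches any line break; trailing empty strings are dropped by split().
        for (String line : text.split("\\R")) {
            List<String> kept = new ArrayList<>();
            // Split each line on runs of whitespace and test word by word.
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty() && !STOPWORDS.contains(token.toLowerCase())) {
                    kept.add(token);
                }
            }
            cleanedLines.add(String.join(" ", kept));
        }
        return String.join("\n", cleanedLines);
    }

    public static void main(String[] args) {
        String wordTyp = "i love to bake you a good sandwitch \n"
                + "and i love biscuit and you? \n";
        System.out.println(removeStopWords(wordTyp));
    }
}
```

With this stand-in list, the two input lines come back as `love bake good sandwitch` and `love biscuit you?`. The last token survives because `you?` still carries its punctuation; stripping punctuation before the lookup would be a separate step.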

