How to remove all stopwords from each line of a string using the weka.core.Stopwords Java class
I have a string in this format:
String wordTyp = "i love to bake you a good sandwitch \n" + "and i love biscuit and you? \n";
How can I remove every stopword from each line of this string using weka.core.Stopwords in Java? This is what I have tried:
public String removeStopWords(String word, int OriginCount) {
    Scanner scanner = new Scanner(word);
    StringBuilder wordDocNoStopWord = new StringBuilder();
    String lineOfText = "";
    int lineCount = 0;
    Stopwords checker = new Stopwords();
    while (scanner.hasNextLine() && lineCount < OriginCount) {
        lineOfText = scanner.nextLine() + " \n";
        if (checker.is(lineOfText)) { /// confirms a stopword in here
            checker.clear(); ///and clears any stopwords in that line
        }
        lineCount++;
        wordDocNoStopWord.append(new StringBuilder(lineOfText));
        System.out.printf(lineOfText);
    }
    scanner.close();
    return wordDocNoStopWord.toString();
}
You can do it like this (I don't have access to a compiler right now, so it may need small fixes):
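import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import weka.core.Stopwords;

public String removeStopWords(String word, int OriginCount) {
    // Split the input on single spaces. OriginCount is kept from the
    // question's signature but is not used by this approach.
    String delim = " ";
    List<String> list = new ArrayList<String>(Arrays.asList(word.split(delim)));
    Stopwords checker = new Stopwords();
    for (int i = 0; i < list.size(); i++) {
        String temp = list.get(i);
        if (checker.is(temp)) {
            // Drop the stopword and step back so the next element is not skipped.
            list.remove(i);
            i--;
        }
    }
    // Rejoin the remaining words, each followed by a single space.
    StringBuilder listAsString = new StringBuilder();
    for (String temp : list) {
        listAsString.append(temp).append(" ");
    }
    return listAsString.toString();
}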
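One caveat with splitting on a single space: the line breaks stay glued to the neighbouring words, so a token such as "\nand" would not be recognised as a stopword, and the per-line layout asked for in the question is not handled explicitly. Below is a minimal sketch of a line-by-line variant. It is not tested against Weka: the class name `StopwordLines`, the `DEMO_STOPWORDS` set, and the `Predicate<String>` parameter are all assumptions made for illustration, so that the example runs without Weka on the classpath. With a Weka release that ships `weka.core.Stopwords`, passing `new Stopwords()::is` as the predicate should presumably work, since `is(String)` is the same check used above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.function.Predicate;

public class StopwordLines {

    // Hypothetical stand-in list so the sketch runs without Weka.
    // This is NOT Weka's stopword list.
    public static final Set<String> DEMO_STOPWORDS =
            new HashSet<>(Arrays.asList("i", "to", "you", "a", "and"));

    /**
     * Removes every token that the given predicate reports as a stopword,
     * processing the text line by line and keeping one output line per input line.
     * Assumption: with Weka available, pass `new Stopwords()::is` as the predicate.
     */
    public static String removeStopWords(String text, Predicate<String> isStopword) {
        StringBuilder result = new StringBuilder();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                StringBuilder kept = new StringBuilder();
                for (String token : line.split("\\s+")) {
                    if (token.isEmpty()) {
                        continue; // an empty line splits into one empty token
                    }
                    // Look the word up without surrounding punctuation, in lower case,
                    // so that "you?" is matched as "you".
                    String bare = token.replaceAll("[^\\p{L}\\p{N}']", "").toLowerCase();
                    if (isStopword.test(bare)) {
                        continue; // skip stopwords
                    }
                    if (kept.length() > 0) {
                        kept.append(' ');
                    }
                    kept.append(token);
                }
                result.append(kept).append('\n');
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String wordTyp = "i love to bake you a good sandwitch \n"
                + "and i love biscuit and you? \n";
        System.out.print(removeStopWords(wordTyp, DEMO_STOPWORDS::contains));
    }
}
```

With the demo list above, this prints "love bake good sandwitch" on the first line and "love biscuit" on the second. Weka's own list is larger, so the real output would likely differ. As for the attempt in the question: it passes a whole line to `is()`, which checks a single word, and as far as I can tell `clear()` empties the checker's stopword list rather than editing the line.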