Removing stop words loaded from a file in Java

Question

I have to load some stop words from a .txt file and remove them from a text. I load the stop words from the file with this method, which saves them in a String array and returns it:

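import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public String[] loadStopwords(File targetFile, String[] stopWords) throws IOException {
    List<String> lines = new ArrayList<String>();

    // Read the file one line at a time; try-with-resources closes the reader
    try (BufferedReader br = new BufferedReader(new FileReader(targetFile))) {
        String st;
        while ((st = br.readLine()) != null) {
            lines.add(st); // each line is stored exactly as read
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    // The stopWords argument is overwritten here, never read
    stopWords = lines.toArray(new String[]{});
    return stopWords;
}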

Then I pass the stop-word array and the text to clean up to this method:

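public void removeStopWords(String targetText, String[] stopwords) {
    // Normalize the text: lowercase and strip leading/trailing blanks
    targetText = targetText.toLowerCase().trim();

    // Split on single spaces into a mutable list of words
    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));

    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));

    // removeAll drops every element that equals() some stop word
    wordList.removeAll(stopWordsList);

}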

But nothing is removed from wordList. Why?
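One likely cause: List.removeAll compares elements with equals(), which is both case-sensitive and whitespace-sensitive, so "The" or "sat " read from a file never match "the" or "sat" in the text. A minimal sketch, using a hypothetical RemoveAllDemo class, that shows the effect:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: removeAll matches elements with equals(),
// so differences in case or surrounding whitespace prevent removal.
public class RemoveAllDemo {

    static List<String> removeAll(List<String> words, List<String> stopWords) {
        List<String> result = new ArrayList<>(words);
        result.removeAll(stopWords); // removes only elements equal() to some stop word
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "sat");

        // Stop words as they might come from a file: capitalized or with a trailing blank
        List<String> rawStopWords = Arrays.asList("The", "sat ");
        System.out.println(removeAll(words, rawStopWords)); // [the, cat, sat]

        // The same stop words after lowercasing and trimming
        List<String> cleanStopWords = Arrays.asList("the", "sat");
        System.out.println(removeAll(words, cleanStopWords)); // [cat]
    }
}
```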

Answer 1

Try also saving the stop words in lowercase:

public String[] loadStopwords(String targetFile) throws IOException {
    File fileTo = new File(targetFile);
    List<String> lines = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(fileTo))) {
        String st;
        while ((st = br.readLine()) != null) {
            // Add each word in lowercase, without leading/trailing blanks
            lines.add(st.toLowerCase().trim());
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    return lines.toArray(new String[]{});
}

public ArrayList<String> removeStopWords(String targetText, String[] stopwords) {
    // Lowercase the text as well, so it matches the lowercased stop words
    targetText = targetText.toLowerCase().trim();

    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));

    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));

    wordList.removeAll(stopWordsList);

    return wordList;
}
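The same lowercase-and-trim idea can be packaged so it is easy to test without a file. The sketch below is a variation, not the answer's exact code: the class StopWordFilter and its methods are hypothetical, it takes the raw stop words as a List (for example, the lines read from the file), stores them in a HashSet for constant-time lookups instead of calling removeAll, and splits the text on runs of whitespace ("\\s+") instead of a single space so that double spaces do not produce empty "words".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: lowercase-and-trim normalization plus a HashSet lookup
public class StopWordFilter {

    // Normalize raw stop words (e.g. lines read from the file): trim, lowercase, skip blanks
    static Set<String> normalize(List<String> rawStopWords) {
        Set<String> set = new HashSet<>();
        for (String w : rawStopWords) {
            String s = w.trim().toLowerCase(Locale.ROOT);
            if (!s.isEmpty()) {
                set.add(s);
            }
        }
        return set;
    }

    // Lowercase the text, split on whitespace runs, keep words that are not stop words
    static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> result = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stop = normalize(Arrays.asList("The ", "A", "  is"));
        System.out.println(removeStopWords("The cat is  on a mat", stop)); // [cat, on, mat]
    }
}
```

Using Locale.ROOT keeps the lowercasing independent of the machine's default locale.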

Answer 2

Edoardo

That works for me. But I have a few comments:

You don't use the stopWords argument in the loadStopwords method.
You aren't returning wordList from the removeStopWords method.
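The second point matters because Java passes references by value: reassigning targetText or building wordList inside a void method has no effect the caller can see, so the filtered list is simply discarded. A minimal sketch, using a hypothetical ReturnDemo class, contrasting a void version with one that returns the list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: local changes inside a void method are invisible to the caller
public class ReturnDemo {

    // Mirrors the question's void method: the filtered list is built, then thrown away
    static void removeStopWordsVoid(String text, String[] stopwords) {
        text = text.toLowerCase().trim(); // reassigns the local parameter only
        List<String> wordList = new ArrayList<>(Arrays.asList(text.split(" ")));
        wordList.removeAll(Arrays.asList(stopwords));
    }

    // Returning the list lets the caller use the result
    static List<String> removeStopWords(String text, String[] stopwords) {
        List<String> wordList = new ArrayList<>(Arrays.asList(text.toLowerCase().trim().split(" ")));
        wordList.removeAll(Arrays.asList(stopwords));
        return wordList;
    }

    public static void main(String[] args) {
        String text = "The cat sat";
        removeStopWordsVoid(text, new String[]{"the"});
        System.out.println(text); // The cat sat (unchanged)
        System.out.println(removeStopWords(text, new String[]{"the"})); // [cat, sat]
    }
}
```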

Looking at your comments, I suspect the difference is in the stop-words text file. Mine has each stop word on its own line, whereas you most likely have all the stop words on a single line and are not splitting them apart.
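If the file may hold several stop words per line, each line has to be split into tokens before use. A minimal sketch, assuming the words are separated by whitespace and/or commas (the separator set is an assumption about the file format) and using a hypothetical StopWordLoader class; Files.readAllLines reads the whole file, which is fine for a small stop-word list:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical loader: accepts one stop word per line or many per line,
// assuming words are separated by whitespace and/or commas.
public class StopWordLoader {

    static List<String> loadStopwords(Path file) throws IOException {
        List<String> stopWords = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            // Split each line into tokens; leading separators yield an empty token, skipped below
            for (String token : line.split("[\\s,]+")) {
                if (!token.isEmpty()) {
                    stopWords.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return stopWords;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file that mixes both layouts
        Path tmp = Files.createTempFile("stopwords", ".txt");
        Files.write(tmp, "The, a an\nis\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadStopwords(tmp)); // [the, a, an, is]
        Files.delete(tmp);
    }
}
```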

Answers: 2

Answer 1: score 1, 2019-07-02 12:12:05

Answer 2: score 0, 2019-07-02 11:59:44