Find unique words in a text file

Question

I'm writing this program in Java to find the unique words in a text file. I want to know if this code is correct, because it even shows spaces as words.

String[] words;
List<String> uniqueWords = new ArrayList<String>();
words = str1.split("[!-~]* ");
for (int i = 0; i < words.length; i++)
{
    if (!(uniqueWords.contains (words[i])))
    {
        uniqueWords.add(words[i]);
    }
}

For example, if my input is "Hello world! How is the world?", my output array/set/list should contain hello, world, how, is, the.
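The sketch below prints what the question's split pattern actually produces on the example input. The class and method names are illustrative only. The character class [!-~] is the range from ! (0x21) to ~ (0x7E), which covers every printable ASCII character except the space. Each match therefore consumes a whole word plus its trailing space, and only empty strings are left between matches.

```java
import java.util.Arrays;

public class SplitDemo {
    // Same regex as the question: a run of printable non-space ASCII, then one space
    public static String[] splitLikeQuestion(String text) {
        return text.split("[!-~]* ");
    }

    public static void main(String[] args) {
        String[] words = splitLikeQuestion("Hello world! How is the world?");
        // Five empty strings (one per consumed "word "), then the last word,
        // which survives only because no space follows it
        System.out.println(Arrays.toString(words));  // [, , , , , world?]
        System.out.println(words.length);            // 6
    }
}
```

These empty strings are what show up as "spaces" in the unique-word list.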

Answer 1

You can find unique words by using a Set. A Set is a Collection that contains no duplicate elements.

String[] words;
Set<String> uniqueWords = new HashSet<String>();
words = str1.split("[\\W]+");
for (int i = 0; i < words.length; i++)
{
    uniqueWords.add(words[i]);
}
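Here is a self-contained sketch of the Set approach. It also lowercases the input so that the result matches the expected output in the question. It uses LinkedHashSet, which is an assumption and not part of the answer, so that the words print in first-seen order. A plain HashSet gives no ordering guarantee. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class UniqueWords {
    // Hypothetical helper: lowercase, split on runs of non-word characters,
    // and collect into a LinkedHashSet (first-seen order)
    public static Set<String> uniqueWords(String text) {
        Set<String> unique = new LinkedHashSet<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {   // a leading delimiter yields one empty token
                unique.add(w);    // add() leaves the set unchanged for duplicates
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("Hello world! How is the world?"));  // [hello, world, how, is, the]
    }
}
```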

Answer 2

Slightly modified version of the other answers (I like it short and simple):

String[] words = str1.split("[!-~]* ");
Set<String> uniqueWords = new HashSet<String>();

for (String word : words) {
    uniqueWords.add(word);
}

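Note: in [!-~] the dash forms a range from ! to ~, which covers every printable ASCII character except the space. The pattern above therefore swallows each word together with the space after it. If you want to split on ! or - or ~ or whitespace, you should use this:

String[] words = str1.split("[-!~\\s]+");

(Inside a character class, put the dash first or last, or escape it, so that it is treated as a literal dash.)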
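The dash rule can be checked directly with String.matches. This sketch contrasts the range [!-~] with the literal set [-!~] and then applies the corrected split. The class name and the sample input are illustrative.

```java
public class DashInCharClass {
    // The corrected pattern: split on runs of '-', '!', '~' or whitespace
    public static String[] splitOnPunctuation(String text) {
        return text.split("[-!~\\s]+");
    }

    public static void main(String[] args) {
        // Dash between two characters: a range from '!' (0x21) to '~' (0x7E)
        System.out.println("A".matches("[!-~]"));   // true
        // Dash first: a literal '-', so the class holds only '-', '!' and '~'
        System.out.println("A".matches("[-!~]"));   // false
        System.out.println("-".matches("[-!~]"));   // true
        System.out.println(String.join("|", splitOnPunctuation("a-b!c~d e")));  // a|b|c|d
    }
}
```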

Answer 3

If we really want it compact:

Set<String> unique = new HashSet<String>(Arrays.asList(str.toLowerCase().split("[-.,:;?!~\\s]+")));
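The one-liner above works on a String that is already in memory. The question is about a text file, so here is a sketch that applies the same delimiter pattern to every line of a file through Files.lines. It assumes Java 8+ and UTF-8 input (the default charset of Files.lines). The file name input.txt and the class name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UniqueWordsInFile {
    // Applies the compact answer's delimiter regex to each line of the file
    public static Set<String> uniqueWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {  // closes the file when done
            return lines
                    .flatMap(line -> Arrays.stream(line.toLowerCase().split("[-.,:;?!~\\s]+")))
                    .filter(w -> !w.isEmpty())  // blank lines and leading delimiters give ""
                    .collect(Collectors.toSet());
        }
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a hypothetical default; pass a real path as the first argument
        Path file = Paths.get(args.length > 0 ? args[0] : "input.txt");
        System.out.println(uniqueWords(file));
    }
}
```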

Answer 4

A Set does not allow duplicates, whereas a List does.

String[] words;
Set<String> uniqueWords = new HashSet<String>();
words = str1.split("[!-~]* ");
for (int i = 0; i < words.length; i++)
    uniqueWords.add(words[i]); // No need to check contains() first: a Set never stores duplicates
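The difference shows up in the return value of add(). This sketch adds the same words to a HashSet and an ArrayList and prints what each add() call returns. The helper name sizesAfterAdding is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetVsList {
    // Hypothetical helper: adds the same words to a Set and a List,
    // returns {set size, list size}
    public static int[] sizesAfterAdding(String... words) {
        Set<String> set = new HashSet<>();
        List<String> list = new ArrayList<>();
        for (String w : words) {
            boolean addedToSet = set.add(w);    // false if w is already present
            boolean addedToList = list.add(w);  // always true for ArrayList
            System.out.println(w + ": set.add=" + addedToSet + ", list.add=" + addedToList);
        }
        return new int[] {set.size(), list.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterAdding("world", "hello", "world");
        System.out.println("set size " + sizes[0] + ", list size " + sizes[1]);  // set size 2, list size 3
    }
}
```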

Answer 5

I'd suggest you use a Pattern and a Matcher and put the results in a Set.

public void getWords()
{
    Set<String> words = new HashSet<String>();
    String pattern = "[a-zA-Z]+\\s";
    String match = "hello world how* are. you! world hello";
    Pattern compile = Pattern.compile(pattern);
    Matcher matcher = compile.matcher(match);
    while(matcher.find())
    {
        String group = matcher.group();
        boolean add = words.add(group);
        if(add)
        {
            System.out.println(group);
        }
    }
}

Output:

hello 
world

Change your definition of what a 'word' is by changing the pattern.
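As an illustration, here is the same Matcher loop with the pattern passed in as a parameter. The pattern [a-zA-Z]+\s requires a whitespace character after each word. It therefore skips how*, are., you! and the final hello, and each match includes its trailing space. Dropping the \s picks up every run of letters. The class name is hypothetical, and LinkedHashSet is swapped in for first-seen order.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsByPattern {
    // Collects every match of wordPattern: the pattern is the definition of a word
    public static Set<String> words(String text, String wordPattern) {
        Set<String> found = new LinkedHashSet<>();  // keeps first-seen order
        Matcher matcher = Pattern.compile(wordPattern).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "hello world how* are. you! world hello";
        // Letters followed by one whitespace char (the match keeps the space)
        System.out.println(words(text, "[a-zA-Z]+\\s"));  // [hello , world ]
        // Letters only, no trailing whitespace required
        System.out.println(words(text, "[a-zA-Z]+"));     // [hello, world, how, are, you]
    }
}
```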

Answer 6

If you want to get the words that are not repeated in a sentence or any other text, you can try this:

public static Map<String, Integer> getUniqueWords(String sentence) {
    String[] words = sentence.split("[\\W]+");
    // Count every occurrence of each word
    Map<String, Integer> uniqueWord = new HashMap<>();
    for (String e : words) {
        uniqueWord.merge(e, 1, Integer::sum);
    }
    // Keep only the words that occurred exactly once
    uniqueWord.values().removeIf(count -> count != 1);
    return uniqueWord;
}
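Here is a sketch of the same goal using streams (Java 8+). This is a different technique from the answer's loop. It first counts every occurrence with Collectors.groupingBy and Collectors.counting, then keeps only the words whose count is exactly 1, so a word that appears three times is not reported. LinkedHashMap keeps first-seen order. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class NonRepeatedWords {
    // Counts every word first, then keeps only those seen exactly once
    public static List<String> wordsOccurringOnce(String text) {
        Map<String, Long> counts = Arrays.stream(text.split("\\W+"))
                .filter(w -> !w.isEmpty())  // a leading delimiter yields one empty token
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(wordsOccurringOnce("a b a c a"));                       // [b, c]
        System.out.println(wordsOccurringOnce("Hello world! How is the world?"));  // [Hello, How, is, the]
    }
}
```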

6 answers

Answer 1: score 5, 2013-03-20 11:32:26
Answer 2: score 4, accepted, 2013-03-20 11:40:28
Answer 3: score 2, 2013-03-20 11:50:06
Answer 4: score 1, 2013-03-20 11:33:54
Answer 5: score 0, 2013-03-20 11:40:57
Answer 6: score 0, 2017-07-25 03:42:49
