Remove duplicate words from a String array

Question

Good morning,

I wrote a function that calculates the frequency of a term:

public static int tfCalculator(String[] totalterms, String termToCheck) {
    int count = 0;  //to count the overall occurrence of the term termToCheck
    for (String s : totalterms) {
        if (s.equalsIgnoreCase(termToCheck)) {
            count++; 
        }
    } 
    return count;
}

After that, I use it in the code below to calculate the frequency of every word in a String[] words:

for(String word:words){
    int freq = tfCalculator(words, word);

    System.out.println(word + "|" + freq);
    mm+=word + "|" + freq+"\n";
}

The problem is that repeated words show up more than once in the output. For example, here is the result:

cytoskeletal|2
network|1
enable|1
equal|1
spindle|1
cytoskeletal|2
...
...

Can someone help me remove the repeated words so that the result looks like this:

cytoskeletal|2
network|1
enable|1
equal|1
spindle|1
...
...

Thank you very much!

Answer 1

Java 8 solution:

words = Arrays.stream(words).distinct().toArray(String[]::new);

The distinct method removes duplicates, and words is replaced with a new array that contains no duplicates.
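One thing to watch when combining this with the loop from the question: if words itself is overwritten with the de-duplicated array, tfCalculator(words, word) will then report 1 for every word. A minimal sketch of one way around that, assuming the original array is kept for counting and a separate de-duplicated array is used for iterating (the class name DistinctFrequencyDemo, the method report, and the variable unique are made up for illustration):

```java
import java.util.Arrays;

class DistinctFrequencyDemo {
    // Same counting logic as tfCalculator in the question.
    static int tfCalculator(String[] totalterms, String termToCheck) {
        int count = 0;
        for (String s : totalterms) {
            if (s.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return count;
    }

    // Builds one "word|count" line per distinct word, in first-seen order.
    static String report(String[] words) {
        // Iterate over the de-duplicated copy, but count against the full array.
        String[] unique = Arrays.stream(words).distinct().toArray(String[]::new);
        StringBuilder sb = new StringBuilder();
        for (String word : unique) {
            sb.append(word).append('|').append(tfCalculator(words, word)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "cytoskeletal"};
        System.out.print(report(words));
    }
}
```

Note that distinct compares strings with equals, so it is case-sensitive, while tfCalculator ignores case.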

Answer 2

You can just use a HashSet, and that should take care of the duplicates issue:

words = new HashSet<String>(Arrays.asList(words)).toArray(new String[0]);

This takes your array, converts it to a List, passes that to the constructor of HashSet<String>, and then converts the result back to an array.
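A HashSet makes no guarantee about iteration order, so the resulting array may list the words in a different order than the input. If the first-seen order matters, a LinkedHashSet can be swapped in. The following is only a sketch of that variation (the class name OrderedDedupDemo and the method dedupKeepOrder are hypothetical):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;

class OrderedDedupDemo {
    // Removes duplicates while keeping each word at the position of its first occurrence.
    static String[] dedupKeepOrder(String[] words) {
        // LinkedHashSet iterates in insertion order, unlike HashSet.
        return new LinkedHashSet<String>(Arrays.asList(words)).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "cytoskeletal"};
        System.out.println(Arrays.toString(dedupKeepOrder(words)));
    }
}
```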

Answer 3

Sort the array; then you can just count equal adjacent elements:

Arrays.sort(totalterms);
int i = 0;
while (i < totalterms.length) {
  int start = i;
  while (i < totalterms.length && totalterms[i].equals(totalterms[start])) {
    ++i;
  }
  System.out.println(totalterms[start] + "|" + (i - start));
}
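Two details of the snippet above may matter in practice: Arrays.sort reorders the caller's array in place, and equals is case-sensitive whereas the question's tfCalculator uses equalsIgnoreCase. A sketch of a variant that sorts a copy and compares case-insensitively, under the assumption that this is the behavior wanted (the class name SortedRunCounter and the method countRuns are made up here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedRunCounter {
    // Returns "word|count" lines in case-insensitive sorted order.
    static List<String> countRuns(String[] totalterms) {
        // Work on a copy so the caller's array keeps its original order.
        String[] sorted = totalterms.clone();
        Arrays.sort(sorted, String.CASE_INSENSITIVE_ORDER);

        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int start = i;
            // Advance past the run of words equal (ignoring case) to the first one.
            while (i < sorted.length && sorted[i].equalsIgnoreCase(sorted[start])) {
                i++;
            }
            lines.add(sorted[start] + "|" + (i - start));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"spindle", "Network", "equal", "network"};
        for (String line : countRuns(words)) {
            System.out.println(line);
        }
    }
}
```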

Answer 4

I think you want to print the frequency of each string in the array totalterms. Using a Map is an easier solution, because it stores the frequency of every string in a single traversal of the array. See the following implementation:

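import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

public static void printFrequency(String[] totalterms)
{
    // Typed map (word -> occurrences so far) avoids raw types and casts
    Map<String, Integer> frequencyMap = new HashMap<>();

    for (String string : totalterms) {
        if(frequencyMap.containsKey(string))
        {
            Integer count = frequencyMap.get(string);
            frequencyMap.put(string, count+1);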
        }
        else
        {
            frequencyMap.put(string, 1);
        }
    }

    Set <Entry<String, Integer>> elements= frequencyMap.entrySet();

    for (Entry<String, Integer> entry : elements) {
        System.out.println(entry.getKey()+"|"+entry.getValue());
    }
}
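The same single-pass idea can be written more compactly with Map.merge (available since Java 8). This sketch also swaps HashMap for LinkedHashMap so that words come out in first-seen order, which is an assumption about what the asker wants; the class name FrequencyMapDemo and the method countWords are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FrequencyMapDemo {
    // Counts every word in one pass; keys keep their first-seen order.
    static Map<String, Integer> countWords(String[] totalterms) {
        Map<String, Integer> frequencyMap = new LinkedHashMap<>();
        for (String term : totalterms) {
            // Insert 1 for a new word, otherwise add 1 to the stored count.
            frequencyMap.merge(term, 1, Integer::sum);
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "cytoskeletal", "enable"};
        for (Map.Entry<String, Integer> entry : countWords(words).entrySet()) {
            System.out.println(entry.getKey() + "|" + entry.getValue());
        }
    }
}
```

Unlike tfCalculator, the map keys here are case-sensitive.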

Answer 5

In two lines:

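String s = "cytoskeletal|2 - network|1 - enable|1 - equal|1 - spindle|1 - cytoskeletal|2";
// Split on " - " (with the spaces) so that equal entries compare equal; backslashes in the regex must be doubled in a Java string
System.out.println(new LinkedHashSet<String>(Arrays.asList(s.split(" - "))).toString().replaceAll("(^\\[|\\]$)", "").replace(", ", " - "));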

Answer 6

Your code is fine; you just need to keep track of which words have already been encountered. For that you can keep a running set:

Set<String> prevWords = new HashSet<>();
for(String word:words){
    // proceed if word is new to the set, otherwise skip
    if (prevWords.add(word)) {
        int freq = tfCalculator(words, word);

        System.out.println(word + "|" + freq);
        mm+=word + "|" + freq+"\n";
    }
}
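One caveat: the set compares words case-sensitively, while tfCalculator uses equalsIgnoreCase, so "Network" and "network" would each be printed once, both with the combined count. A sketch of one way to line the two up, assuming case-insensitive de-duplication is what is wanted, is to store a lower-cased key in the set (the class name SeenSetDemo, the method report, and the variable seen are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SeenSetDemo {
    // Same counting logic as tfCalculator in the question.
    static int tfCalculator(String[] totalterms, String termToCheck) {
        int count = 0;
        for (String s : totalterms) {
            if (s.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return count;
    }

    // Returns one "word|count" line per word, treating case variants as the same word.
    static List<String> report(String[] words) {
        Set<String> seen = new HashSet<>();
        List<String> lines = new ArrayList<>();
        for (String word : words) {
            // add() returns false if the lower-cased key is already in the set.
            if (seen.add(word.toLowerCase(Locale.ROOT))) {
                lines.add(word + "|" + tfCalculator(words, word));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"Network", "spindle", "network"};
        for (String line : report(words)) {
            System.out.println(line);
        }
    }
}
```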

6 answers

Answer 1
2 2016-03-10 13:50:06

Answer 2
0 2016-03-10 13:45:40

Answer 3
0 2016-03-10 13:51:26

Answer 4
0 accepted 2016-03-10 14:31:09

Answer 5
0 2016-10-16 08:49:27

Answer 6
0 2016-10-16 09:43:43
