Remove repeated words from a String array
Good morning,
I wrote a function that calculates the frequency of a term:
public static int tfCalculator(String[] totalterms, String termToCheck) {
    int count = 0; // counts the overall occurrences of termToCheck
    for (String s : totalterms) {
        if (s.equalsIgnoreCase(termToCheck)) {
            count++;
        }
    }
    return count;
}
After that, I use it in the code below to compute the frequency of every word in a String[] words:
for (String word : words) {
    int freq = tfCalculator(words, word);
    System.out.println(word + "|" + freq);
    mm += word + "|" + freq + "\n";
}
The problem is that repeated words are printed once per occurrence. Here, for example, is the result:
Can someone help me remove the repeated words and get a result like this:
Thank you very much!
Java 8 solution:
words = Arrays.stream(words).distinct().toArray(String[]::new);
The distinct method removes duplicates. words is replaced with a new array that contains no duplicates.
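One caveat when combining this with tfCalculator from the question: once words has been overwritten with the deduplicated array, every count comes out as 1. A minimal sketch that keeps the original array for counting and loops over a separate deduplicated copy might look like this. The class name DistinctDemo, the method report and the variable unique are illustrative names, not part of the original answer.

```java
import java.util.Arrays;

public class DistinctDemo {
    // Copied from the question: case-insensitive count of termToCheck.
    public static int tfCalculator(String[] totalterms, String termToCheck) {
        int count = 0;
        for (String s : totalterms) {
            if (s.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: builds one "word|count" line per distinct word.
    static String report(String[] words) {
        // Deduplicate into a separate array; distinct() keeps first-seen order here.
        String[] unique = Arrays.stream(words).distinct().toArray(String[]::new);
        StringBuilder mm = new StringBuilder();
        for (String word : unique) {
            // Count against the ORIGINAL array, not the deduplicated one.
            mm.append(word).append("|").append(tfCalculator(words, word)).append("\n");
        }
        return mm.toString();
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "cytoskeletal"};
        System.out.print(report(words));
    }
}
```

Note that distinct compares with equals, so it is case-sensitive, while tfCalculator ignores case.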
You can just use a HashSet, and that should take care of the duplicates issue:
words = new HashSet<String>(Arrays.asList(words)).toArray(new String[0]);
This will take your array, convert it to a List, feed that to the constructor of HashSet<String>, and then convert it back to an array for you.
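A HashSet makes no promise about iteration order, so the resulting array may come back in a different order than the input. If the original order matters, a LinkedHashSet can be swapped in. The following is only a sketch of that variation; OrderedDedup and dedup are made-up names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;

public class OrderedDedup {
    // Hypothetical helper: removes duplicates while keeping first-seen order.
    static String[] dedup(String[] words) {
        // LinkedHashSet iterates in insertion order, unlike HashSet.
        return new LinkedHashSet<String>(Arrays.asList(words)).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"network", "cytoskeletal", "network", "enable"};
        System.out.println(Arrays.toString(dedup(words)));
    }
}
```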
Sort the array, then you can just count equal adjacent elements:
Arrays.sort(totalterms);
int i = 0;
while (i < totalterms.length) {
    int start = i;
    // Advance i past the run of elements equal to totalterms[start].
    while (i < totalterms.length && totalterms[i].equals(totalterms[start])) {
        ++i;
    }
    System.out.println(totalterms[start] + "|" + (i - start));
}
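Two details of the sorting approach are worth keeping in mind: Arrays.sort reorders the caller's array in place, and equals is case-sensitive whereas the question's tfCalculator uses equalsIgnoreCase. A possible variation, assuming case-insensitive counting on a copy is what is wanted, is sketched below. SortedCount and report are illustrative names.

```java
import java.util.Arrays;

public class SortedCount {
    // Hypothetical helper: counts runs of equal (ignoring case) adjacent terms.
    static String report(String[] totalterms) {
        // Work on a copy so the caller's array keeps its order.
        String[] sorted = totalterms.clone();
        Arrays.sort(sorted, String.CASE_INSENSITIVE_ORDER);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sorted.length) {
            int start = i;
            while (i < sorted.length && sorted[i].equalsIgnoreCase(sorted[start])) {
                ++i;
            }
            // The first spelling in each run is used as the label.
            out.append(sorted[start]).append("|").append(i - start).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(new String[]{"network", "Enable", "enable", "network"}));
    }
}
```

The output is ordered alphabetically rather than by first appearance, which is the trade-off of sorting.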
I think you want to print the frequency of each string in the array totalterms. Using a Map is an easier solution, because a single traversal of the array stores the frequency of every string. Check the following implementation.
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

public static void printFrequency(String[] totalterms) {
    // Typed map instead of a raw Map, so no casts are needed.
    Map<String, Integer> frequencyMap = new HashMap<>();
    for (String string : totalterms) {
        if (frequencyMap.containsKey(string)) {
            // Seen before: increment the stored count.
            frequencyMap.put(string, frequencyMap.get(string) + 1);
        } else {
            // First occurrence.
            frequencyMap.put(string, 1);
        }
    }
    for (Entry<String, Integer> entry : frequencyMap.entrySet()) {
        System.out.println(entry.getKey() + "|" + entry.getValue());
    }
}
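On Java 8 and later, the containsKey/get/put branch can be collapsed with Map.merge, and a LinkedHashMap keeps the words in the order they first appear (a plain HashMap does not guarantee any order). This is a sketch of that variation rather than the answer's own code. MergeFrequency and count are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeFrequency {
    // Hypothetical helper: single pass over the array, first-seen order preserved.
    static Map<String, Integer> count(String[] totalterms) {
        Map<String, Integer> frequencyMap = new LinkedHashMap<>();
        for (String term : totalterms) {
            // Inserts 1 for a new key, otherwise adds 1 to the existing count.
            frequencyMap.merge(term, 1, Integer::sum);
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        String[] totalterms = {"cytoskeletal", "network", "enable", "cytoskeletal"};
        for (Map.Entry<String, Integer> entry : count(totalterms).entrySet()) {
            System.out.println(entry.getKey() + "|" + entry.getValue());
        }
    }
}
```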
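In two lines:
String s = "cytoskeletal|2 - network|1 - enable|1 - equal|1 - spindle|1 - cytoskeletal|2";
// Needs java.util.Arrays and java.util.LinkedHashSet. Splitting on " - " (not "-") keeps stray spaces out of the entries, so repeated entries compare equal.
System.out.println(String.join(" - ", new LinkedHashSet<String>(Arrays.asList(s.split(" - ")))));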
Your code is fine; you just need to keep track of which words were encountered already. For that you can keep a running set:
Set<String> prevWords = new HashSet<>();
for (String word : words) {
    // proceed if word is new to the set, otherwise skip
    if (prevWords.add(word)) {
        int freq = tfCalculator(words, word);
        System.out.println(word + "|" + freq);
        mm += word + "|" + freq + "\n";
    }
}
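Since tfCalculator compares with equalsIgnoreCase, a plain HashSet would still let both "Word" and "word" through, each printed with the same count. If that matters, one option is a running set that ignores case as well, for instance a TreeSet built with String.CASE_INSENSITIVE_ORDER. The sketch below assumes that behaviour is wanted. RunningSetDemo and report are illustrative names, and a StringBuilder stands in for the question's mm variable.

```java
import java.util.Set;
import java.util.TreeSet;

public class RunningSetDemo {
    // Copied from the question: case-insensitive count of termToCheck.
    public static int tfCalculator(String[] totalterms, String termToCheck) {
        int count = 0;
        for (String s : totalterms) {
            if (s.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: one "word|count" line per word, ignoring case.
    static String report(String[] words) {
        // add() returns false when an equal-ignoring-case word is already present.
        Set<String> prevWords = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        StringBuilder mm = new StringBuilder();
        for (String word : words) {
            if (prevWords.add(word)) {
                mm.append(word).append("|").append(tfCalculator(words, word)).append("\n");
            }
        }
        return mm.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(new String[]{"Word", "word", "other"}));
    }
}
```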