Word count with Java 8

I am trying to implement a word count program in Java 8, but I am unable to make it work. The method must take a string as a parameter and return a Map<String,Integer>.

When I do it the old Java way, everything works fine. But when I try to do it in Java 8 style, it returns a map where the keys are empty, with the correct occurrences.

Here is my code in Java 8 style:

public Map<String, Integer> countJava8(String input) {
    return Pattern.compile("(\\w+)")
                  .splitAsStream(input)
                  .collect(Collectors.groupingBy(e -> e.toLowerCase(),
                                                 Collectors.reducing(0, e -> 1, Integer::sum)));
}

Here is the code I would use in a normal situation:

public Map<String, Integer> count(String input) {
    Map<String, Integer> wordcount = new HashMap<>();
    Pattern compile = Pattern.compile("(\\w+)");
    Matcher matcher = compile.matcher(input);

    while (matcher.find()) {
        String word = matcher.group().toLowerCase();
        if (wordcount.containsKey(word)) {
            Integer count = wordcount.get(word);
            wordcount.put(word, ++count);
        } else {
            wordcount.put(word.toLowerCase(), 1);
        }
    }
    return wordcount;
}

The main program:

public static void main(String[] args) {
    WordCount wordCount = new WordCount();
    Map<String, Integer> phrase = wordCount.countJava8("one fish two fish red fish blue fish");
    Map<String, Integer> count = wordCount.count("one fish two fish red fish blue fish");

    System.out.println(phrase);
    System.out.println();
    System.out.println(count);
}

When I run this program, this is the output I get:

{ =7, =1}
{red=1, blue=1, one=1, fish=4, two=1}

I thought that the splitAsStream method would stream the elements matching the regex as a Stream. How can I correct that?

The problem seems to be that you are in fact splitting by words, i.e. you are streaming over everything that is not a word, or that is in between words. Unfortunately, there seems to be no equivalent method in Java 8 for streaming the actual match results (hard to believe, but I did not find any; feel free to comment if you know one).
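To make the splitting behaviour concrete, here is a small sketch that lists the pieces splitAsStream produces for the question's input. The class name SplitDemo and the helper pieces are made up for illustration. With "\\w+" as the delimiter, only the gaps between words survive (plus an empty leading piece, because the input starts with a match), which would explain the blank keys in the asker's map.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitDemo {
    // Hypothetical helper: collects the pieces that splitAsStream yields
    // when `regex` is used as the delimiter.
    static List<String> pieces(String regex, String input) {
        return Pattern.compile(regex)
                      .splitAsStream(input)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "one fish two fish red fish blue fish";
        // Words are the delimiters here, so only the separators remain.
        System.out.println(pieces("\\w+", input));
        // Non-words are the delimiters here, so the words remain.
        System.out.println(pieces("\\W+", input));
    }
}
```

Grouping the first list by value gives one empty string and seven single spaces, which lines up with the { =7, =1} output in the question.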

Instead, you could just split by non-words, using \\W instead of \\w. Also, as noted in the comments, you can make it a bit more readable by using String::toLowerCase instead of a lambda, and Collectors.summingInt.

public static Map<String, Integer> countJava8(String input) {
    return Pattern.compile("\\W+")
                  .splitAsStream(input)
                  .collect(Collectors.groupingBy(String::toLowerCase,
                                                 Collectors.summingInt(s -> 1)));
}
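One caveat with splitting on \\W+: if the input begins with a non-word character (leading whitespace or punctuation), splitAsStream emits an empty leading string, which would then be counted under an empty key. A possible guard, sketched here with the made-up names SplitWordCount and countSkippingEmpty, is to filter out empty tokens before grouping.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitWordCount {
    // Hypothetical variant of countJava8 that drops empty tokens.
    static Map<String, Integer> countSkippingEmpty(String input) {
        return Pattern.compile("\\W+")
                      .splitAsStream(input)
                      // A delimiter at the very start yields "" as the first piece.
                      .filter(s -> !s.isEmpty())
                      .collect(Collectors.groupingBy(String::toLowerCase,
                                                     Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(countSkippingEmpty("  Fish, fish!"));
    }
}
```

For the question's sample sentence the filter changes nothing, since that input starts with a word.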

But IMHO the split-based version is still very hard to comprehend, not only because of the "inverse" lookup; it is also difficult to generalize to other, more complex patterns. Personally, I would just go with the "old school" solution, maybe making it a bit more compact using the new getOrDefault.

public static Map<String, Integer> countOldschool(String input) {
    Map<String, Integer> wordcount = new HashMap<>();
    Matcher matcher = Pattern.compile("\\w+").matcher(input);
    while (matcher.find()) {
        String word = matcher.group().toLowerCase();
        wordcount.put(word, wordcount.getOrDefault(word, 0) + 1);
    }
    return wordcount;
}

The result seems to be the same in both cases.
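For readers who are not tied to Java 8: from Java 9 onwards, Matcher has a results() method that returns a Stream<MatchResult>, which streams the matches themselves rather than the pieces between them. The sketch below assumes Java 9 or later; the class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordCountResults {
    // Requires Java 9+: Matcher.results() is not available in Java 8.
    static Map<String, Integer> countWithResults(String input) {
        return Pattern.compile("\\w+")
                      .matcher(input)
                      .results()                 // one MatchResult per match
                      .map(MatchResult::group)   // the matched text
                      .collect(Collectors.groupingBy(String::toLowerCase,
                                                     Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(countWithResults("one fish two fish red fish blue fish"));
    }
}
```

Because this streams the matches directly, the pattern describes what a word is rather than what separates words, so it should carry over to more complex patterns more easily than the split-based version.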

Try this.

    String in = "go go go go og sd";
    Map<String, Integer> map = new HashMap<>();
    // Replace all punctuation with spaces, then split on whitespace
    String[] s = in.replaceAll("\\p{Punct}", " ").trim().split("\\s+");
    // Count whole tokens. Searching the input with each token as a regex
    // would also count hits inside longer words and repeat the search
    // for every duplicate token.
    for (String word : s) {
        map.merge(word, 1, Integer::sum);
    }
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        System.out.println("String: " + entry.getKey() + ", Occurrency: " + entry.getValue());
    }
    System.out.println("Word: " + s.length);

This is the output:

String: sd, Occurrency: 1
String: go, Occurrency: 4
String: og, Occurrency: 1
Word: 6
