
Google Guava MultiSet returning incorrect value

I am using the Google Guava API to count words.

public static void main(String args[])
    {
        String txt = "Lemurs of Madagascar is a reference work and field guide giving descriptions and biogeographic data for all the known lemur species in Madagascar (ring-tailed lemur pictured). It also provides general information about lemurs and their history and helps travelers identify species they may encounter. The primary contributor is Russell Mittermeier, president of Conservation International. The first edition in 1994 received favorable reviews for its meticulous coverage, numerous high-quality illustrations, and engaging discussion of lemur topics, including conservation, evolution, and the recently extinct subfossil lemurs. The American Journal of Primatology praised the second edition's updates and enhancements. Lemur News appreciated the expanded content of the third edition (2010), but was concerned that it was not as portable as before. The first edition identified 50 lemur species and subspecies, compared to 71 in the second edition and 101 in the third. The taxonomy promoted by these books has been questioned by some researchers who view these growing numbers of lemur species as insufficiently justified inflation of species numbers.";

        Iterable<String> result = Splitter.on(" ").trimResults(CharMatcher.DIGIT)
                   .omitEmptyStrings().split(txt);
        Multiset<String> words = HashMultiset.create(result);

        for(Multiset.Entry<String> entry : words.entrySet())
        {
            String word = entry.getElement();
            int count = words.count(word);
            System.out.printf("%S %d", word, count);
            System.out.println();
        }
    }

The output should be

Lemurs 3

But I am getting this:

Lemurs 1
Lemurs 1
Lemurs 1

What am I doing wrong?

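Using printf("%S %d", word, count) with an upper-case S hides the fact that different capitalizations of the word "lemurs" are being counted separately, because %S upper-cases each string before printing it. When I run the program, I see

  • one occurrence of "lemurs." with the trailing period not trimmed
  • one occurrence of "lemurs" all in lower case
  • one occurrence of "Lemurs" with the first letter capitalized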
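
To see the masking effect in isolation, here is a minimal sketch that formats the three strings from the list above with both %S and %s. The class and method names (FormatCaseDemo, upper, plain) are made up for this illustration, and Locale.ROOT is passed only to keep the upper-casing independent of the default locale.

```java
import java.util.Locale;

public class FormatCaseDemo {
    // %S upper-cases the formatted argument, so case differences disappear.
    static String upper(String word) {
        return String.format(Locale.ROOT, "%S", word);
    }

    // %s prints the argument exactly as it is stored.
    static String plain(String word) {
        return String.format(Locale.ROOT, "%s", word);
    }

    public static void main(String[] args) {
        // The three distinct strings that the Multiset counted once each.
        for (String w : new String[] {"lemurs.", "lemurs", "Lemurs"}) {
            System.out.println(upper(w) + " <- " + plain(w));
        }
    }
}
```

With %S, "lemurs" and "Lemurs" both print as LEMURS, which is why the original output looks like the same word repeated.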

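Multiset is working just fine. Take a closer look at your results. Switching the printf format to something like "|%s| %d" (lower-case s, with delimiters around the word) would help: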

|lemurs.| 1
|lemurs| 1
|Lemurs| 1

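Clearly these are 3 different strings. The solution in this case is simply to strip all non-letter characters and convert every word to lower case before counting.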
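
One way the normalization could look is sketched below. It is not the answerer's code: to stay runnable without Guava on the classpath it uses a plain java.util.Map as a stand-in for the Multiset, and the names NormalizedWordCount and countWords, as well as the short sample sentence, are invented for the example. The same normalized words could instead be added to a HashMultiset one by one.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class NormalizedWordCount {
    // Counts words after stripping non-letter characters and lower-casing.
    // A Map is used here as a stdlib stand-in for Guava's Multiset.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            // Remove everything that is not a letter, then lower-case the rest.
            String word = token.replaceAll("[^\\p{L}]", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample containing the three variants from the question.
        String txt = "Lemurs are primates. Some lemurs climb; other lemurs. do not";
        countWords(txt).forEach((word, count) ->
                System.out.printf("|%s| %d%n", word, count));
    }
}
```

Note that stripping every non-letter also merges hyphenated words ("high-quality" becomes "highquality") and drops apostrophes ("edition's" becomes "editions"), which may or may not be what you want for a word count.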
