
Google Guava MultiSet returning incorrect value

I am using the Google Guava API to count words.

    public static void main(String[] args)
    {
        String txt = "Lemurs of Madagascar is a reference work and field guide giving descriptions and biogeographic data for all the known lemur species in Madagascar (ring-tailed lemur pictured). It also provides general information about lemurs and their history and helps travelers identify species they may encounter. The primary contributor is Russell Mittermeier, president of Conservation International. The first edition in 1994 received favorable reviews for its meticulous coverage, numerous high-quality illustrations, and engaging discussion of lemur topics, including conservation, evolution, and the recently extinct subfossil lemurs. The American Journal of Primatology praised the second edition's updates and enhancements. Lemur News appreciated the expanded content of the third edition (2010), but was concerned that it was not as portable as before. The first edition identified 50 lemur species and subspecies, compared to 71 in the second edition and 101 in the third. The taxonomy promoted by these books has been questioned by some researchers who view these growing numbers of lemur species as insufficiently justified inflation of species numbers.";

        Iterable<String> result = Splitter.on(" ").trimResults(CharMatcher.DIGIT)
                   .omitEmptyStrings().split(txt);
        Multiset<String> words = HashMultiset.create(result);

        for(Multiset.Entry<String> entry : words.entrySet())
        {
            String word = entry.getElement();
            int count = words.count(word);
            System.out.printf("%S %d", word, count);
            System.out.println();
        }
    }

The output should be

Lemurs 3

But I am getting this instead:

Lemurs 1
Lemurs 1
Lemurs 1

What am I doing wrong?

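Using printf("%S %d", word, count) with a capital S hides the detail that different spellings of the word "lemurs" are being counted separately, because %S converts the output to upper case. When I run that program, I see

  • one occurrence of "lemurs." with the trailing period not trimmed
  • one occurrence of "lemurs" in all lower case
  • one occurrence of "Lemurs" with the first letter capitalized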
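
To see how the upper-case conversion masks the difference, here is a minimal sketch using only the JDK. The class name FormatCaseDemo and the helper show are made up for illustration; Locale.ROOT is passed only to keep the case conversion independent of the machine's default locale.

```java
import java.util.Locale;

class FormatCaseDemo {
    // Formats a single word with the given conversion, e.g. "%S" or "%s".
    // (Hypothetical helper, not part of the original program.)
    static String show(String conversion, String word) {
        return String.format(Locale.ROOT, conversion, word);
    }

    public static void main(String[] args) {
        String[] words = {"lemurs.", "lemurs", "Lemurs"};
        for (String w : words) {
            // %S upper-cases the argument, %s prints it unchanged.
            System.out.println(show("%S", w) + " <- " + show("%s", w));
        }
    }
}
```

With %S, "lemurs" and "Lemurs" both print as LEMURS, so two distinct Multiset elements look identical in the output; with %s they stay distinguishable.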

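MultiSet works just fine. Take a closer look at your results: switching the printf format to something like "|%s| %d" (lower-case s, so the original case is preserved, with delimiters around the word) will help: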

|lemurs.| 1
|lemurs| 1
|Lemurs| 1

Clearly, these are 3 different strings. The solution in this case is simply to strip all non-letter characters and convert all words to lower case.
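
A minimal sketch of that normalization, under some assumptions: it uses a plain java.util.Map instead of Guava's Multiset so that it runs without any dependency, and the names WordCountSketch and countWords are invented for illustration. The normalized words could equally be collected into a list and passed to HashMultiset.create.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Counts words after stripping non-letter characters and lower-casing.
    // (Hypothetical helper; a plain Map stands in for Guava's Multiset.)
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            // Remove everything that is not a letter, then normalize case.
            String word = token.replaceAll("[^\\p{L}]", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String txt = "Lemurs of Madagascar. It also provides information "
                   + "about lemurs and extinct subfossil lemurs.";
        System.out.println("lemurs " + countWords(txt).get("lemurs"));
    }
}
```

Note that stripping every non-letter character is a blunt rule: tokens such as "1994" disappear entirely, and "ring-tailed" becomes "ringtailed". Whether that is acceptable depends on what should count as a word.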
