Using Java Streams to return the count and the list of sentences in which each word occurs

Question

I'm stuck trying to work out which sentences each word appears in. The input will be a list of sentences:

Question, what kind of wine is best? 
White wine.
A question

and the output will be

// format would be: word:{count: sentence1, sentence2,...}
a:{1:3} 
wine:{2:1,2} 
best:{1:1} 
is:{1:1} 
kind:{1:1} 
of:{1:1} 
question:{2:1,3} 
what:{1:1}
white:{1:2}

This is what I have so far:

static void getFrequency(List<String> inputLines) {
  List<String> list = inputLines.stream()
     .map(w -> w.split("[^a-zA-Z0-9]+"))
     .flatMap(Arrays::stream)
     .map(String::toLowerCase)
     .collect(Collectors.toList());

   Map<String, Integer> wordCounter = list.stream()
     .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
}

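With this I can only count how many times each word occurs across all sentences, but I also need the list of sentences in which the word occurs. It looks like I could use IntStream.range to get the sentence ids, like this: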

 IntStream.range(0, inputLines.size())   // list indices are 0-based; the end bound is exclusive
          .mapToObj(i -> inputLines.get(i));

But I'm not sure whether this is the best approach. I'm new to Java.

Answer 1

You can use a grouping collector to build a map from each word to the list of sentence indices where it occurs. Here is an example:

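import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

private static Map<String, List<Integer>> getFrequency(List<String> inputLines) {
    return IntStream.range(0, inputLines.size())
            .mapToObj(line -> Arrays.stream(inputLines.get(line)
                 .split("[^a-zA-Z0-9]+"))
                 .map(word -> new SimpleEntry<>(word.toLowerCase(), line + 1))) // 1-based sentence id
            .flatMap(Function.identity())
            .collect(Collectors.groupingBy(Entry::getKey, 
                  Collectors.mapping(Entry::getValue, Collectors.toList())));
}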

With your test data, I get

{a=[3], what=[1], white=[2], question=[1, 3], kind=[1], 
 of=[1], best=[1], is=[1], wine=[1, 2]}
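
One case the test data does not exercise is a word that is repeated within a single sentence: the collector above would then list that sentence id once per occurrence. If each sentence should appear at most once per word, one option is to collect into a sorted set instead of a list. The following is only a sketch under that assumption; the class name DistinctSentenceIndex, the method name index, and the sample sentences are made up for illustration. It also drops the empty token that split produces when a line starts with punctuation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class DistinctSentenceIndex {
    // Maps each lower-cased word to the sorted set of 1-based ids of the sentences containing it.
    static Map<String, TreeSet<Integer>> index(List<String> inputLines) {
        return IntStream.range(0, inputLines.size())
                .boxed()
                .flatMap(i -> Arrays.stream(inputLines.get(i).split("[^a-zA-Z0-9]+"))
                        .filter(word -> !word.isEmpty()) // leading punctuation yields an empty first token
                        .map(word -> new SimpleEntry<>(word.toLowerCase(), i + 1)))
                .collect(Collectors.groupingBy(Entry::getKey,
                        Collectors.mapping(Entry::getValue,
                                Collectors.toCollection(TreeSet::new))));
    }

    public static void main(String[] args) {
        // Hypothetical input: "wine" occurs three times in sentence 1 and once in sentence 2.
        List<String> lines = Arrays.asList("Wine, wine, and more wine!", "White wine.");
        System.out.println(index(lines).get("wine")); // prints [1, 2]
    }
}
```

With this variant the size of each set is the number of sentences containing the word, not the total number of occurrences, so which collector fits depends on what the count is meant to be.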

The count is easily inferred from the list size, so no additional class is needed.
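
If the exact word:{count:ids} layout from the question is wanted, it can be produced at print time from the same map. A minimal sketch, assuming a map of the shape returned by getFrequency above; the class name WordIndexFormatter, the format helper, and the hand-built sample map are hypothetical and only there to make the example self-contained:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordIndexFormatter {
    // Renders one entry as word:{count:id1,id2,...}; the count is simply the list size.
    static String format(String word, List<Integer> sentenceIds) {
        String ids = sentenceIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return word + ":{" + sentenceIds.size() + ":" + ids + "}";
    }

    public static void main(String[] args) {
        // Two entries copied from the result above; TreeMap is used only to print in sorted order.
        Map<String, List<Integer>> index = new TreeMap<>();
        index.put("wine", Arrays.asList(1, 2));
        index.put("a", Arrays.asList(3));
        index.forEach((word, ids) -> System.out.println(format(word, ids)));
    }
}
```

For the two sample entries this prints a:{1:3} followed by wine:{2:1,2}.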

Solution 1
9 Accepted 2021-04-09 16:03:08
