How to extract trigrams from a string in Java with Stream

Say I have a String = "abcabc". Its trigrams are all the unique consecutive groups of three letters, in this case:

abc
bca
cab

I would like to extract them and put them in a Map, ideally with a count for each. So I would get:

abc, 2
bca, 1
cab, 1

I would like to do that with Streams, not with loops.

My idea is:

  1. convert the string to a HashMap: (index, 3-character String)
  2. collect, group, and count over the stream of the Map values

But I can't express it in code...

I came up with something like this:

        String testString = "abcabc";
        //I can't get point 1...this doesn't compile
        Map trigrams = IntStream.range(0, testString.length()-2).collect(Collectors.toMap(i->i, testString.substring(i,i+3));

        //point 2 seems to work
        //Map<Integer,String> trigrams =  Map.of(0,"abc", 1, "bca",2,"cab",3,"abc");
        
        List<String> trigramsList = trigrams.values().stream().collect(Collectors.toList());
        Map<String, Long> result = trigramsList.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Also, I'm not convinced that IntStream is the right tool; maybe there is another way.
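For reference, point 1 as written most likely fails for two reasons: IntStream has no collect(Collector) overload, so the stream has to be boxed first, and the second argument of toMap must be a function rather than an expression that uses i directly. A minimal sketch of the two-step idea that should compile, where the class and method names (TrigramIndexMap, indexToTrigram, countValues) are hypothetical and chosen only for illustration:

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the original question.
class TrigramIndexMap {

    // Point 1: map each start index to the 3-character substring beginning there.
    static Map<Integer, String> indexToTrigram(String s) {
        return IntStream.range(0, s.length() - 2)
                .boxed() // IntStream -> Stream<Integer>, which accepts a Collector
                .collect(Collectors.toMap(Function.identity(), i -> s.substring(i, i + 3)));
    }

    // Point 2: group the map values and count each distinct trigram.
    static Map<String, Long> countValues(Map<Integer, String> byIndex) {
        return byIndex.values().stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<Integer, String> byIndex = indexToTrigram("abcabc");
        System.out.println(countValues(byIndex));
    }
}
```

The intermediate map is not needed for the final counts; it is kept here only to mirror the two points listed above.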

This can be simplified somewhat.

import java.util.Map;
import java.util.function.Function;
import java.util.stream.IntStream;

import static java.util.stream.Collectors.*;

public class Main {

    public static void main(String[] args) {
        String s = "a";

        Map<String, Long> map = IntStream.range(0, s.length() - 2)
                .mapToObj(i -> s.substring(i, i + 3))
                .collect(groupingBy(Function.identity(), counting()));

        System.out.println(map);
    }
}

Keep in mind, however, that substring() throws an exception if the second index is greater than the length of the string or less than the first index. The code above avoids this for short strings only because IntStream.range(start, end) yields an empty stream whenever start >= end. For example, IntStream.range(0, -2), which is what an empty string produces, returns an empty stream, so substring() is never called.

It would be better to check explicitly that the string contains at least 3 characters.
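One way such a check might look is sketched below. The class and method names (GuardedTrigrams, countTrigrams) are hypothetical, returning an empty map for null input is an assumption rather than something the question asks for, and Map.of() requires Java 9 or later:

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class illustrating an explicit length check.
class GuardedTrigrams {

    static Map<String, Long> countTrigrams(String s) {
        // Explicit guard: fewer than 3 characters means there are no trigrams.
        // Treating null like a too-short string is an assumption of this sketch.
        if (s == null || s.length() < 3) {
            return Map.of();
        }
        return IntStream.rangeClosed(0, s.length() - 3)
                .mapToObj(i -> s.substring(i, i + 3))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countTrigrams("a"));      // too short, so the map is empty
        System.out.println(countTrigrams("abcabc")); // counts for abc, bca, cab
    }
}
```

With the guard in place, the code no longer depends on how IntStream.range() treats a negative upper bound, and the intent is visible to the reader.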

One possibility would be to use Guava's Multiset:

// Multiset is an interface, so collect into a concrete HashMultiset
Multiset<String> trigrams = IntStream.range(0, testString.length()-2)
    .mapToObj(i->testString.substring(i,i+3))
    .collect(Collectors.toCollection(HashMultiset::create));

If you don't want to use the Google library, it can be done like so:

Map<String, Long> trigrams = IntStream.range(0, testString.length()-2)
    .mapToObj(i->testString.substring(i,i+3))
    .collect(Collectors.groupingBy(x->x, Collectors.counting()));

This is entirely possible with streams:

    String testString = "abcabc";
    Map<String, Long> trigramCnt = 
            // range through index 0-3
            IntStream.rangeClosed(0, testString.length() - 3)
            // create substrings abc (idx 0-2), bca (idx 1-3), cab (idx 2-4), abc (idx 3-5)    
            .mapToObj(i -> testString.substring(i, i + 3))
            // count them
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    System.out.println(trigramCnt);
    //-> {bca=1, abc=2, cab=1}
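The order in the printed output above reflects the map that groupingBy happens to create, and it should not be relied on. If a predictable order matters, one option is the three-argument groupingBy overload with a map factory. A minimal sketch, assuming alphabetical order is what is wanted (the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class: same counting, but collected into a TreeMap.
class SortedTrigrams {

    static Map<String, Long> countSorted(String s) {
        return IntStream.rangeClosed(0, s.length() - 3)
                .mapToObj(i -> s.substring(i, i + 3))
                // TreeMap::new makes groupingBy build a map sorted by key
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countSorted("abcabc")); // prints {abc=2, bca=1, cab=1}
    }
}
```

Passing LinkedHashMap::new instead would keep the trigrams in order of first appearance.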
