How to split a String into a Stream of Strings?

What is the best way to split a String into a Stream?

I have seen these variations:

  1. Arrays.stream("b,l,a".split(","))
  2. Stream.of("b,l,a".split(","))
  3. Pattern.compile(",").splitAsStream("b,l,a")

My priorities are:

  • Robustness
  • Readability
  • Performance

A complete, compilable example:

import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class HelloWorld {

    public static void main(String[] args) {
        stream1().forEach(System.out::println);
        stream2().forEach(System.out::println);
        stream3().forEach(System.out::println);
    }

    private static Stream<String> stream1() {
        return Arrays.stream("b,l,a".split(","));
    }

    private static Stream<String> stream2() {
        return Stream.of("b,l,a".split(","));
    }

    private static Stream<String> stream3() {
        return Pattern.compile(",").splitAsStream("b,l,a");
    }

}

Arrays.stream / String.split

Since String.split returns an array (String[]), I always recommend Arrays.stream as the canonical idiom for streaming over an array.

String input = "dog,cat,bird";
Stream<String> stream = Arrays.stream(input.split( "," ));
stream.forEach(System.out::println);

Stream.of / String.split

Stream.of is a varargs method that just happens to accept an array. It does so because varargs methods are implemented via arrays, and because of compatibility concerns when varargs were introduced to Java and existing methods were retrofitted to accept variable arguments.

Stream<String> fromArray  = Stream.of(input.split(","));     // works, but is non-idiomatic
Stream<String> fromValues = Stream.of("dog", "cat", "bird"); // intended use case
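As a side illustration of why passing an array to a varargs method is the less natural fit, here is a minimal sketch using a primitive array. The class and method names are made up for this example. An int[] cannot be matched against T... for any reference type T, so Stream.of treats the whole array as a single element, whereas Arrays.stream streams over its contents.

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class VarargsPitfall {

    // Stream.of(T...) cannot spread an int[], so T is inferred as int[]
    // and the stream contains exactly one element: the array itself.
    static long countViaStreamOf(int[] values) {
        Stream<int[]> stream = Stream.of(values);
        return stream.count();
    }

    // Arrays.stream(int[]) returns an IntStream over the array's elements.
    static long countViaArraysStream(int[] values) {
        return Arrays.stream(values).count();
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(countViaStreamOf(numbers));     // prints 1
        System.out.println(countViaArraysStream(numbers)); // prints 3
    }
}
```

This does not affect the String[] returned by String.split, where both calls yield a Stream<String> with the same elements.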

Pattern.splitAsStream

Pattern.compile(",").splitAsStream(string) has the advantage of streaming directly rather than creating an intermediate array. So for a large number of substrings, this can have a performance benefit. On the other hand, if the delimiter is trivial, i.e. a single literal character, the String.split implementation will take a fast path instead of using the regex engine. So in this case, it is not obvious which approach is faster.

Stream<String> stream = Pattern.compile(",").splitAsStream(input);

If the splitting happens inside another stream, e.g. .flatMap(Pattern.compile(pattern)::splitAsStream), there is the advantage that the pattern has to be analyzed only once, rather than for every string of the outer stream.

Stream<String> stream = Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
    .flatMap(Pattern.compile(",")::splitAsStream);

This is a property of method references of the form expression::name, which evaluate the expression and capture the result when the instance of the functional interface is created, as explained in "What is the equivalent lambda expression for System.out::println" and "java.lang.NullPointerException is thrown using a method-reference but not a lambda expression".
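The difference between the method reference and the equivalent-looking lambda can be made visible with a small sketch. The class CaptureDemo, its comma() helper, and the counter are made up for this illustration; they only count how often Pattern.compile is called.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CaptureDemo {

    // Counts how often the pattern gets compiled (demo-only helper).
    static final AtomicInteger COMPILATIONS = new AtomicInteger();

    static Pattern comma() {
        COMPILATIONS.incrementAndGet();
        return Pattern.compile(",");
    }

    static List<String> viaMethodReference() {
        // comma() is evaluated once, when the method reference is created.
        return Stream.of("a,b", "c,d,e", "f")
            .flatMap(comma()::splitAsStream)
            .collect(Collectors.toList());
    }

    static List<String> viaLambda() {
        // comma() is evaluated every time the lambda body runs,
        // i.e. once per element of the outer stream.
        return Stream.of("a,b", "c,d,e", "f")
            .flatMap(s -> comma().splitAsStream(s))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> first = viaMethodReference();
        System.out.println(first + " compilations: " + COMPILATIONS.getAndSet(0));
        // prints [a, b, c, d, e, f] compilations: 1

        List<String> second = viaLambda();
        System.out.println(second + " compilations: " + COMPILATIONS.getAndSet(0));
        // prints [a, b, c, d, e, f] compilations: 3
    }
}
```

Both variants produce the same elements; only the number of compilations differs. A lambda can get the same effect by storing the compiled Pattern in a variable first.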

Regarding (1) and (2), there shouldn't be much difference, as the code is almost the same.
Regarding (3), it would be much more efficient in terms of memory (not necessarily CPU) but, in my opinion, a bit harder to read.

Robustness

I can see no difference in the robustness of the three approaches.
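One way to check this for inputs you care about is to compare the results directly. The following is a minimal sketch, with made-up class and method names, that runs the three variants on a single sample input containing empty and trailing empty fields. It shows agreement for this one input only, not for every possible input or Java version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitAgreement {

    static List<String> viaArraysStream(String s) {
        return Arrays.stream(s.split(",")).collect(Collectors.toList());
    }

    static List<String> viaStreamOf(String s) {
        return Stream.of(s.split(",")).collect(Collectors.toList());
    }

    static List<String> viaPattern(String s) {
        return Pattern.compile(",").splitAsStream(s).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // An empty field in the middle is kept; trailing empty fields are dropped.
        String input = "a,,b,,";
        System.out.println(viaArraysStream(input).size()); // prints 3
        System.out.println(viaStreamOf(input).size());     // prints 3
        System.out.println(viaPattern(input).size());      // prints 3
    }
}
```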

Readability

I am not aware of any credible scientific studies on code readability involving experienced Java programmers, so readability is a matter of opinion. Even then, you never know whether someone giving their opinion is making an objective distinction between actual readability, what they have been taught about readability, and their own personal taste.

So I will leave it to you to make your own judgements on readability, noting that you do consider this to be a high priority.

FWIW, the only people whose opinions matter here are you and your team.

Performance

I think that the answer to that is to carefully benchmark the three alternatives. Holger provides an analysis based on his study of some versions of Java. But:

  1. He was not able to come to a definite conclusion on which was fastest.
  2. Strictly speaking, his analysis only applies to the versions of Java he looked at. (Some aspects of his analysis could be different on, say, Android Java, or some future Oracle / OpenJDK version.)
  3. The relative performance is likely to depend on the length of the string being split, the number of fields, and the complexity of the separator regex.
  4. In a real application, the relative performance may also depend on what you do with the Stream object, which garbage collector you have selected (since the different versions apparently generate different amounts of garbage), and other issues.

So if you (or anyone else) are really concerned with performance, you should write a micro-benchmark and run it on your production platform(s). Then do some application-specific benchmarking. You should also consider looking at solutions that don't involve streams.
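As a starting point, here is a deliberately crude timing sketch based on System.nanoTime. The class name, input size, and round count are arbitrary assumptions for this example. It has only a simple warm-up and none of the safeguards of a real benchmark harness such as JMH, so treat its numbers as a rough indication at best.

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SplitTiming {

    private static final Pattern COMMA = Pattern.compile(",");

    // Builds a sample input such as "0,1,2" for fields == 3.
    static String makeInput(int fields) {
        return IntStream.range(0, fields)
            .mapToObj(Integer::toString)
            .collect(Collectors.joining(","));
    }

    // Returns the average nanoseconds per call over the measured rounds.
    static long time(Function<String, Stream<String>> splitter, String input, int rounds) {
        long sink = 0;
        for (int i = 0; i < rounds; i++) {   // warm-up, not measured
            sink += splitter.apply(input).mapToInt(String::length).sum();
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {   // measured; consumes every element
            sink += splitter.apply(input).mapToInt(String::length).sum();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink);        // keeps the computed value observable
        }
        return elapsed / rounds;
    }

    public static void main(String[] args) {
        String input = makeInput(1_000);
        int rounds = 10_000;
        System.out.println("Arrays.stream: " + time(s -> Arrays.stream(s.split(",")), input, rounds) + " ns");
        System.out.println("Stream.of:     " + time(s -> Stream.of(s.split(",")), input, rounds) + " ns");
        System.out.println("splitAsStream: " + time(COMMA::splitAsStream, input, rounds) + " ns");
    }
}
```

Varying the number of fields, the delimiter, and what is done with the stream should show how much the ranking depends on the points listed above.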
