How to split a String into a Stream of Strings?
What is the best method of splitting a String into a Stream?
I saw these variations:
Arrays.stream("b,l,a".split(","))
Stream.of("b,l,a".split(","))
Pattern.compile(",").splitAsStream("b,l,a")
My priorities are: robustness, readability, and performance.
A complete, compilable example:
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class HelloWorld {
    public static void main(String[] args) {
        stream1().forEach(System.out::println);
        stream2().forEach(System.out::println);
        stream3().forEach(System.out::println);
    }

    private static Stream<String> stream1() {
        return Arrays.stream("b,l,a".split(","));
    }

    private static Stream<String> stream2() {
        return Stream.of("b,l,a".split(","));
    }

    private static Stream<String> stream3() {
        return Pattern.compile(",").splitAsStream("b,l,a");
    }
}
Arrays.stream / String.split
Since String.split returns an array String[], I always recommend Arrays.stream as the canonical idiom for streaming over an array.
String input = "dog,cat,bird";
Stream<String> stream = Arrays.stream(input.split( "," ));
stream.forEach(System.out::println);
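One practical point in favour of Arrays.stream is that it also has a range overload, Arrays.stream(array, startInclusive, endExclusive), which Stream.of lacks. The following is a minimal sketch that assumes, purely for illustration, that the first field of the input is a label to be skipped. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ArraysStreamRangeDemo {
    // Hypothetical helper: streams every field except the first (e.g. a label).
    static List<String> fieldsAfterFirst(String input) {
        String[] parts = input.split(",");
        // Guard for an empty array (e.g. input ","), where start must not exceed length.
        int start = Math.min(1, parts.length);
        // Arrays.stream(array, startInclusive, endExclusive) streams a sub-range
        // without copying the array; Stream.of has no equivalent overload.
        return Arrays.stream(parts, start, parts.length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fieldsAfterFirst("pets,dog,cat,bird")); // [dog, cat, bird]
    }
}
```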
Stream.of / String.split
Stream.of is a varargs method that just happens to accept an array. This works because varargs methods are implemented via arrays. It also reflects the compatibility concerns that arose when varargs were introduced to Java and existing methods were retrofitted to accept variable arguments.
Stream<String> fromArray = Stream.of(input.split(",")); // works, but is non-idiomatic
Stream<String> fromVarargs = Stream.of("dog", "cat", "bird"); // intended use case
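The difference matters once the array does not hold references. The small sketch below uses hypothetical class and method names. For a String[], both calls yield the same Stream<String>. For an int[], however, Stream.of cannot bind T... to a primitive array. It instead produces a Stream<int[]> that holds the whole array as a single element. Arrays.stream, by contrast, has a dedicated overload that returns an IntStream.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamOfVsArraysStream {
    // For a String[], Stream.of binds T... to the array: one element per string.
    static long referenceCount(String[] words) {
        return Stream.of(words).count();
    }

    // For an int[], T cannot be int, so Stream.of(T t) is chosen with T = int[]:
    // the result is a Stream<int[]> holding the whole array as a single element.
    static long wrappedCount(int[] numbers) {
        Stream<int[]> wrapped = Stream.of(numbers);
        return wrapped.count();
    }

    // Arrays.stream has a dedicated int[] overload that returns an IntStream.
    static long unwrappedCount(int[] numbers) {
        IntStream unwrapped = Arrays.stream(numbers);
        return unwrapped.count();
    }

    public static void main(String[] args) {
        String[] words = "dog,cat,bird".split(",");
        int[] numbers = {1, 2, 3};
        System.out.println(referenceCount(words));   // 3
        System.out.println(wrappedCount(numbers));   // 1
        System.out.println(unwrappedCount(numbers)); // 3
    }
}
```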
Pattern.splitAsStream
Pattern.compile(",").splitAsStream(string) has the advantage of streaming directly rather than creating an intermediate array. So for a large number of substrings, this can have a performance benefit. On the other hand, if the delimiter is trivial, i.e. a single literal character, the String.split implementation will go through a fast path instead of using the regex engine. So in this case, the answer is not clear-cut.
Stream<String> stream = Pattern.compile(",").splitAsStream(input);
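The following sketch illustrates the "streaming directly" point. In OpenJDK, the substrings are produced on demand as the stream is consumed, so a short-circuiting operation such as limit can stop before the whole input has been split. The names are hypothetical. The pattern is compiled once into a constant and reused, since Pattern instances are immutable and safe to share between threads.

```java
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LazySplitDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical helper: returns at most n leading fields of the input.
    static List<String> firstFields(String input, int n) {
        // No intermediate String[] is built; limit(n) lets the pipeline
        // stop asking for further substrings once n have been produced.
        return COMMA.splitAsStream(input)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A large input: one million "x" fields separated by commas.
        String big = String.join(",", Collections.nCopies(1_000_000, "x"));
        System.out.println(firstFields(big, 3)); // [x, x, x]
    }
}
```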
If the streaming happens inside another stream, e.g. .flatMap(Pattern.compile(pattern)::splitAsStream), there is the advantage that the pattern has to be analyzed only once, rather than for every string of the outer stream.
Stream<String> stream = Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
.flatMap(Pattern.compile(",")::splitAsStream);
This is a property of method references of the form expression::name. They evaluate the expression and capture the result when the instance of the functional interface is created. This is explained in "What is the equivalent lambda expression for System.out::println" and "java.lang.NullPointerException is thrown using a method-reference but not a lambda expression".
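You can observe the difference between a bound method reference and the equivalent lambda directly. In this sketch, countingCompile is a hypothetical wrapper around Pattern.compile that only counts how often it is called. With the method reference, the receiver expression is evaluated once, when flatMap's argument is created. The lambda evaluates it again for every element of the outer stream.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MethodRefCaptureDemo {
    // Hypothetical counter used only to observe how often a pattern is compiled.
    static int compileCount = 0;

    static Pattern countingCompile(String regex) {
        compileCount++;
        return Pattern.compile(regex);
    }

    static List<String> viaMethodRef() {
        // countingCompile(",") is evaluated once, when the method reference is created.
        return Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
                .flatMap(countingCompile(",")::splitAsStream)
                .collect(Collectors.toList());
    }

    static List<String> viaLambda() {
        // The lambda body runs once per outer element, so the pattern is compiled 4 times.
        return Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
                .flatMap(s -> countingCompile(",").splitAsStream(s))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        compileCount = 0;
        List<String> a = viaMethodRef();
        System.out.println(a.size() + " elements, compiled " + compileCount + "x"); // 10 elements, compiled 1x
        compileCount = 0;
        List<String> b = viaLambda();
        System.out.println(b.size() + " elements, compiled " + compileCount + "x"); // 10 elements, compiled 4x
    }
}
```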
Regarding (1) and (2), there shouldn't be much difference, as your code is almost the same.
Regarding (3), that would be much more effective in terms of memory (not necessarily CPU), but in my opinion, a bit harder to read.
Robustness
I can see no difference in the robustness of the three approaches.
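A quick way to check the robustness claim against your own inputs is to run all three variants side by side. This sketch uses hypothetical class and method names and covers a few edge cases with non-empty input. In each case the three variants are expected to agree. They drop trailing empty strings, keep an empty field in the middle, and keep a leading empty string when the input starts with the delimiter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitEdgeCases {
    private static final Pattern COMMA = Pattern.compile(",");

    static List<String> viaArraysStream(String s) {
        return Arrays.stream(s.split(",")).collect(Collectors.toList());
    }

    static List<String> viaStreamOf(String s) {
        return Stream.of(s.split(",")).collect(Collectors.toList());
    }

    static List<String> viaSplitAsStream(String s) {
        return COMMA.splitAsStream(s).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Non-empty inputs only: plain, leading delimiter, trailing delimiters,
        // an empty field in the middle, and no delimiter at all.
        String[] inputs = {"b,l,a", ",b,l,a", "b,l,a,,", "b,,a", "bla"};
        for (String input : inputs) {
            List<String> a = viaArraysStream(input);
            boolean allAgree = a.equals(viaStreamOf(input)) && a.equals(viaSplitAsStream(input));
            System.out.println(input + " -> " + a + " (all agree: " + allAgree + ")");
        }
    }
}
```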
Readability
I am not aware of any credible scientific studies on code readability involving experienced Java programmers, so readability is a matter of opinion. Even then, you never know whether someone giving their opinion is making an objective distinction between actual readability, what they have been taught about readability, and their own personal taste.
So I will leave it to you to make your own judgements on readability ... noting that you do consider this to be a high priority.
FWIW, the only people whose opinions on this matter count are you and your team.
Performance
I think that the answer to that is to carefully benchmark the three alternatives. Holger provides an analysis based on his study of some versions of Java. But in a real application, the relative performance may also depend on what you do with the Stream object, what garbage collector you have selected (since the different versions apparently generate different amounts of garbage), and other issues.
So if you (or anyone else) are really concerned with performance, you should write a micro-benchmark and run it on your production platform(s). Then do some application-specific benchmarking. And you should consider looking at solutions that don't involve streams.
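As one example of a solution without streams, here is a sketch of a plain indexOf loop for a single literal delimiter character. The name splitOnChar is hypothetical. Unlike the default String.split, it keeps trailing empty fields, so it corresponds to s.split(",", -1) rather than s.split(","). Whether it is actually faster on your platform is exactly what the benchmarks described above should tell you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopSplit {
    // Hypothetical helper: splits on a single literal char with a plain indexOf loop,
    // keeping every field including trailing empty ones (like s.split(",", -1)).
    static List<String> splitOnChar(String s, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = s.indexOf(delimiter, start)) >= 0) {
            fields.add(s.substring(start, end));
            start = end + 1;
        }
        fields.add(s.substring(start)); // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("a,,b,", ','));              // [a, , b, ]
        System.out.println(Arrays.asList("a,,b,".split(",", -1))); // [a, , b, ]
    }
}
```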