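Visualization of Java Stream parallelization
It is usually not obvious how a parallel stream splits its input into chunks and in which order the chunks are joined. Is there a way to visualize the whole procedure for any stream source, to get a better idea of what is going on? Suppose I create a stream like this: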
Stream<Integer> stream = IntStream.range(0, 100).boxed().parallel();
I would like to see a tree-like structure:
                   [0..99]
              _____/     \_____
             |                 |
          [0..49]           [50..99]
        __/     \__       __/      \__
       |           |     |            |
    [0..24]     [25..49] [50..74]  [75..99]
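This means that the whole input range [0..99] is split into the ranges [0..49] and [50..99], which in turn are split further. Of course, such a diagram should reflect the actual work of the Stream API, so if I perform a real operation on such a stream, the splitting should be performed in the same way.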
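The current Stream API implementation uses the collector's combiner to merge intermediate results in exactly the same way as the input was split before. The splitting strategy also depends on the source and on the parallelism level of the common pool, but not on the exact reduction operation used (it is the same for reduce, collect, forEach, count, etc.). Relying on this, it is not very difficult to create a visualizing collector: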
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Stream;

public static Collector<Object, ?, List<String>> parallelVisualize() {
    // One node of the split tree. A leaf remembers the first and last element
    // it received; an inner node is created by the combiner and keeps both children.
    class Range {
        private String first, last;
        private Range left, right;

        // Accumulator: record the first element once, then keep overwriting the last one
        void accept(Object obj) {
            if (first == null)
                first = obj.toString();
            else
                last = obj.toString();
        }

        // Combiner: build a parent node spanning both children
        Range combine(Range that) {
            Range p = new Range();
            p.first = first == null ? that.first : first;
            // The right-most known element, searched from right to left
            p.last = Stream
                    .of(that.last, that.first, this.last, this.first)
                    .filter(Objects::nonNull).findFirst().orElse(null);
            p.left = this;
            p.right = that;
            return p;
        }

        // Place s at offset left inside a blank string of length len
        String pad(String s, int left, int len) {
            if (len == s.length())
                return s;
            char[] result = new char[len];
            Arrays.fill(result, ' ');
            s.getChars(0, s.length(), result, left);
            return new String(result);
        }

        // Finisher: render this subtree as a list of text lines
        public List<String> finish() {
            String cur = toString();
            if (left == null) {
                return Collections.singletonList(cur);
            }
            List<String> l = left.finish();
            List<String> r = right.finish();
            int len1 = l.get(0).length();
            int len2 = r.get(0).length();
            int totalLen = len1 + len2 + 1;
            int leftAdd = 0;
            if (cur.length() < totalLen) {
                cur = pad(cur, (totalLen - cur.length()) / 2, totalLen);
            } else {
                leftAdd = (cur.length() - totalLen) / 2;
                totalLen = cur.length();
            }
            List<String> result = new ArrayList<>();
            result.add(cur);

            // Connector line: underscores with "/\" in the middle
            char[] dashes = new char[totalLen];
            Arrays.fill(dashes, ' ');
            Arrays.fill(dashes, len1 / 2 + leftAdd + 1, len1 + len2 / 2 + 1 + leftAdd, '_');
            int mid = totalLen / 2;
            dashes[mid] = '/';
            dashes[mid + 1] = '\\';
            result.add(new String(dashes));

            // Vertical bars above the two children
            Arrays.fill(dashes, ' ');
            dashes[len1 / 2 + leftAdd] = '|';
            dashes[len1 + len2 / 2 + 1 + leftAdd] = '|';
            result.add(new String(dashes));

            // Merge the children's lines side by side, padding the shorter subtree
            int maxSize = Math.max(l.size(), r.size());
            for (int i = 0; i < maxSize; i++) {
                String lstr = l.size() > i ? l.get(i) : String.format("%" + len1 + "s", "");
                String rstr = r.size() > i ? r.get(i) : String.format("%" + len2 + "s", "");
                result.add(pad(lstr + " " + rstr, leftAdd, totalLen));
            }
            return result;
        }

        public String toString() {
            if (first == null)
                return "(empty)";
            else if (last == null)
                return "[" + first + "]";
            return "[" + first + ".." + last + "]";
        }
    }
    return Collector.of(Range::new, Range::accept, Range::combine, Range::finish);
}
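The claim that the combiner mirrors the split tree can also be checked with a much smaller collector. The following is only a sketch: the class name SplitNesting and the bracket notation are made up for illustration and are not part of the answer. Each leaf appends its elements, and the combiner wraps both halves in parentheses, so the nesting of the resulting string follows the way the input was split.

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class SplitNesting {
    // Leaves append their elements; the combiner wraps the left and right result
    // in parentheses, so the parenthesization mirrors the split tree.
    static Collector<Object, StringBuilder, String> nesting() {
        return Collector.of(
                StringBuilder::new,
                (sb, o) -> sb.append(sb.length() == 0 ? "" : " ").append(o),
                (l, r) -> new StringBuilder("(").append(l).append(")(").append(r).append(")"),
                StringBuilder::toString);
    }

    public static void main(String[] args) {
        // The exact nesting depends on the machine, the element order does not
        String tree = IntStream.range(0, 16).boxed().parallel().collect(nesting());
        System.out.println(tree);
    }
}
```

Whatever the nesting looks like on a given machine, removing the parentheses always yields the elements in encounter order, because the combiner receives the left part first.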
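Here are some interesting results obtained with this collector on a 4-core machine (the results will differ on machines with a different number of availableProcessors()).
Splitting a simple range:
IntStream.range(0, 100)
        .boxed().parallel().collect(parallelVisualize())
        .forEach(System.out::println);
An even split into 16 tasks: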
[0..99]
___________________________________/\________________________________
| |
[0..49] [50..99]
_________________/\______________ _________________/\________________
| | | |
[0..24] [25..49] [50..74] [75..99]
________/\_____ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[0..11] [12..24] [25..36] [37..49] [50..61] [62..74] [75..86] [87..99]
___/\_ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49] [50..55] [56..61] [62..67] [68..74] [75..80] [81..86] [87..92] [93..99]
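The 16 leaves above can be reproduced by hand. This is a sketch under two assumptions that are not stated in the answer: that a task keeps splitting while its estimated size exceeds the source size divided by four times the parallelism (how current OpenJDK builds behave, an implementation detail rather than a documented guarantee), and that the common pool parallelism is 3, the default on a 4-core machine. The class name SplitSimulation is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.IntStream;

public class SplitSimulation {
    // Split recursively while the chunk is larger than the threshold; record leaf sizes
    static void split(Spliterator<?> sp, long threshold, List<Long> leaves) {
        Spliterator<?> prefix;
        if (sp.estimateSize() > threshold && (prefix = sp.trySplit()) != null) {
            split(prefix, threshold, leaves); // left part
            split(sp, threshold, leaves);     // sp now covers only the right part
        } else {
            leaves.add(sp.estimateSize());
        }
    }

    static List<Long> leafSizes(int size, int parallelism) {
        Spliterator.OfInt source = IntStream.range(0, size).spliterator();
        // Assumed rule: threshold = size / (4 * parallelism), at least 1
        long threshold = Math.max(1, source.estimateSize() / (parallelism * 4L));
        List<Long> leaves = new ArrayList<>();
        split(source, threshold, leaves);
        return leaves;
    }

    public static void main(String[] args) {
        System.out.println(leafSizes(100, 3));
        // prints [6, 6, 6, 7, 6, 6, 6, 7, 6, 6, 6, 7, 6, 6, 6, 7]
    }
}
```

With a threshold of 100 / 12 = 8, chunks of 12 and 13 elements are split once more while chunks of 6 and 7 are not, which matches the bottom row of the tree.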
Splitting a concatenation of two streams:
IntStream
        .concat(IntStream.range(0, 10), IntStream.range(10, 100))
        .boxed().parallel().collect(parallelVisualize())
        .forEach(System.out::println);
As you can see, the first split separates the two concatenated streams again:
[0..99]
_______________________________________________________________________/\_____
| |
[0..9] [10..99]
__/\__ ___________________________________/\__________________________________
| | | |
[0..4] [5..9] [10..54] [55..99]
_________________/\________________ _________________/\________________
| | | |
[10..31] [32..54] [55..76] [77..99]
________/\_______ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[10..20] [21..31] [32..42] [43..54] [55..65] [66..76] [77..87] [88..99]
___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[10..14] [15..20] [21..25] [26..31] [32..36] [37..42] [43..48] [49..54] [55..59] [60..65] [66..70] [71..76] [77..81] [82..87] [88..93] [94..99]
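Splitting a concatenation of two streams where an intermediate operation (boxed()) was performed before the concatenation:
Stream.concat(IntStream.range(0, 50).boxed().parallel(), IntStream.range(50, 100).boxed())
        .collect(parallelVisualize())
        .forEach(System.out::println);
If one of the input streams was not turned into parallel mode before the concatenation, it refuses to split at all: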
[0..99]
___/\_________________________________
| |
[0..49] [50..99]
_________________/\______________
| |
[0..24] [25..49]
________/\_____ ________/\_______
| | | |
[0..11] [12..24] [25..36] [37..49]
___/\_ ___/\___ ___/\___ ___/\___
| | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49]
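The refusal can be observed directly on the spliterators of the two inputs. The sketch below relies on how current OpenJDK builds wrap a pipeline that already has an intermediate stage, which is an implementation detail; the class and method names are hypothetical.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ConcatSplitCheck {
    // true if the stream's spliterator is willing to hand out a prefix
    static boolean canSplit(Stream<?> stream) {
        return stream.spliterator().trySplit() != null;
    }

    public static void main(String[] args) {
        // Parallel pipeline with an intermediate stage: splits
        System.out.println(canSplit(IntStream.range(0, 50).boxed().parallel()));   // true
        // Sequential pipeline with an intermediate stage: does not split
        System.out.println(canSplit(IntStream.range(50, 100).boxed()));            // false

        // The concatenation itself is parallel as soon as one input is parallel
        Stream<Integer> both = Stream.concat(
                IntStream.range(0, 50).boxed().parallel(),
                IntStream.range(50, 100).boxed());
        System.out.println(both.isParallel());                                     // true
    }
}
```

So the concatenated stream runs in parallel as a whole, but the right half stays one chunk; calling parallel() on both inputs before Stream.concat avoids this.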
Splitting a flatMap:
Stream.of(0, 50)
        .flatMap(start -> IntStream.range(start, start+50).boxed().parallel())
        .parallel().collect(parallelVisualize())
        .forEach(System.out::println);
flatMap never parallelizes inside the nested streams:
[0..99]
____/\__
| |
[0..49] [50..99]
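The same effect can be measured without drawing a tree. A minimal sketch (the class FlatMapLeaves and the helper leafCount are invented for this illustration): the collector's supplier runs once per leaf task, so counting supplier calls gives the number of chunks.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FlatMapLeaves {
    // Counts how many result containers (one per leaf task) a collect operation creates
    static int leafCount(Stream<Integer> stream) {
        AtomicInteger leaves = new AtomicInteger();
        Collector<Integer, int[], int[]> counting = Collector.of(
                () -> { leaves.incrementAndGet(); return new int[1]; },
                (acc, i) -> acc[0]++,
                (a, b) -> { a[0] += b[0]; return a; });
        stream.collect(counting);
        return leaves.get();
    }

    public static void main(String[] args) {
        Stream<Integer> flat = Stream.of(0, 50)
                .flatMap(start -> IntStream.range(start, start + 50).boxed().parallel())
                .parallel();
        // Only the two outer elements can be distributed, so at most 2 leaves
        System.out.println(leafCount(flat));
        // For comparison: a plain range is split much further on a multi-core machine
        System.out.println(leafCount(IntStream.range(0, 100).boxed().parallel()));
    }
}
```

The parallel() call inside the flatMap function has no influence on the count, since the two-element outer source is the only thing that gets split.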
Stream from an iterator of unknown size with 7000 elements (see the answer above):
StreamSupport
        .stream(Spliterators.spliteratorUnknownSize(
                IntStream.range(0, 7000).iterator(),
                Spliterator.ORDERED), true)
        .collect(parallelVisualize()).forEach(System.out::println);
The splitting is really bad; everybody waits for the biggest part [3072..6143]:
[0..6999]
_______________________/\___
| |
[0..1023] [1024..6999]
________________/\____
| |
[1024..3071] [3072..6999]
_________/\_____
| |
[3072..6143] [6144..6999]
___/\____
| |
[6144..6999] (empty)
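The uneven chunks come from the iterator-based spliterator itself and can be seen by calling trySplit() in a loop, without any stream. A small sketch, assuming the batch behaviour of current OpenJDK builds (not a documented contract); the class name IteratorBatches is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;

public class IteratorBatches {
    // Repeatedly split off prefixes and record how many elements each prefix holds
    static List<Long> prefixSizes(Spliterator<?> source) {
        List<Long> sizes = new ArrayList<>();
        Spliterator<?> prefix;
        while ((prefix = source.trySplit()) != null) {
            sizes.add(prefix.getExactSizeIfKnown());
        }
        return sizes;
    }

    public static void main(String[] args) {
        Spliterator<Integer> unknown = Spliterators.spliteratorUnknownSize(
                IntStream.range(0, 7000).boxed().iterator(), Spliterator.ORDERED);
        System.out.println(prefixSizes(unknown));
        // prints [1024, 2048, 3072, 856] on current OpenJDK builds
    }
}
```

Each prefix is an array filled from the iterator, and each batch is 1024 elements larger than the previous one until the iterator runs dry, which gives exactly the chunks [0..1023], [1024..3071], [3072..6143] and [6144..6999] shown in the tree.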
Iterator source with known size:
StreamSupport
        .stream(Spliterators.spliterator(IntStream.range(0, 7000)
                .iterator(), 7000, Spliterator.ORDERED), true)
        .collect(parallelVisualize()).forEach(System.out::println);
Supplying the size unlocks further splitting:
[0..6999]
______________________________________________________________________________________________/\________
| |
[0..1023] [1024..6999]
_____/\__ ____________________________________________________________________/\________________________
| | | |
[0..511] [512..1023] [1024..3071] [3072..6999]
____________/\___________ ________________/\__________________________________________________
| | | |
[1024..2047] [2048..3071] [3072..6143] [6144..6999]
_____/\_____ _____/\_____ _________________________/\________________________ ___/\___________
| | | | | | | |
[1024..1535] [1536..2047] [2048..2559] [2560..3071] [3072..4607] [4608..6143] [6144..6999] (empty)
____________/\___________ ____________/\___________ _____/\_____
| | | | | |
[3072..3839] [3840..4607] [4608..5375] [5376..6143] [6144..6571] [6572..6999]
_____/\_____ _____/\_____ _____/\_____ _____/\_____
| | | | | | | |
[3072..3455] [3456..3839] [3840..4223] [4224..4607] [4608..4991] [4992..5375] [5376..5759] [5760..6143]
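Further improvements of such a collector could generate a graphical image (such as SVG), track the thread that processes each node, display the number of elements in each group, and so on. Use it if you like.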
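As a starting point for two of these improvements, here is a hedged sketch of a collector that records, for every leaf, the name of the thread that created its container and the number of elements it received. The names LeafStats and Leaf are invented for this example, and the thread is captured in the supplier on the assumption that the same thread then accumulates the leaf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class LeafStats {
    // One record per leaf: the thread that created the container and the element count
    static final class Leaf {
        final String thread = Thread.currentThread().getName();
        int count;

        @Override
        public String toString() {
            return thread + ": " + count + " element(s)";
        }
    }

    static Collector<Object, ?, List<Leaf>> leafStats() {
        return Collector.<Object, List<Leaf>>of(
                // every leaf container starts with exactly one Leaf record
                () -> { List<Leaf> l = new ArrayList<>(); l.add(new Leaf()); return l; },
                (list, element) -> list.get(0).count++,
                // combining keeps the leaves in encounter order
                (left, right) -> { left.addAll(right); return left; });
    }

    public static void main(String[] args) {
        IntStream.range(0, 100).boxed().parallel()
                .collect(leafStats())
                .forEach(System.out::println);
    }
}
```

The printed thread names and the number of leaves vary between runs and machines; only the total element count is fixed.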
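I'd like to augment Tagir's great answer with a solution that monitors the splitting at the source side, or even at intermediate operations (with some restrictions imposed by the current Stream API implementation):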
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Spliterator;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.DoubleConsumer;
import java.util.function.IntConsumer;
import java.util.function.LongConsumer;
import java.util.stream.BaseStream;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Entry points: one overload per stream type, all delegating to the generic method below
public static <E> Stream<E> proxy(Stream<E> src) {
    Class<Stream<E>> sClass=(Class)Stream.class;
    Class<Spliterator<E>> spClass=(Class)Spliterator.class;
    return proxy(src, sClass, spClass, StreamSupport::stream);
}
public static IntStream proxy(IntStream src) {
    return proxy(src, IntStream.class, Spliterator.OfInt.class, StreamSupport::intStream);
}
public static LongStream proxy(LongStream src) {
    return proxy(src, LongStream.class, Spliterator.OfLong.class, StreamSupport::longStream);
}
public static DoubleStream proxy(DoubleStream src) {
    return proxy(src, DoubleStream.class, Spliterator.OfDouble.class, StreamSupport::doubleStream);
}
// Marker for "no element seen yet"; compared by identity
static final Object EMPTY=new StringBuilder("empty");
static <E,S extends BaseStream<E,S>, Sp extends Spliterator<E>> S proxy(
        S src, Class<S> sc, Class<Sp> spc, BiFunction<Sp,Boolean,S> f) {

    // A tree node that is at the same time the proxy's invocation handler,
    // the element-recording consumer and the close action that prints the tree
    final class Node<T> implements InvocationHandler,Runnable,
            Consumer<Object>, IntConsumer, LongConsumer, DoubleConsumer {
        final Class<? extends Spliterator> type;
        Spliterator<T> src;
        Object first=EMPTY, last=EMPTY;
        Node<T> left, right;
        Object currConsumer;

        public Node(Spliterator<T> src, Class<? extends Spliterator> type) {
            this.src = src;
            this.type=type;
        }
        // Remember the first and the most recent element
        private void value(Object t) {
            if(first==EMPTY) first=t;
            last=t;
        }
        public void accept(Object t) {
            value(t); ((Consumer)currConsumer).accept(t);
        }
        public void accept(int t) {
            value(t); ((IntConsumer)currConsumer).accept(t);
        }
        public void accept(long t) {
            value(t); ((LongConsumer)currConsumer).accept(t);
        }
        public void accept(double t) {
            value(t); ((DoubleConsumer)currConsumer).accept(t);
        }
        // Registered via onClose: print the tree when the stream is closed
        public void run() {
            System.out.println();
            finish().forEach(System.out::println);
        }
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // The node currently backing this proxy is the right-most leaf
            Node<T> curr=this; while(curr.right!=null) curr=curr.right;
            if(method.getName().equals("tryAdvance")||method.getName().equals("forEachRemaining")) {
                // Intercept traversal: record the elements, then pass them on
                curr.currConsumer=args[0];
                args[0]=curr;
            }
            if(method.getName().equals("trySplit")) {
                Spliterator s=curr.src.trySplit();
                if(s==null) return null;
                // The split-off prefix becomes the left child, the remainder the right child
                Node<T> pfx=new Node<>(s, type);
                pfx.left=curr.left; curr.left=pfx;
                curr.right=new Node<>(curr.src, type);
                src=null;
                return pfx.create();
            }
            return method.invoke(curr.src, args);
        }
        Object create() {
            return Proxy.newProxyInstance(null, new Class<?>[]{type}, this);
        }
        String pad(String s, int left, int len) {
            if (len == s.length())
                return s;
            char[] result = new char[len];
            Arrays.fill(result, ' ');
            s.getChars(0, s.length(), result, left);
            return new String(result);
        }
        // Rendering routine copied from the collector above
        public List<String> finish() {
            String cur = toString();
            if (left == null) {
                return Collections.singletonList(cur);
            }
            List<String> l = left.finish();
            List<String> r = right.finish();
            int len1 = l.get(0).length();
            int len2 = r.get(0).length();
            int totalLen = len1 + len2 + 1;
            int leftAdd = 0;
            if (cur.length() < totalLen) {
                cur = pad(cur, (totalLen - cur.length()) / 2, totalLen);
            } else {
                leftAdd = (cur.length() - totalLen) / 2;
                totalLen = cur.length();
            }
            List<String> result = new ArrayList<>();
            result.add(cur);

            char[] dashes = new char[totalLen];
            Arrays.fill(dashes, ' ');
            Arrays.fill(dashes, len1 / 2 + leftAdd + 1, len1 + len2 / 2 + 1 + leftAdd, '_');
            int mid = totalLen / 2;
            dashes[mid] = '/';
            dashes[mid + 1] = '\\';
            result.add(new String(dashes));

            Arrays.fill(dashes, ' ');
            dashes[len1 / 2 + leftAdd] = '|';
            dashes[len1 + len2 / 2 + 1 + leftAdd] = '|';
            result.add(new String(dashes));

            int maxSize = Math.max(l.size(), r.size());
            for (int i = 0; i < maxSize; i++) {
                String lstr = l.size() > i ? l.get(i) : String.format("%" + len1 + "s", "");
                String rstr = r.size() > i ? r.get(i) : String.format("%" + len2 + "s", "");
                result.add(pad(lstr + " " + rstr, leftAdd, totalLen));
            }
            return result;
        }
        // First non-empty element of this subtree, searched from the left
        private Object first() {
            if(left==null) return first;
            Object o=left.first();
            if(o==EMPTY) o=right.first();
            return o;
        }
        // Last non-empty element of this subtree, searched from the right
        private Object last() {
            if(right==null) return last;
            Object o=right.last();
            if(o==EMPTY) o=left.last();
            return o;
        }
        public String toString() {
            Object o=first(), p=last();
            return o==EMPTY? "(empty)": "["+o+(o!=p? ".."+p+']': "]");
        }
    }
    Node<E> n=new Node<>(src.spliterator(), spc);
    Sp sp=(Sp)Proxy.newProxyInstance(null, new Class<?>[]{n.type}, n);
    return f.apply(sp, true).onClose(n);
}
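It allows wrapping a spliterator in a proxy which monitors the split operations and the encountered objects. The chunk-handling logic is similar to Tagir's; in fact, I copied his result-printing routines.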
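For the object-stream case, the wrapping idea can also be written as a plain delegating class instead of a dynamic proxy. This is a simplified sketch, not the approach of the answer: the class LoggingSpliterator is hypothetical, it only logs the size estimates around each split, and it does not cover the primitive spliterator types.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

public class LoggingSpliterator<T> implements Spliterator<T> {
    private final Spliterator<T> delegate;
    private final List<String> log; // shared by all parts, so it must be thread-safe

    LoggingSpliterator(Spliterator<T> delegate, List<String> log) {
        this.delegate = delegate;
        this.log = log;
    }

    @Override
    public Spliterator<T> trySplit() {
        long before = delegate.estimateSize();
        Spliterator<T> prefix = delegate.trySplit();
        if (prefix == null) return null;
        log.add(before + " -> " + prefix.estimateSize() + " + " + delegate.estimateSize());
        return new LoggingSpliterator<>(prefix, log); // keep watching the split-off part
    }

    @Override public boolean tryAdvance(Consumer<? super T> action) { return delegate.tryAdvance(action); }
    @Override public void forEachRemaining(Consumer<? super T> action) { delegate.forEachRemaining(action); }
    @Override public long estimateSize() { return delegate.estimateSize(); }
    @Override public int characteristics() { return delegate.characteristics(); }
    @Override public Comparator<? super T> getComparator() { return delegate.getComparator(); }

    // Runs a parallel sum over the wrapped source and fills the log as a side effect
    static int sumLogged(List<Integer> data, List<String> log) {
        return StreamSupport.stream(new LoggingSpliterator<>(data.spliterator(), log), true)
                .mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        List<String> log = new CopyOnWriteArrayList<>();
        System.out.println(sumLogged(data, log));
        log.forEach(System.out::println); // the order of the entries depends on thread scheduling
    }
}
```

The dynamic proxy in the answer has the advantage that one handler covers Spliterator, Spliterator.OfInt, OfLong and OfDouble alike, whereas a hand-written wrapper would need one class per type.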
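You can pass in the source of a stream or a stream with some operations already appended. (In the latter case, you should apply .parallel() to the stream as early as possible.) As Tagir explained, in most cases the splitting behavior depends on the source and the configured parallelism, so monitoring an intermediate state will usually change the values but not the processed chunks:
try(IntStream is=proxy(IntStream.range(0, 100).parallel())) {
    is.filter(i -> i/20%2==0)
      .mapToObj(ix->"\""+ix+'"')
      .forEach(s->{});
}
will print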
[0..99]
___________________________________/\________________________________
| |
[0..49] [50..99]
_________________/\______________ _________________/\________________
| | | |
[0..24] [25..49] [50..74] [75..99]
________/\_____ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[0..11] [12..24] [25..36] [37..49] [50..61] [62..74] [75..86] [87..99]
___/\_ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49] [50..55] [56..61] [62..67] [68..74] [75..80] [81..86] [87..92] [93..99]
whereas
try(Stream<String> s=proxy(IntStream.range(0, 100).parallel().filter(i -> i/20%2==0)
                           .mapToObj(ix->"\""+ix+'"'))) {
    s.forEach(str->{});
}
will print
["0".."99"]
___________________________________________/\___________________________________________
| |
["0".."49"] ["50".."99"]
____________________/\______________________ ______________________/\___________________
| | | |
["0".."19"] ["40".."49"] ["50".."59"] ["80".."99"]
____________/\_________ ____________/\______ _______/\___________ ____________/\________
| | | | | | | |
["0".."11"] ["12".."19"] (empty) ["40".."49"] ["50".."59"] (empty) ["80".."86"] ["87".."99"]
_____/\___ _____/\_____ ___/\__ _____/\_____ _____/\_____ ___/\__ _____/\__ _____/\_____
| | | | | | | | | | | | | | | |
["0".."5"] ["6".."11"] ["12".."17"] ["18".."19"] (empty) (empty) ["40".."42"] ["43".."49"] ["50".."55"] ["56".."59"] (empty) (empty) ["80"] ["81".."86"] ["87".."92"] ["93".."99"]
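As we can see here, we are monitoring the result of .filter(…).mapToObj(…), but the chunks are clearly determined by the source, which may produce empty chunks downstream depending on the filter's condition.
Note that we can combine the source monitoring with Tagir's collector monitoring:
try(IntStream s=proxy(IntStream.range(0, 100))) {
    s.parallel().filter(i -> i/20%2==0)
     .boxed().collect(parallelVisualize())
     .forEach(System.out::println);
}
This will print (note that the collect output is printed first):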
[0..99]
________________________________/\_______________________________
| |
[0..49] [50..99]
________________/\______________ _______________/\_______________
| | | |
[0..19] [40..49] [50..59] [80..99]
________/\_____ ________/\______ _______/\_______ ________/\_____
| | | | | | | |
[0..11] [12..19] (empty) [40..49] [50..59] (empty) [80..86] [87..99]
___/\_ ___/\___ ___/\__ ___/\___ ___/\___ ___/\__ ___/\_ ___/\___
| | | | | | | | | | | | | | | |
[0..5] [6..11] [12..17] [18..19] (empty) (empty) [40..42] [43..49] [50..55] [56..59] (empty) (empty) [80] [81..86] [87..92] [93..99]
[0..99]
___________________________________/\________________________________
| |
[0..49] [50..99]
_________________/\______________ _________________/\________________
| | | |
[0..24] [25..49] [50..74] [75..99]
________/\_____ ________/\_______ ________/\_______ ________/\_______
| | | | | | | |
[0..11] [12..24] [25..36] [37..49] [50..61] [62..74] [75..86] [87..99]
___/\_ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___ ___/\___
| | | | | | | | | | | | | | | |
[0..5] [6..11] [12..17] [18..24] [25..30] [31..36] [37..42] [43..49] [50..55] [56..61] [62..67] [68..74] [75..80] [81..86] [87..92] [93..99]
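We can clearly see how the processed chunks match, but after the filtering some chunks have fewer elements and some of them are entirely empty.
This is the place to demonstrate where the two ways of monitoring can make a significant difference:
try(DoubleStream is=proxy(DoubleStream.iterate(0, i->i+1)).parallel().limit(100)) {
    is.boxed()
      .collect(parallelVisualize())
      .forEach(System.out::println);
}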
[0.0..99.0]
___________________________________________________/\________________________________________________
| |
[0.0..49.0] [50.0..99.0]
_________________________/\______________________ _________________________/\________________________
| | | |
[0.0..24.0] [25.0..49.0] [50.0..74.0] [75.0..99.0]
____________/\_________ ____________/\___________ ____________/\___________ ____________/\___________
| | | | | | | |
[0.0..11.0] [12.0..24.0] [25.0..36.0] [37.0..49.0] [50.0..61.0] [62.0..74.0] [75.0..86.0] [87.0..99.0]
_____/\___ _____/\_____ _____/\_____ _____/\_____ _____/\_____ _____/\_____ _____/\_____ _____/\_____
| | | | | | | | | | | | | | | |
[0.0..5.0] [6.0..11.0] [12.0..17.0] [18.0..24.0] [25.0..30.0] [31.0..36.0] [37.0..42.0] [43.0..49.0] [50.0..55.0] [56.0..61.0] [62.0..67.0] [68.0..74.0] [75.0..80.0] [81.0..86.0] [87.0..92.0] [93.0..99.0]
[0.0..10239.0]
_____________________________/\_____
| |
[0.0..1023.0] [1024.0..10239.0]
____________________/\_______
| |
[1024.0..3071.0] [3072.0..10239.0]
____________/\______
| |
[3072.0..6143.0] [6144.0..10239.0]
___/\_______
| |
[6144.0..10239.0] (empty)
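This demonstrates what Tagir has already explained: streams with an unknown size split poorly, and even though limit(…) offers the possibility of a good estimate (in fact, infinite + limit is theoretically predictable), the implementation does not take any advantage of it.
The source is split into chunks using a batch size of 1024 that is increased by 1024 after each split, creating chunks far beyond the range imposed by limit. We can also see how a prefix is split off each time.
But when we look at the terminal split output, we can see that these excess chunks have been dropped in between and that another splitting of the first chunk has happened. Since this chunk is backed by an intermediate array that was filled by the default implementation on the first split, we don't notice it at the source, but we can see at the terminal operation that this array has been split (unsurprisingly) in a well-balanced way.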
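How far the source runs ahead of the limit can be checked on the infinite stream's spliterator alone. A brief sketch, again relying on the batch sizes of current OpenJDK builds as an implementation detail; the class name InfiniteBatches is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.DoubleStream;

public class InfiniteBatches {
    // Split the given number of prefixes off an infinite source and record their sizes
    static List<Long> firstPrefixSizes(int splits) {
        Spliterator.OfDouble source = DoubleStream.iterate(0, i -> i + 1).spliterator();
        List<Long> sizes = new ArrayList<>();
        for (int n = 0; n < splits; n++) {
            Spliterator.OfDouble prefix = source.trySplit();
            if (prefix == null) break;
            sizes.add(prefix.getExactSizeIfKnown());
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Long> sizes = firstPrefixSizes(4);
        System.out.println(sizes); // [1024, 2048, 3072, 4096] on current OpenJDK builds
        System.out.println(sizes.stream().mapToLong(Long::longValue).sum()); // 10240
    }
}
```

Four splits already pull 10240 elements out of the source, which is why the source-side tree spans [0.0..10239.0] although limit(100) only needs elements from the first batch.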
So we need both ways of monitoring to get the full picture…