
How does Apache Crunch PTable collectValues work internally?

Asked 2016-04-27 12:45:05 · Tags: hadoop, apache-crunch

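I was going through some documentation on the HDFS architecture and the Apache Crunch PTable. As I understand it, when we generate a PTable, the data is stored internally across the data nodes in HDFS.

So suppose I have a PTable containing <K1,V1>,<K2,V2>,<K1,V3>,<K3,V4>,<K2,V5> and two data nodes, D1 and D2, in HDFS. Assume each data node can hold 3 pairs. Then D1 will hold <K1,V1>,<K2,V2>,<K1,V3> and D2 will hold <K3,V4>,<K2,V5>.

If I call collectValues on this PTable, another map-reduce job runs internally to read these values from the PTable and generate pairs of <K,Collection<V>>. So in the end I will have <K1,Collection<V1,V3>>, <K2,Collection<V2,V5>> and <K3,Collection<V4>>. Again, these pairs will be distributed across the data nodes.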
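To make the expected result concrete, here is a minimal sketch of that grouping in plain Python. It is not the Crunch API: `collect_values` is a hypothetical helper that only mimics the <K,Collection<V>> shape described above, and it assumes values are kept in arrival order.

```python
from collections import defaultdict


def collect_values(pairs):
    """Group (key, value) pairs into a mapping of key -> list of values.

    Hypothetical stand-in for the result shape of collectValues;
    values are kept in the order they are encountered.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# The five pairs from the example above.
pairs = [("K1", "V1"), ("K2", "V2"), ("K1", "V3"), ("K3", "V4"), ("K2", "V5")]

result = collect_values(pairs)
print(result)
# {'K1': ['V1', 'V3'], 'K2': ['V2', 'V5'], 'K3': ['V4']}
```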

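My question is: how are the Collection values (V1 and V3 for K1) stored in the resulting PTable? Will this data also be distributed across nodes, i.e.

V1 is stored in D1
V3 is stored in D2

Or are V1 and V3 stored on one node only?

If all the collected values for a key are stored on one node (not distributed), will processing the collected values for each key be slower for large data sets?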

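1 Answer

All values for the same key will be on one node. This is a general map-reduce concept rather than something specific to Crunch. The reason is that you want all the items for a key in one place; that is the locality you are trying to achieve.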
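The idea can be sketched as a toy shuffle in plain Python. This is a simulation under stated assumptions, not Crunch or Hadoop code: `partition_for` and `shuffle` are hypothetical names, and `zlib.crc32` stands in for the key hash (Hadoop's default HashPartitioner uses the key's `hashCode()` modulo the number of reduce tasks). Because the partition depends only on the key, every value for a key lands on the same reducer regardless of which node it started on.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a reducer from the key alone (deterministic stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to the reducer chosen by its key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for node_pairs in map_outputs:
        for key, value in node_pairs:
            reducers[partition_for(key, num_reducers)][key].append(value)
    return [dict(groups) for groups in reducers]


# Input pairs as laid out on the two data nodes in the question.
d1 = [("K1", "V1"), ("K2", "V2"), ("K1", "V3")]
d2 = [("K3", "V4"), ("K2", "V5")]

reducers = shuffle([d1, d2], num_reducers=2)
for index, groups in enumerate(reducers):
    print(f"reducer {index}: {groups}")
```

Which reducer ends up with which key depends on the hash, but K2's values V2 and V5, which started on different nodes, always end up together.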

Related questions:

In Apache Crunch, How to find out if a PCollection or PTable has any elements in it? And if so how many?

How does Apache Crunch pipeline generate map reduce jobs?

Does Apache Crunch come with the Hadoop MapReduce API?

Apache Crunch: how to create custom counters

How to run Apache Crunch application without a Hadoop?

How to read a hive partition into an Apache Crunch pipeline?

Apache Crunch Error

Apache crunch unable to write output

WordCount with Apache Crunch into HBase Standalone

How does Apache Drill work on top of Hive?


