How to specify uberization of a Hive query in Hadoop2?

Hadoop 2 has a new feature called uberization. For example, this quote says:

Uberization makes it possible to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. This way you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to launch (presumably small) tasks.

What I can't tell is whether this actually happens magically behind the scenes, or whether something needs to be done for it to occur. For example, is there a setting (or hint) to accomplish this when issuing a Hive query? Can you specify the threshold for "small enough"?

Also, I'm having a hard time finding material on this concept - does it go by another name?

I found details on "uber jobs" in Arun Murthy's YARN book:

An uber job occurs when multiple mappers and reducers are combined to use a single container. The four core settings for configuring uber jobs are found among the mapred-site.xml options in Table 9.3.

Here is Table 9.3:

|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable     | Whether to enable the small-jobs "ubertask" optimization,  |
|                                   | which runs "sufficiently small" jobs sequentially within a |
|                                   | single JVM. "Small" is defined by the maxmaps, maxreduces, |
|                                   | and maxbytes settings. Users may override this value.      |
|                                   | Default = false.                                           |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps    | Threshold for the number of maps beyond which the job is   |
|                                   | considered too big for the ubertasking optimization.       |
|                                   | Users may override this value, but only downward.          |
|                                   | Default = 9.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which           |
|                                   | the job is considered too big for the ubertasking          |
|                                   | optimization. Currently the code cannot support more       |
|                                   | than one reduce and will ignore larger values. (Zero is    |
|                                   | a valid maximum, however.) Users may override this         |
|                                   | value, but only downward.                                  |
|                                   | Default = 1.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes   | Threshold for the number of input bytes beyond             |
|                                   | which the job is considered too big for the uber-          |
|                                   | tasking optimization. If no value is specified,            |
|                                   | `dfs.block.size` is used as a default. Be sure to          |
|                                   | specify a default value in `mapred-site.xml` if the        |
|                                   | underlying file system is not HDFS. Users may override     |
|                                   | this value, but only downward.                             |
|                                   | Default = HDFS block size.                                 |
|-----------------------------------+------------------------------------------------------------|
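To enable this cluster-wide, the four settings above go in `mapred-site.xml`. A minimal sketch, assuming HDFS as the underlying file system; the lowered threshold values are illustrative, not recommendations:

```xml
<!-- mapred-site.xml: enable the small-jobs "ubertask" optimization -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
<!-- Optionally tighten the "small enough" thresholds
     (users may override these, but only downward) -->
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>4</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>
</property>
```

With `maxbytes` left unset, `dfs.block.size` is used as the input-size threshold, per the table above.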

I don't yet know whether there is a Hive-specific way to set this, or whether you simply use the above with Hive.

An uber job occurs when multiple mappers and reducers are combined to execute inside the Application Master. Suppose the job to be executed has at most 9 mappers and at most 1 reducer; then the ResourceManager (RM) creates an Application Master, and the job executes entirely within the Application Master's own JVM.

SET mapreduce.job.ubertask.enable=true;

Thus, the advantage of an uberized job is that the round-trip overhead the Application Master would otherwise incur - requesting containers from the ResourceManager (RM) and waiting for the RM to allocate them - is eliminated.
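Because Hive passes session-level `SET` commands through to the underlying MapReduce job configuration, the thresholds can in principle be tuned per session the same way. A sketch, assuming the defaults shown in Table 9.3; the threshold values and the table name `some_small_table` are illustrative:

```sql
-- Enable the ubertask optimization for jobs launched from this Hive session
SET mapreduce.job.ubertask.enable=true;
-- Tighten the "small enough" thresholds (overridable only downward)
SET mapreduce.job.ubertask.maxmaps=4;
SET mapreduce.job.ubertask.maxreduces=1;
-- If the resulting job fits within the thresholds, its tasks run
-- sequentially inside the ApplicationMaster's JVM
SELECT COUNT(*) FROM some_small_table;
```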
