Amazon Elastic MapReduce Bootstrap Actions not working

I have tried the following combinations of bootstrap actions to increase the heap size of my job but none of them seem to work:

--mapred-key-value mapred.child.java.opts=-Xmx1024m 
--mapred-key-value mapred.child.ulimit=unlimited

--mapred-key-value mapred.map.child.java.opts=-Xmx1024m 
--mapred-key-value mapred.map.child.ulimit=unlimited

-m mapred.map.child.java.opts=-Xmx1024m
-m mapred.map.child.ulimit=unlimited 

-m mapred.child.java.opts=-Xmx1024m 
-m mapred.child.ulimit=unlimited 

What is the right syntax?

You have two options to achieve this:

Custom JVM Settings

In order to apply custom settings, you might want to have a look at the Bootstrap Actions documentation for Amazon Elastic MapReduce (Amazon EMR), specifically the action Configure Daemons:

This predefined bootstrap action lets you specify the heap size or other Java Virtual Machine (JVM) options for the Hadoop daemons. You can use this bootstrap action to configure Hadoop for large jobs that require more memory than Hadoop allocates by default. You can also use this bootstrap action to modify advanced JVM options, such as garbage collection behavior.

An example is provided as well, which sets the namenode heap size to 2048 MB and configures a namenode JVM option:

$ ./elastic-mapreduce --create --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
  --args --namenode-heap-size=2048,--namenode-opts=-XX:GCTimeRatio=19   
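Note that Configure Daemons targets the Hadoop daemons themselves (namenode, jobtracker, and so on) rather than the map/reduce task JVMs. If I remember correctly, the same --<daemon>-heap-size and --<daemon>-opts pattern applies to the other daemons as well; a sketch, with heap sizes chosen purely for illustration:

$ ./elastic-mapreduce --create --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
  --args --jobtracker-heap-size=3072,--tasktracker-heap-size=1024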

Predefined JVM Settings

Alternatively, as per the FAQ How do I configure Hadoop settings for my job flow?, if your job flow tasks are memory-intensive, you may choose to use fewer tasks per core and reduce your job tracker heap size. For this situation, a predefined Bootstrap Action is available to configure your job flow on startup. This refers to the action Configure Memory-Intensive Workloads, which allows you to set cluster-wide Hadoop settings to values appropriate for job flows with memory-intensive workloads, for example:

$ ./elastic-mapreduce --create \
--bootstrap-action \
  s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive

The specific configuration settings applied by this predefined bootstrap action are listed in Hadoop Memory-Intensive Configuration Settings.

Good luck!

Steffen's answer is good and works. On the other hand, if you just want something quick and dirty and only need to replace one or two variables, then you're probably looking to change it via the command line, like the following:

elastic-mapreduce --create \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapred.child.java.opts=-Xmx999m"

I've seen other documentation, albeit older, that simply wraps the entire expression in a single quoted string, like the following:

--bootstrap-action "s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m \
    mapred.child.java.opts=-Xmx999m"    ### I tried this style, it no longer works!

At any rate, this is not easily found in the AWS EMR documentation. I suspect that mapred.child.java.opts is one of the most frequently overridden variables; I was also looking for an answer when I got a GC error, "java.lang.OutOfMemoryError: GC overhead limit exceeded", and stumbled on this page. The default of 200m is just too small (see the documentation on defaults).
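As far as I understand it, the -m flag of configure-hadoop merges the key/value pair into mapred-site.xml on the cluster, so you can verify the override once the cluster is up. A quick check from the master node (the /home/hadoop/conf path is what the older AMIs this answer targets used; it is an assumption and may differ on newer releases):

# After SSHing to the master node, confirm the bootstrap action took effect:
grep -A 1 mapred.child.java.opts /home/hadoop/conf/mapred-site.xml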

Good luck!
