
Where to find node logs in an AWS EMR cluster?

I have a pyspark program running on an AWS EMR cluster. The cluster config is: emr-5.31.0, hadoop 2.10.0, hive 2.3.7, hue 4.7.1, pig 0.17.0.

The program processes some files on the HDFS file system, but at some point it starts getting errors.

In the Amazon console - YARN applications - application_XXX (Spark) - executors - driver - stderr, I see: 'could not obtain block ... file='

A little before this message there is 'Task 0 in stage 35 failed 4 times. aborting job'.

If I go to the Amazon console - YARN applications - application_XXX (Spark) - stages - 35 - tasks - 0 - stdout, I don't see anything bad at first glance except a lot of 'GC (allocation Failure)' messages.

In its stderr there is a WARN: 'Could not obtain block XXX, file= No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException.'

If I go to the monitoring tab - node status, I see that one node became unhealthy at that time, and that's it. The number of nodes also changed in the 'live data nodes', 'MR total nodes', 'MR active nodes' and 'MR lost nodes' charts.

As I understand it, the task cannot find the file on HDFS because the node it was hosted on became unhealthy.

My question is: where can I find the reason the node became unhealthy? I wasn't able to find any other logs in the Amazon console. Maybe there are some node-local places where this reason is stored?

Hi, I launched an EMR cluster myself some time ago, but I don't remember the details about the logs. Consulting the docs here:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html

They state that the logs are stored on the machines (for which I assume you have the keys), and that they are also stored on S3 by default. I'm not sure which bucket they are created in.

Best Regards :)

On the Summary page for your EMR cluster there is a section named "Configuration details".

Below that, there is a label named "Log URI". It points to an S3 URI, and next to it there is a small folder icon.

Click on that icon and you can browse to the logs from the nodes of your EMR cluster.
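If you prefer to look up the same "Log URI" from a script instead of the console, something like the following might work. It is only a sketch: it assumes `boto3` is installed and credentials are configured, the helper names `split_log_uri` and `get_cluster_log_uri` are made up for illustration, and the cluster id you pass in is your own. The EMR `describe_cluster` call returns the value under `Cluster.LogUri` when logging to S3 is enabled.

```python
from urllib.parse import urlparse


def split_log_uri(log_uri):
    """Split an S3-style URI (s3://, s3n:// or s3a://) into (bucket, key_prefix).

    Hypothetical helper: the returned prefix always ends with '/' unless it is empty.
    """
    parsed = urlparse(log_uri)
    if parsed.scheme not in ("s3", "s3n", "s3a"):
        raise ValueError(f"not an S3 URI: {log_uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def get_cluster_log_uri(cluster_id, region_name=None):
    """Return the cluster's Log URI, or None if S3 logging is not configured.

    Assumes boto3 is available; it is imported here so the parsing helper
    above can be used without it.
    """
    import boto3

    emr = boto3.client("emr", region_name=region_name)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    return cluster.get("LogUri")
```

Usage would be along the lines of `bucket, prefix = split_log_uri(get_cluster_log_uri(my_cluster_id))`, where `my_cluster_id` is whatever id the console shows for your cluster.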

Actually, on Amazon there are more logs accessible via the S3 location: there are logs for the node boot and configuration part, and logs from the services running on the node (HDFS and YARN), which is what I was looking for. The path looks like this: s3 location/cluster id/node/node id/applications. There I was able to find the HDFS and YARN logs.
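To pull up those files without clicking through the console, a listing along these lines could help. This is a sketch under assumptions, not a definitive recipe: it follows the path layout described above (s3 location/cluster id/node/node id/applications), it assumes `boto3` is installed with credentials configured, and the function names plus the `j-EXAMPLE` / `i-EXAMPLE` ids are placeholders invented for illustration.

```python
def node_applications_prefix(base_prefix, cluster_id, node_id):
    """Build the S3 key prefix for one node's application logs.

    Layout assumed from the description above:
    <base prefix>/<cluster id>/node/<node id>/applications/
    """
    base = base_prefix.strip("/")
    parts = [p for p in (base, cluster_id, "node", node_id, "applications") if p]
    return "/".join(parts) + "/"


def list_node_application_logs(bucket, base_prefix, cluster_id, node_id):
    """Return every object key under the node's applications prefix.

    Assumes boto3 is available; uses the list_objects_v2 paginator so that
    more than 1000 log files are handled.
    """
    import boto3

    s3 = boto3.client("s3")
    prefix = node_applications_prefix(base_prefix, cluster_id, node_id)
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


if __name__ == "__main__":
    # Placeholder ids: replace with your own cluster id and the id of the
    # node that became unhealthy.
    print(node_applications_prefix("logs/", "j-EXAMPLE", "i-EXAMPLE"))
```

The node id to look for is the instance that the monitoring tab showed as unhealthy; the HDFS and YARN service logs for that instance should be the ones worth reading around the time of the failure.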


 