
hive-on-tez mapper stuck in INITIALIZING with total number of containers being -1 when accessing data on S3/MinIO

I have a Hadoop+Hive+Tez setup built from scratch (meaning I deployed it component by component). Hive is set up with Tez as its execution engine.

In its current state, Hive can access tables on HDFS, but it cannot access tables stored on MinIO (using the s3a filesystem implementation).

As the following screenshot shows, when executing SELECT COUNT(*) FROM s3_table:

  • Tez execution is stuck forever
  • Map 1 is always in the INITIALIZING state
  • Map 1 always has a total count of -1 and a pending count of -1 (why -1?)

Things already checked:

  • Hadoop can access MinIO/S3 without problems. For example, hdfs dfs -ls s3a://bucketname works well.
  • Hive-on-Tez can compute against tables on HDFS, with mappers and reducers generated successfully and quickly.
  • Hive-on-MR can compute against tables on MinIO/S3 without problems.

What could be the possible causes of this problem?

Tez UI screenshot attached.

Version information:

  • Hadoop 3.2.1
  • Hive 3.1.2
  • Tez 0.9.2
  • MinIO RELEASE.2020-01-25T02-50-51Z

It turned out that Tez's S3 support must be enabled explicitly at compile time. For Hadoop 2.8+, to enable S3 support, Tez must be compiled from source with the following command:

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Paws -Phadoop28 -P\!hadoop27

After that, upload the generated tez-x.y.z.tar.gz to HDFS and extract tez-xxx-minimal.tar.gz to $TEZ_LIB_DIR. Then it worked for me: Hive execution against MinIO/S3 runs smoothly.
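Before uploading, it may be worth confirming that the rebuilt tarball really picked up S3 support. The following is a minimal sketch, assuming the aws profile results in a jar named hadoop-aws-<version>.jar somewhere inside the tarball; the function name tez_tarball_has_aws and the path in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeed if the given Tez tarball lists a hadoop-aws jar.
# Assumption: S3 support shows up as a file named hadoop-aws-<version>.jar.
tez_tarball_has_aws() {
  local tarball="$1"
  local listing
  # List the archive contents without extracting anything.
  listing=$(tar -tzf "$tarball") || return 2
  case "$listing" in
    *hadoop-aws-*.jar*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (path is a placeholder, adjust to your build output):
#   tez_tarball_has_aws /path/to/tez-x.y.z.tar.gz && echo "S3 support bundled"
```

If the check fails on a tarball built without -Paws and passes on one built with it, that is a reasonable sign the profile took effect.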

However, the Tez installation guide doesn't mention anything about enabling S3 support, nor are the default Tez binary releases built with S3 or Azure support.

The (hopefully) complete build options and pitfalls are actually documented in BUILDING.txt, which says:

However, to build against hadoop versions higher than 2.7.0, you will need to do the following:

For Hadoop version X where X >= 2.8.0

 $ mvn package -Dhadoop.version=${X} -Phadoop28 -P\!hadoop27

For recent versions of Hadoop (which do not bundle aws and azure by default), you can bundle AWS-S3 (2.7.0+) or Azure (2.7.0+) support:

 $ mvn package -Dhadoop.version=${X} -Paws -Pazure
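The options quoted from BUILDING.txt can be combined with the flags from the command earlier in this answer. The sketch below only prints such a command rather than running it; combining -Dhadoop.version with -Paws, -Phadoop28 and -P\!hadoop27 in one invocation is my assumption (I built without -Dhadoop.version), and the function name tez_build_cmd is hypothetical.

```shell
# Hypothetical helper: print (do not run) a Tez build command for one Hadoop version.
# Assumption: the BUILDING.txt options can be combined in a single mvn invocation.
tez_build_cmd() {
  local hadoop_version="$1"
  # The single-quoted format keeps "!" away from interactive history expansion;
  # "\\!" prints as "\!" so the output can be pasted into a shell as-is.
  printf 'mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Dhadoop.version=%s -Paws -Phadoop28 -P\\!hadoop27\n' "$hadoop_version"
}

# Usage: print the command for Hadoop 3.2.1, review it, then run it from the Tez source root.
#   tez_build_cmd 3.2.1
```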

My team faced a similar issue, but while reading from HDFS instead: the map phase was stuck forever at INITIALIZING.

This may help somebody else facing a similar problem. In our case the application master was actually running out of memory. Increasing the value below to 12 GB worked for us.

tez.am.resource.memory.mb
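Since the property is expressed in megabytes, 12 GB corresponds to 12288. A minimal sketch of passing it for a single Hive CLI session follows; the helper name am_memory_flag and the query in the usage comment are hypothetical, and setting the property in tez-site.xml instead is equally plausible.

```shell
# Hypothetical helper: turn a size in GB into a --hiveconf flag for the Tez AM memory.
am_memory_flag() {
  local gb="$1"
  # tez.am.resource.memory.mb takes megabytes, so multiply by 1024.
  printf '%s%d\n' "--hiveconf tez.am.resource.memory.mb=" "$((gb * 1024))"
}

# Usage (table name is a placeholder; the query is not run by this sketch):
#   hive $(am_memory_flag 12) -e 'SELECT COUNT(*) FROM some_table'
```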
