Record too large for in-memory buffer: error when processing a Hive ORC table via TEZ

Question

We are trying to read data from an ORC table in Hive (1.2.1) and write it into a table that uses 'TextInputFormat'. Some entries in the original data are too large, and the following error occurs during the operation:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.tez.runtime.library.common.sort.impl.ExternalSorter$MapBufferTooSmallException: Record too large for in-memory buffer. Exceeded buffer overflow limit, bufferOverflowRecursion=2, bufferList.size=1, blockSize=1610612736

Any ideas on how to fix this?

We use the TEZ engine to execute queries; the same query runs without errors on the plain MR engine.

Query being executed:

insert overwrite table visits_text_test_1m select * from visits where dt='2016-01-19' limit 1000000;

Upd: The same error occurs when copying from ORC to ORC storage.

Upd 2: A simple 'select' from the ORC table works fine with any engine.

Answer 1

Hint #1: just switch from TEZ to MapReduce before running your query. It is slower but more resilient.

set hive.execution.engine = mr ;

Hint #2: since the exception comes out of the dreadful TEZ ExternalSorter beast, dig into TEZ properties such as tez.runtime.sorter.class, tez.runtime.io.sort.mb, etc. Be warned that finding a working set of properties (not to mention tuning them to match your hive.tez.container.size) will probably require some kind of voodoo sacrifice.

Cf. Hortonworks' Configuring Tez manual for starters.
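
As a rough illustration of Hint #2, the sorter properties can be overridden per session before the query runs. The values below are assumptions picked for illustration, not tested recommendations: the sort buffer has to fit inside the heap of the container defined by hive.tez.container.size, and LEGACY as a sorter value assumes a Tez version that offers it as an alternative to the default PIPELINED sorter.

```sql
-- Assumed values for illustration only; tune them against your cluster.
-- Sort buffer in MB; keep it well below the task's JVM heap.
set tez.runtime.io.sort.mb=1024;
-- Assumption: fall back to the legacy sorter instead of the pipelined one.
set tez.runtime.sorter.class=LEGACY;

insert overwrite table visits_text_test_1m select * from visits where dt='2016-01-19' limit 1000000;
```

If the error persists, change one property at a time so it stays clear which setting made the difference.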

Answer 2

As Samson said, you might want to increase the container size. I have also found that a JOIN sometimes leads to this issue, because by default Hive converts the join to a MAPJOIN. You may want to try the settings below in the query and see if that helps:

set hive.auto.convert.join=false;
set hive.auto.convert.join.noconditionaltask=false;
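
For the container-size part of this answer, a minimal sketch of a session-level override might look like the following. Both numbers are assumptions chosen for illustration (a 4 GB container with roughly 80% of it given to the JVM heap); the right values depend on what YARN allows on your cluster.

```sql
-- Assumed values for illustration only.
-- Memory in MB requested for each Tez task container.
set hive.tez.container.size=4096;
-- JVM heap for the task; kept below the container size to leave headroom.
set hive.tez.java.opts=-Xmx3276m;
```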
