
Record too large for in-memory buffer. Error when working with Hive's ORC tables via TEZ

We are trying to read data from an ORC table in Hive (1.2.1) and write it into a table stored as 'TextInputFormat'. Some entries in the original data are too large, and the following error occurs during the operation:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.tez.runtime.library.common.sort.impl.ExternalSorter$MapBufferTooSmallException: Record too large for in-memory buffer. Exceeded buffer overflow limit, bufferOverflowRecursion=2, bufferList.size=1, blockSize=1610612736

Any ideas how to fix the issue?

We are using the TEZ engine for query execution; the same query runs without errors on the plain MR engine.

Query to execute:

insert overwrite table visits_text_test_1m select * from visits where dt='2016-01-19' limit 1000000;

Upd: The same error occurs when copying from ORC to ORC storage.

Upd 2: A simple 'select' from the ORC table works fine with either engine.

Hint #1: just switch from TEZ to MapReduce before running your query. It is slower but more resilient.

set hive.execution.engine=mr;
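
A minimal sketch of Hint #1 applied only to the failing statement from the question. It assumes tez is the session's normal engine, which is why the last line switches back; drop that line if the session ends after the copy.

```sql
-- Run only the failing copy on MapReduce (assumption: tez is the usual session engine).
set hive.execution.engine=mr;
insert overwrite table visits_text_test_1m select * from visits where dt='2016-01-19' limit 1000000;
-- Restore the faster engine for the rest of the session.
set hive.execution.engine=tez;
```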

Hint #2: since the exception comes out of the dreaded TEZ ExternalSorter beast, dig into TEZ properties such as tez.runtime.sorter.class, tez.runtime.io.sort.mb, etc. Be warned that finding a working set of properties (let alone tuning them to match your hive.tez.container.size) will probably require some kind of voodoo sacrifice.

Cf. Hortonworks' "Configuring Tez" manual for starters.
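
As a starting point for that experimentation, the properties can be overridden per session before the insert. This is only a sketch: every number below is an illustrative assumption, not a value taken from the question or a recommendation. For reference, the blockSize in the stack trace, 1610612736 bytes, is 1536 MB. LEGACY and PIPELINED are the two values tez.runtime.sorter.class accepts; which one is the default depends on the Tez version, so trying the other one is a cheap experiment.

```sql
-- Illustrative values only; tune them to your cluster.
-- Container memory for Tez tasks, in MB.
set hive.tez.container.size=4096;
-- JVM heap for the task, kept below the container size.
set hive.tez.java.opts=-Xmx3276m;
-- Sort buffer in MB; it has to fit inside the heap above.
set tez.runtime.io.sort.mb=1024;
-- Try the other sorter implementation (LEGACY or PIPELINED).
set tez.runtime.sorter.class=LEGACY;
insert overwrite table visits_text_test_1m select * from visits where dt='2016-01-19' limit 1000000;
```

Comments are kept on their own lines because trailing comments after a `set` statement are not handled reliably by every Hive client.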

As samson said, you might want to increase the container size. I have also found that a JOIN sometimes leads to this issue, because by default Hive converts the join to a MAPJOIN. You may want to try the settings below in the query and see if they help:

set hive.auto.convert.join=false;
set hive.auto.convert.join.noconditionaltask=false;
