Hive count(1) leads to OOM

I have a new cluster built with CDH 6.3. Hive is up and running, and each of the 3 nodes has 30 GB of memory.

I created a target Hive table stored as Parquet and put some Parquet files downloaded from another cluster into the table's HDFS directory. When I run

select count(1) from tableA

it eventually fails with:

INFO  : 2021-09-05 14:09:06,505 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 436.69 sec
INFO  : 2021-09-05 14:09:07,520 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 426.94 sec
INFO  : 2021-09-05 14:09:10,562 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 464.3 sec
INFO  : 2021-09-05 14:09:26,785 Stage-1 map = 94%,  reduce = 31%, Cumulative CPU 464.73 sec
INFO  : 2021-09-05 14:09:50,112 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 464.3 sec
INFO  : MapReduce Total cumulative CPU time: 7 minutes 44 seconds 300 msec
ERROR : Ended Job = job_1630821050931_0003 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 18  Reduce: 1   Cumulative CPU: 464.3 sec   HDFS Read: 4352500295 HDFS Write: 0 HDFS EC Read: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 7 minutes 44 seconds 300 msec
INFO  : Completed executing command(queryId=hive_20210905140833_6a46fec2-91fb-4214-a734-5b76e59a4266); Time taken: 77.981 seconds

Looking into the MapReduce logs, I see this repeated:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.parquet.bytes.HeapByteBufferAllocator.allocate(HeapByteBufferAllocator.java:32)
    at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1080)
    at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:712)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:194)
    at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:213)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:101)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:63)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
    ... 16 more

The Parquet files are only 4.5 GB in total, so why does count(1) run out of memory? What MapReduce parameters should I change?

There are two ways to fix an OOM in the mapper: (1) increase mapper parallelism, or (2) increase the mapper memory. Note that your stack trace shows the mapper failing while the Parquet reader allocates a buffer for an entire row group, so each mapper needs enough heap to hold the row groups it reads.

Try increasing parallelism first.

Check the current values of these parameters and reduce mapreduce.input.fileinputformat.split.maxsize to get more, smaller mappers (a worked example follows the block below):

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapreduce.input.fileinputformat.split.minsize=16000; -- 16 KB; files smaller than the min size are combined and processed by the same mapper
set mapreduce.input.fileinputformat.split.maxsize=128000000; -- 128 MB; files bigger than the max size are split. Decrease this setting to get 2x more, smaller mappers
-- These figures are examples only. Compare them with yours and decrease accordingly until you get 2x more mappers
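
For instance, running set with just a parameter name prints its current value, and your job summary above shows 18 mappers reading ~4.35 GB, i.e. roughly 240 MB per mapper. A minimal sketch of the tuning step, with an illustrative 64 MB cap:

set mapreduce.input.fileinputformat.split.maxsize;          -- prints the current value
set mapreduce.input.fileinputformat.split.maxsize=64000000; -- ~64 MB; should roughly double the mapper count vs. 128 MB
select count(1) from tableA;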

Alternatively, try increasing the mapper memory:

set mapreduce.map.memory.mb=4096;      -- compare with the current setting and increase
set mapreduce.map.java.opts=-Xmx3000m; -- set the heap ~30% smaller than mapreduce.map.memory.mb
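
For example, if the container size is already 4096 MB and mappers still run out of heap, one illustrative next step (these values are assumptions, not tuned recommendations) is to double the container and scale the heap with it:

set mapreduce.map.memory.mb=8192;      -- illustrative: double the container size
set mapreduce.map.java.opts=-Xmx6144m; -- keep the heap ~25-30% below the container limit
select count(1) from tableA;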

Also try disabling map-side aggregation (map-side aggregation often leads to OOM in mappers):

set hive.map.aggr=false;
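
Putting it all together, a hypothetical session applying all three adjustments before re-running the failing query might look like this (every value is an example to compare against your cluster's defaults):

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapreduce.input.fileinputformat.split.maxsize=64000000;
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3000m;
set hive.map.aggr=false;
select count(1) from tableA;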
