java.lang.NullPointerException when merging output files

I have a table with 3 partition columns:

create table tn(
col1 string,
etc...
)
partitioned by (
time_key date,
region string,
city string
)
stored as orc
tblproperties ("orc.compress"="ZLIB");

A city partition can hold anywhere from a few MB to a few hundred MB. I'm trying to optimize storage so that small files are merged into files of about one block size (128 MB), and bigger files are split accordingly.

The source table has 200 files of around 150 MB each. It's not partitioned.

I do a simple insert statement for that:

INSERT INTO TABLE tn PARTITION (time_key, region, city) 
SELECT * FROM source_tn;

With the following settings, I get a NullPointerException:

set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;
set hive.merge.orcfile.stripe.level=true;
set hive.auto.convert.join=false;

If I try the insert without these settings it works fine, so there isn't anything wrong with the data. The problem in that case is that each city subpartition ends up holding around 200 files, and the total number of files in a time_key partition reaches 30-40 thousand.

What's the problem and what can I do?

I'm using Hive on Tez.

Setting hive.merge.orcfile.stripe.level to false helped.

set hive.merge.orcfile.stripe.level=false;
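
For reference, here is a minimal sketch of the full session with that change applied, reusing the settings and the insert statement from the question. The comments are my reading of what each property does. Whether the other merge settings can stay exactly as they were is an assumption the answer does not confirm. With stripe-level merging off, Hive should fall back to the regular merge that reads and rewrites rows, which is typically slower than the stripe-level fast merge.

```sql
-- Merge small output files at the end of a Tez job.
set hive.merge.tezfiles=true;
-- Run the merge step when the average output file size is below ~128 MB.
set hive.merge.smallfiles.avgsize=128000000;
-- Target size of the merged files.
set hive.merge.size.per.task=128000000;
-- Disable the ORC stripe-level fast merge (the change that helped here).
set hive.merge.orcfile.stripe.level=false;
-- Kept unchanged from the question.
set hive.auto.convert.join=false;

INSERT INTO TABLE tn PARTITION (time_key, region, city)
SELECT * FROM source_tn;
```

If the merge works, each city partition should end up with far fewer files than the roughly 200 reported without the merge settings.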


