Hive: one file per partition instead of one directory
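I don't want too many files to pile up. I once hit an error because the number of files on HDFS exceeded a limit, and I suspect that directories count toward that maximum. So I want a partitioned table in which each partition is a single file, not a directory.
The partition directory layout I know: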
/test/test.db/test_log/create_date=2013-04-09/2013-04-09.csv.gz
/test/test.db/test_log/create_date=2013-04-10/2013-04-10.csv.gz
I tried adding a partition like this, and it works:
ALTER TABLE test_log ADD PARTITION (create_date='2013-04-09') LOCATION '/test/test.db/test_log/create_date=2013-04-09/'
The file paths I want for the partitions:
/test/test.db/test_log/create_date=2013-04-09.csv.gz
/test/test.db/test_log/create_date=2013-04-10.csv.gz
I tried adding the partition like this:
ALTER TABLE test_log ADD PARTITION (create_date='2013-04-09') LOCATION '/test/tmp/test_log/2013-04-09.csv.gz'
It raised this error:
======================
HIVE FAILURE OUTPUT
======================
SET hive.support.sql11.reserved.keywords=false
SET hive.metastore.warehouse.dir=hdfs:/test/test.db
OK
OK
OK
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://ABCDEFG/test/tmp/test_log/2013-04-09.csv.gz is not a directory or unable to create one)
======================
END HIVE FAILURE OUTPUT
======================
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/spark/python/pyspark/sql/context.py", line 580, in sql
return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/local/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/local/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o32.sql.
: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://ABCDEFG/test/tmp/test_log/2013-04-09.csv.gz is not a directory or unable to create one)
The table schema is:
CREATE TABLE IF NOT EXISTS test_log (
testid INT,
create_dt STRING
)
PARTITIONED BY (create_date STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
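When you create or alter the location of a Hive table or partition, the path should go only as far as the directory. Hive treats LOCATION as a directory, which is why the metastore reports "is not a directory or unable to create one" when it is given a file path: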
ALTER TABLE test_log ADD PARTITION (create_date='2013-04-09') LOCATION '/test/tmp/test_log/create_date=2013-04-09/'
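Then put the file into that directory (the local source file name here is a placeholder for wherever your file actually is):
hadoop fs -put 2013-04-09.csv.gz /test/tmp/test_log/create_date=2013-04-09/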
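Since the error in the question came from PySpark, the two steps above can also be scripted. The following is only a minimal sketch: the helper names (partition_dir, add_partition_sql, put_command) are hypothetical, the local file name is a placeholder, and it only builds the statement and the command without running them. It assumes the ALTER TABLE statement runs first, so that the partition directory exists before the upload.

```python
import posixpath


def partition_dir(table_root, column, value):
    """Return the directory that backs one partition, with a trailing slash."""
    # HDFS paths always use forward slashes, so posixpath is used on purpose.
    return posixpath.join(table_root, f"{column}={value}") + "/"


def add_partition_sql(table, column, value, table_root):
    """Build an ADD PARTITION statement whose LOCATION is a directory."""
    location = partition_dir(table_root, column, value)
    return (
        f"ALTER TABLE {table} ADD PARTITION ({column}='{value}') "
        f"LOCATION '{location}'"
    )


def put_command(local_file, table_root, column, value):
    """Build the argument list for uploading one file into the partition directory."""
    return ["hadoop", "fs", "-put", local_file,
            partition_dir(table_root, column, value)]


if __name__ == "__main__":
    # Values taken from the question; "2013-04-09.csv.gz" is an assumed local file.
    print(add_partition_sql("test_log", "create_date", "2013-04-09",
                            "/test/tmp/test_log"))
    print(" ".join(put_command("2013-04-09.csv.gz", "/test/tmp/test_log",
                               "create_date", "2013-04-09")))
```

The generated statement could then be passed to the same sql(...) call that appears in the traceback, and the command list to subprocess.run. Each partition still gets one directory, but that directory holds a single file.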