[英]Hive: Cannot insert arrays and maps from file in hive table
Here is the schema of the table i have 这是我的表的架构
CREATE DATABASE IF NOT EXISTS mydb;
USE mydb;
CREATE TABLE IF NOT EXISTS mytab (
idcol string,
arrcol array<string>,
mapcol map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
now all i want to do is insert a single row in this table. 现在我要做的就是在此表中插入一行。 I have that row in a psv file as
我在psv文件中有该行
123|["a","b"]|{"1":"a","2":"b"}
Here is how i try to load the data 这是我尝试加载数据的方式
USE mydb; LOAD DATA INPATH '/path/to/file' INTO TABLE mytab PARTITION (data_date='2019-02-02');
the query succeeds but when i see the results 查询成功,但是当我看到结果时
hive -e "use mydb; select * from mytab where data_date='2019-02-02';"
i get 我得到
hive> select * from mytab;
OK
123 ["[\"a\",\"b\"]"] {"{\"1\":\"a\",\"2\":\"b\"}":null} 2019-02-02
Time taken: 2.39 seconds, Fetched: 1 row(s)
So looks like the LOAD
did some transformation on the data. 因此,看起来
LOAD
对数据进行了一些转换。 It kept the string value fine, but had some issues with the array and the map. 它使字符串值保持良好,但数组和映射存在一些问题。
How can i properly insert arrays and maps ? 如何正确插入数组和映射?
I also tried the following as input 我也尝试了以下作为输入
123|array("a","b")|{"1":"a","2":"b"}
The load succeeded, but when i queried the data, i got 加载成功,但是当我查询数据时,我得到了
root@0d2b0044b4c1:/opt# hive -e "use mydb;select * from mytab;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.096 seconds
OK
123 ["array(\"a\",\"b\")"] {"{\"1\":\"a\",\"2\":\"b\"}":null} 1554090675
Time taken: 3.266 seconds, Fetched: 1 row(s)
UPDATE 更新
thanks a lot @pedram bashiri for your answer. 非常感谢@pedram bashiri的回答。 I created the external table and was able to populate it.
我创建了外部表并能够填充它。 However, everything gets populated as string
但是,所有内容都填充为字符串
hive> drop table if exists extab;
OK
Time taken: 0.01 seconds
hive> create external table extab(idcol string,arrcol array<string>,mapcol map<string,string>, data_date string)
> row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> with serdeproperties (
> "separatorChar" = "|",
> "quoteChar" = "\"",
> "escapeChar" = "\\"
> )
> stored as textfile
> location '/tmp/serdes/';
OK
Time taken: 0.078 seconds
hive> desc extab;
OK
idcol string from deserializer
arrcol string from deserializer
mapcol string from deserializer
data_date string from deserializer
Time taken: 0.059 seconds, Fetched: 4 row(s)
hive> select * from extab;
OK
123 ["a","b"] {"1":"a","2":"b"} 2019
Time taken: 0.153 seconds, Fetched: 1 row(s)
hive>
here is what is stored in hdfs 这是存储在hdfs中的内容
root@0d2b0044b4c1:/opt# hadoop fs -ls -R /tmp/serdes/
-rw-r--r-- 1 root root 37 2019-04-04 22:06 /tmp/serdes/x.psv
root@0d2b0044b4c1:/opt# hadoop fs -cat /tmp/serdes/x.psv
123|["a","b"]|{"1":"a","2":"b"}|2019
root@0d2b0044b4c1:/opt#
I also tried 我也试过
create external table extab(idcol string,arrcol array<string>,mapcol map<string,string>, data_date string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
"separatorChar" = "|"
)
stored as textfile
location '/tmp/serdes/';
but still, everything gets stored as string so now i when i try to insert i get type mismatch. 但仍然,所有内容都存储为字符串,所以现在当我尝试插入时,类型不匹配。
Use opencsv to create an external table based on your psv file and call it mytab_exterrnal. 使用opencsv根据您的psv文件创建一个外部表,并将其命名为mytab_exterrnal。 Specify serdeproperties like
指定serdeproperties,例如
with serdeproperties (
"separatorChar" = "|",
"quoteChar" = """,
"escapeChar" = "\\"
)
And then simply do 然后简单地做
INSERT INTO mytab
SELECT * FROM mytab_external;
https://community.hortonworks.com/articles/8313/apache-hive-csv-serde-example.html https://community.hortonworks.com/articles/8313/apache-hive-csv-serde-example.html
So after a lot of digging, i figured it out 因此,经过大量的挖掘,我发现了
CREATE TABLE IF NOT EXISTS mytab (
idcol string,
arrcol array<string>,
mapcol map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE;
then, i can just load the following 然后,我可以加载以下内容
123|a,b|1=a,2=b|2019
root@0d2b0044b4c1:/opt# hive -e "use mydb; LOAD DATA INPATH '/path/to/file' INTO TABLE mytab PARTITION (data_date='2019');"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.456 seconds
Loading data to table mydb.mytab partition (data_date=2019)
OK
Time taken: 1.912 seconds
root@0d2b0044b4c1:/opt# hive -e "use mydb; select * from mytab;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.843 seconds
OK
123 ["a","b"] {"1":"a","2":"b"} 2019
root@0d2b0044b4c1:/opt#
which is exactly what i needed 这正是我所需要的
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.