
Hive: Cannot insert arrays and maps from a file into a Hive table

Here is the schema of the table I have:

CREATE DATABASE IF NOT EXISTS mydb;
USE mydb;

CREATE TABLE IF NOT EXISTS mytab (

idcol   string,
arrcol  array<string>,
mapcol  map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

Now all I want to do is insert a single row into this table. I have that row in a PSV (pipe-separated values) file:

123|["a","b"]|{"1":"a","2":"b"}

Here is how I try to load the data:

USE mydb; LOAD DATA INPATH '/path/to/file' INTO TABLE mytab PARTITION (data_date='2019-02-02');

The query succeeds, but when I look at the results with

hive -e "use mydb; select * from mytab where data_date='2019-02-02';"

I get

hive> select * from mytab;
OK
123 ["[\"a\",\"b\"]"]   {"{\"1\":\"a\",\"2\":\"b\"}":null}  2019-02-02
Time taken: 2.39 seconds, Fetched: 1 row(s)

So it looks like the LOAD did some transformation on the data. The string value came through fine, but the array and the map did not: the whole JSON text ended up as a single array element and as a single map key with a null value.

How can I properly insert arrays and maps?

I also tried the following as input:

123|array("a","b")|{"1":"a","2":"b"}

The load succeeded, but when I queried the data, I got

root@0d2b0044b4c1:/opt# hive -e "use mydb;select * from mytab;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.096 seconds
OK
123 ["array(\"a\",\"b\")"]  {"{\"1\":\"a\",\"2\":\"b\"}":null}  1554090675
Time taken: 3.266 seconds, Fetched: 1 row(s)

UPDATE

Thanks a lot @pedram bashiri for your answer. I created the external table and was able to populate it. However, every column gets populated as a string:

hive> drop table if exists extab;
OK
Time taken: 0.01 seconds
hive> create external table extab(idcol string,arrcol array<string>,mapcol map<string,string>, data_date string)
    >   row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    > with serdeproperties (
    >   "separatorChar" = "|",
    >   "quoteChar"     = "\"",
    >   "escapeChar"    = "\\"
    >   )
    >   stored as textfile
    > location '/tmp/serdes/';
OK
Time taken: 0.078 seconds
hive> desc extab;
OK
idcol                   string                  from deserializer
arrcol                  string                  from deserializer
mapcol                  string                  from deserializer
data_date               string                  from deserializer
Time taken: 0.059 seconds, Fetched: 4 row(s)
hive> select * from extab;
OK
123 ["a","b"]   {"1":"a","2":"b"}   2019
Time taken: 0.153 seconds, Fetched: 1 row(s)
hive>

Here is what is stored in HDFS:

root@0d2b0044b4c1:/opt# hadoop fs -ls -R /tmp/serdes/
-rw-r--r--   1 root root         37 2019-04-04 22:06 /tmp/serdes/x.psv
root@0d2b0044b4c1:/opt# hadoop fs -cat /tmp/serdes/x.psv
123|["a","b"]|{"1":"a","2":"b"}|2019
root@0d2b0044b4c1:/opt#

I also tried

create external table extab(idcol string,arrcol array<string>,mapcol map<string,string>, data_date string)
  row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = "|"
  )
  stored as textfile
location '/tmp/serdes/';

But still, every column is read as a string, so when I try to insert I get a type mismatch.

Use OpenCSVSerde to create an external table based on your PSV file and call it mytab_external. Specify serdeproperties like

with serdeproperties (
"separatorChar" = "|",
"quoteChar"     = "\"",
"escapeChar"    = "\\"
)

And then simply do

INSERT INTO mytab
SELECT * FROM mytab_external;

https://community.hortonworks.com/articles/8313/apache-hive-csv-serde-example.html

So after a lot of digging, I figured it out:

CREATE TABLE IF NOT EXISTS mytab (

idcol   string,
arrcol  array<string>,
mapcol  map<string,string>
)
PARTITIONED BY (data_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE;
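
To see why the two extra clauses matter, here is a rough Python emulation of how a delimited text row gets split into a string, an array and a map. It is only a sketch, not Hive's actual code: parse_row is a made-up helper, and it assumes Hive's default collection-item and map-key delimiters are the control characters \002 and \003, which apply when the DDL only declares FIELDS TERMINATED BY.

```python
# Rough emulation of how a delimited text row is split into
# (string, array<string>, map<string,string>). This is NOT Hive code;
# parse_row is a hypothetical helper written for illustration only.

def parse_row(line, field_sep="|", item_sep="\x02", key_sep="\x03"):
    """Split one line using field, collection-item and map-key delimiters."""
    idcol, arr_raw, map_raw = line.split(field_sep)[:3]
    arrcol = arr_raw.split(item_sep)
    mapcol = {}
    for entry in map_raw.split(item_sep):
        key, sep, value = entry.partition(key_sep)
        # If the key delimiter never occurs, there is no value (NULL in Hive).
        mapcol[key] = value if sep else None
    return idcol, arrcol, mapcol


# Original table: only the field delimiter was declared, so the JSON text
# contains neither \x02 nor \x03 and is never split any further.
print(parse_row('123|["a","b"]|{"1":"a","2":"b"}'))
# ('123', ['["a","b"]'], {'{"1":"a","2":"b"}': None})

# New table: ',' separates collection items and '=' separates map keys.
print(parse_row("123|a,b|1=a,2=b", item_sep=",", key_sep="="))
# ('123', ['a', 'b'], {'1': 'a', '2': 'b'})
```

The first result has the same shape as the broken output in the question: a one-element array and a map whose only key is the whole JSON string with a null value.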

Then I can just load the following:

123|a,b|1=a,2=b|2019

root@0d2b0044b4c1:/opt# hive -e "use mydb; LOAD DATA INPATH '/path/to/file' INTO TABLE mytab PARTITION (data_date='2019');"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.456 seconds
Loading data to table mydb.mytab partition (data_date=2019)
OK
Time taken: 1.912 seconds
root@0d2b0044b4c1:/opt# hive -e "use mydb; select * from mytab;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
OK
Time taken: 6.843 seconds
OK
123 ["a","b"]   {"1":"a","2":"b"}   2019
root@0d2b0044b4c1:/opt#

which is exactly what I needed.
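
If the input file already exists in the JSON-style layout from the question, it has to be rewritten into this delimited layout before loading. A minimal Python sketch of that conversion follows. The function name to_hive_delimited is hypothetical, and the sketch assumes that no value contains '|', ',' or '=' and that the array and map are valid JSON; it raises an error rather than guessing how to escape a delimiter.

```python
import json


def to_hive_delimited(line, field_sep="|", item_sep=",", key_sep="="):
    """Rewrite 'id|<JSON array>|<JSON object>[|extra...]' as 'id|a,b|k=v,k=v[|extra...]'.

    Hypothetical helper for illustration; assumes no value contains a delimiter.
    """
    fields = line.rstrip("\n").split(field_sep)
    idcol, arr_json, map_json = fields[:3]
    extra = fields[3:]  # any trailing columns are passed through untouched

    items = [str(x) for x in json.loads(arr_json)]
    pairs = [(str(k), str(v)) for k, v in json.loads(map_json).items()]

    # Refuse values that would be ambiguous once the delimiters are applied.
    for value in items + [part for pair in pairs for part in pair]:
        if any(sep in value for sep in (field_sep, item_sep, key_sep)):
            raise ValueError(f"value contains a delimiter: {value!r}")

    arr_out = item_sep.join(items)
    map_out = item_sep.join(f"{k}{key_sep}{v}" for k, v in pairs)
    return field_sep.join([idcol, arr_out, map_out] + extra)


print(to_hive_delimited('123|["a","b"]|{"1":"a","2":"b"}'))
# 123|a,b|1=a,2=b
```

The delimiter arguments should match whatever the table declares in COLLECTION ITEMS TERMINATED BY and MAP KEYS TERMINATED BY.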
