Hive query execution: Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found double, expecting union
Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union in hive
I need help!
I am streaming Twitter data into HDFS using Flume and loading it into Hive for analysis.
The steps are as follows:
Data in HDFS:
I described the Avro schema in an .avsc file and put it in HDFS:
{"type":"record",
"name":"Doc",
"doc":"adoc",
"fields":[{"name":"id","type":"string"},
{"name":"user_friends_count","type":["int","null"]},
{"name":"user_location","type":["string","null"]},
{"name":"user_description","type":["string","null"]},
{"name":"user_statuses_count","type":["int","null"]},
{"name":"user_followers_count","type":["int","null"]},
{"name":"user_name","type":["string","null"]},
{"name":"user_screen_name","type":["string","null"]},
{"name":"created_at","type":["string","null"]},
{"name":"text","type":["string","null"]},
{"name":"retweet_count","type":["boolean","null"]},
{"name":"retweeted","type":["boolean","null"]},
{"name":"in_reply_to_user_id","type":["long","null"]},
{"name":"source","type":["string","null"]},
{"name":"in_reply_to_status_id","type":["long","null"]},
{"name":"media_url_https","type":["string","null"]},
{"name":"expanded_url","type":["string","null"]}]}
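The "Found long, expecting union" error means a value in the data file matched no branch of the union declared for that field in the reader schema. A stdlib-only sketch of that check (the helper below is hypothetical, not part of Avro or Hive, and only a few of the question's fields are reproduced):

```python
import json

# A subset of the question's schema, enough to illustrate union matching.
schema = json.loads("""
{"type": "record", "name": "Doc", "fields": [
  {"name": "user_friends_count",  "type": ["int", "null"]},
  {"name": "retweet_count",       "type": ["boolean", "null"]},
  {"name": "in_reply_to_user_id", "type": ["long", "null"]}
]}
""")

def fields_not_accepting(record_schema, avro_type):
    """Names of union-typed fields with no branch of the given Avro type."""
    return [f["name"] for f in record_schema["fields"]
            if isinstance(f["type"], list) and avro_type not in f["type"]]

# Twitter reports retweet_count as a number; if the writer emitted a long,
# the declared union ["boolean", "null"] has no branch that can hold it.
print(fields_not_accepting(schema, "long"))
```

Running this lists the fields whose unions cannot hold a long; `retweet_count` declared as boolean looks like a plausible culprit, since the Twitter API reports it as a number.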
I wrote a .hql file to create the table and load the data into it:
create table tweetsavro
row format serde
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as inputformat
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');
load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;
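One way to find the mismatched column is to compare the schema Flume actually embedded in the container files against AvroSchemaFile.avsc. The avro-tools `getschema` command prints the writer schema; the jar path and FlumeData file name below are illustrative, and if `load data inpath` has already moved the files, look under the table's warehouse directory instead:

```shell
# Copy one Flume-written Avro container file out of HDFS
hadoop fs -get /test/twitter_data/FlumeData.1458139200000 .

# Print the writer schema embedded in the file (avro-tools ships with Apache Avro)
java -jar avro-tools.jar getschema FlumeData.1458139200000
```

Any field whose type in this output disagrees with the union declared in AvroSchemaFile.avsc is a candidate for the "Found long/double, expecting union" error.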
I ran the .hql file successfully, but when I run the select * from <tablename>
command in hive, it shows the error above.
The output of desc tweetsavro is:
hive> desc tweetsavro;
OK
id string
user_friends_count int
user_location string
user_description string
user_statuses_count int
user_followers_count int
user_name string
user_screen_name string
created_at string
text string
retweet_count boolean
retweeted boolean
in_reply_to_user_id bigint
source string
in_reply_to_status_id bigint
media_url_https string
expanded_url string
Time taken: 0.697 seconds, Fetched: 17 row(s)
I was facing the same problem. The issue was with the timestamp field (the "created_at" column in your case), which I was trying to insert into my new table as a string. My assumption was that this data was in ["null","string"]
format in my source. I analyzed the source Avro schema generated by the sqoop import --as-avrodatafile process. The Avro schema generated by the import had the following signature for the timestamp column:
{ "name" : "order_date", "type" : [ "null", "long" ], "default" : null, "columnName" : "order_date", "sqlType" : "93" },
SqlType 93 represents the Timestamp data type. So in my target table's Avro schema file I changed the data type to "long", and that solved the problem. My guess is that there is a data type mismatch in one of your columns.
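Applied to the schema in the question, the same kind of fix means changing the declared union to match what the writer actually emits. For example, if retweet_count arrives as a number rather than a boolean (a guess at the mismatched column, following the reasoning above), its field definition would become:

```json
{"name": "retweet_count", "type": ["int", "null"]}
```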