Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union in Hive

Question

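Need help!

I am streaming my Twitter feed into HDFS using Flume and loading it into Hive for analysis.

The steps are as follows:

I described the Avro schema in an .avsc file and put it in Hadoop: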

 {"type":"record",
 "name":"Doc",
 "doc":"adoc",
 "fields":[{"name":"id","type":"string"},
       {"name":"user_friends_count","type":["int","null"]},
       {"name":"user_location","type":["string","null"]},
       {"name":"user_description","type":["string","null"]},
       {"name":"user_statuses_count","type":["int","null"]},
       {"name":"user_followers_count","type":["int","null"]},
       {"name":"user_name","type":["string","null"]},
       {"name":"user_screen_name","type":["string","null"]},
       {"name":"created_at","type":["string","null"]},
       {"name":"text","type":["string","null"]},
       {"name":"retweet_count","type":["boolean","null"]},
       {"name":"retweeted","type":["boolean","null"]},
       {"name":"in_reply_to_user_id","type":["long","null"]},
       {"name":"source","type":["string","null"]},
       {"name":"in_reply_to_status_id","type":["long","null"]},
       {"name":"media_url_https","type":["string","null"]},
       {"name":"expanded_url","type":["string","null"]}]}

I wrote an .hql file to create a table and load the data into it:

 create table tweetsavro
    row format serde
        'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    stored as inputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat
        'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    tblproperties ('avro.schema.url'='hdfs:///avro_schema/AvroSchemaFile.avsc');

    load data inpath '/test/twitter_data/FlumeData.*' overwrite into table tweetsavro;

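The .hql file ran successfully, but when I run select * from <tablename> in Hive, it shows the following error:

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found long, expecting union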

The description of tweetsavro is:

hive> desc tweetsavro;
OK
id                      string                                      
user_friends_count      int                                         
user_location           string                                      
user_description        string                                      
user_statuses_count     int                                         
user_followers_count    int                                         
user_name               string                                      
user_screen_name        string                                      
created_at              string                                      
text                    string                                      
retweet_count           boolean                                     
retweeted               boolean                                     
in_reply_to_user_id     bigint                                      
source                  string                                      
in_reply_to_status_id   bigint                                      
media_url_https         string                                      
expanded_url            string                                      
Time taken: 0.697 seconds, Fetched: 17 row(s)

Answer 1

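I faced the same problem. The issue was with a timestamp field (the "created_at" column in your case), which I was trying to insert as a string into my new table. My assumption was that this data was in [ "null","string"] format in my source. I analyzed the source Avro schema generated by the sqoop import --as-avrodatafile process. The Avro schema generated by the import had the following signature for the timestamp column: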
{ "name" : "order_date", "type" : [ "null", "long" ], "default" : null, "columnName" : "order_date", "sqlType" : "93" },

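sqlType 93 stands for the Timestamp data type. So in my target table's Avro schema file I changed the data type to "long", and that solved the problem. My guess is that there is a data type mismatch in one of your columns.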
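
To track down which column is mismatched, it can help to compare the schema the data files were written with against the .avsc file the Hive table reads with. The following is a minimal sketch of that comparison, not part of the original answer. The helper names (`branch_types`, `find_mismatches`) and the two sample schemas are hypothetical; the writer schema is assumed to have been extracted from the data file header beforehand (for example with the avro-tools `getschema` command). It only handles primitive types and the numeric and string promotions that Avro schema resolution allows.

```python
import json

# Promotions Avro schema resolution allows from writer type to reader type.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}

def branch_types(avro_type):
    """Return the set of type names a field accepts (union branches or a single type)."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    # Complex types are dicts such as {"type": "array", ...}; use their "type" name.
    return {b["type"] if isinstance(b, dict) else b for b in branches}

def find_mismatches(writer_schema, reader_schema):
    """List (field, writer_types, reader_types) where a written non-null type
    is neither accepted by the reader nor promotable to an accepted type."""
    reader_fields = {f["name"]: branch_types(f["type"]) for f in reader_schema["fields"]}
    mismatches = []
    for field in writer_schema["fields"]:
        accepted = reader_fields.get(field["name"])
        if accepted is None:
            continue  # field missing from the reader schema; not checked in this sketch
        written = branch_types(field["type"]) - {"null"}
        bad = [t for t in written
               if t not in accepted and not (PROMOTIONS.get(t, set()) & accepted)]
        if bad:
            mismatches.append((field["name"], sorted(written), sorted(accepted)))
    return mismatches

# Hypothetical schemas modelled on the situation described in the answer:
# the data was written with a long timestamp, the table schema expects a string.
writer = json.loads('{"type":"record","name":"Doc","fields":'
                    '[{"name":"order_date","type":["null","long"]}]}')
reader = json.loads('{"type":"record","name":"Doc","fields":'
                    '[{"name":"order_date","type":["string","null"]}]}')

for name, written, accepted in find_mismatches(writer, reader):
    print(f"{name}: written as {written}, table schema accepts {accepted}")
```

Any field this reports is a candidate for the fix described above: change its type in the table's .avsc file so that it includes the type the data was actually written with.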

Solution 1
6 Accepted 2016-11-29 08:39:48
