
Hive SerDe returns error with JSON tweets Flume

I am collecting Twitter stream data with Flume and storing it as JSON in HDFS. I am trying to use a Hive SerDe to load this Twitter data into a Hive table, but I keep getting a very frustrating error:

hive> ADD JAR file:////home/ubuntu/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
Added [file:////home/ubuntu/hive/lib/hive-serdes-1.0-SNAPSHOT.jar] to class path
Added resources: [file:////home/ubuntu/hive/lib/hive-serdes-1.0-SNAPSHOT.jar]
hive>  CREATE EXTERNAL TABLE tweet (
    >    id BIGINT,
    >    created_at STRING,
    >    source STRING,
    >    favorited BOOLEAN,
    >    text STRING,
    >    in_reply_to_screen_name STRING
    >  ) 
    > 
    >  ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
    >  LOCATION '/user/ubuntu/twitter/';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe

Any help would be appreciated.

I had the same issue, but I found a workaround:

  1. create table tweets(tweet string);
  2. load data inpath 'home/hduser/test.json' into table tweets;
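This workaround relies on Hive's default text format turning each line of the file into one row of the single tweet column, so every line must hold one complete JSON object. The following is a minimal sketch, assuming you can read a copy of the file locally, that flags lines which would not parse as a standalone JSON object. The function name check_json_lines and the sample lines are hypothetical and only illustrate the check.

```python
import json
from typing import Iterable, List, Tuple


def check_json_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that would not be a
    standalone JSON object once stored as one row of the tweets table."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            # An empty line still becomes a row, but it holds no tweet.
            problems.append((lineno, "empty line"))
            continue
        try:
            value = json.loads(stripped)
        except json.JSONDecodeError as exc:
            # E.g. a tweet that was split across several lines.
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((lineno, "not a JSON object"))
    return problems


# Hypothetical sample: line 2 is a JSON object broken across lines,
# line 3 is blank.
sample = [
    '{"id": 1, "text": "hello"}\n',
    '{"id": 2,\n',
    '\n',
]
for lineno, reason in check_json_lines(sample):
    print(f"line {lineno}: {reason}")
```

With a real file you would pass an open file handle instead of the sample list, since iterating over it also yields one line at a time.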

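The only difference is that you now need to use get_json_object() to extract fields from each row. Because each line of the file becomes one row, this assumes one JSON object per line.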

For example:

select get_json_object(tweet,'$.text') as tweet_text, get_json_object(tweet,'$.created_at') as created_at from tweets;
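To see roughly what that query does to each row, here is a hedged Python analogue of get_json_object(tweet, '$.key') restricted to a single top-level key. It is not Hive's implementation. It assumes Hive returns NULL (None here) for invalid JSON or a missing key, returns string values without their quotes, and returns other values as JSON text. The compact formatting of non-string values and the handling of JSON null are assumptions that may not match Hive exactly. The function name get_top_level_field and the sample rows are hypothetical.

```python
import json
from typing import Optional


def get_top_level_field(json_text: Optional[str], key: str) -> Optional[str]:
    """Rough analogue of Hive's get_json_object(col, '$.key') for one
    top-level key. Returns None where Hive would return NULL."""
    if json_text is None:
        return None
    try:
        obj = json.loads(json_text)
    except json.JSONDecodeError:
        return None  # invalid JSON in the row -> NULL
    if not isinstance(obj, dict) or key not in obj:
        return None  # missing key -> NULL
    value = obj[key]
    if value is None:
        return None  # assumption: JSON null is treated as NULL
    if isinstance(value, str):
        return value  # strings come back without surrounding quotes
    # Numbers, booleans and nested objects come back as compact JSON text
    # (exact formatting may differ from Hive's).
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


# Hypothetical rows of the one-column tweets table.
rows = [
    '{"id": 1, "created_at": "Mon Jan 01 00:00:00 +0000 2018", "text": "hello"}',
    'not json',
]
for tweet in rows:
    # Mirrors: select get_json_object(tweet,'$.text'),
    #                 get_json_object(tweet,'$.created_at') from tweets;
    print(get_top_level_field(tweet, "text"),
          get_top_level_field(tweet, "created_at"))
```

The malformed second row yields None for both columns, matching the NULLs you would see in Hive for a line that is not valid JSON.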

