
How to upload Twitter JSON data using SerDe in Hive?

I am loading Twitter data into Hive and then running some queries on it. My raw tweet data (one format only) is:

{"created_at":"Tue Apr 28 23:28:15 +0000 2015","id":593195048306610176,"id_str":"593195048306610176","text":"Apple watch now has Tinder integration, now you can swipe on the go. This is revolutionary.","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":56632588,"id_str":"56632588","name":"Farmer Mike South","screen_name":"HunterPachell","location":"Bowling Green","url":"http:\/\/pornhub.com","description":"\u0394T\u0394 Bowling Green State University '16 BGSU Lax #2 See my latest highlights on http:\/\/pornhub.com","protected":false,"verified":false,"followers_count":439,"friends_count":997,"listed_count":1,"favourites_count":4548,"statuses_count":3702,"created_at":"Tue Jul 14 07:05:51 +0000 2009","utc_offset":-25200,"time_zone":"Pacific Time (US & Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"050005","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_tile":false,"profile_link_color":"000000","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/56632588\/1420260655","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1430263695204"}
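With the default TEXTFILE storage, Hive hands the SerDe one line at a time. A JSON SerDe therefore only sees a complete object if each tweet sits on a single line. If a tweet is pretty-printed or wrapped across lines, the SerDe receives fragments that cannot be parsed.

The Python sketch below is not part of the original question. It checks a data file before loading it. `bad_lines` reports every line that is not one complete JSON object. `to_ndjson` rewrites pretty-printed or concatenated objects so that each one sits on its own line. Both names are hypothetical helpers.

```python
import json


def bad_lines(lines):
    """Return (line_number, reason) for each non-blank line that is not one complete JSON object.

    Assumes the Hive table is stored as plain text, so every line must hold exactly one tweet.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are not reported by this check
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, "invalid JSON: " + exc.msg))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


def to_ndjson(text):
    """Re-emit pretty-printed or concatenated JSON objects as one compact object per line."""
    decoder = json.JSONDecoder()
    out, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it between objects here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        out.append(json.dumps(obj, ensure_ascii=False))
    return "".join(item + "\n" for item in out)
```

For example, `bad_lines(open("tweets.json", encoding="utf-8"))` returns an empty list for a file that is safe to load. The file name is hypothetical.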

I am using this external Hive table schema:

CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweet_count INT,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>,
  in_reply_to_screen_name STRING
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/user/hastimal/tweets'; 
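To see which declared columns a given tweet actually provides, you can mirror the schema as a nested Python dict and compare it with a parsed record. This is a hypothetical helper, not part of Hive. A dict stands for STRUCT, a one-element list for ARRAY<...>, and None for a scalar column.

For the sample tweet above, `missing_paths(json.loads(line))` returns `['retweeted_status']`, because that tweet is not a retweet. Whether an absent key becomes NULL or raises an error depends on the SerDe in use.

```python
# Hypothetical mirror of the tweets DDL: dict = STRUCT, one-element list = ARRAY<...>,
# None = scalar column (BIGINT, STRING, INT, BOOLEAN).
SCHEMA = {
    "id": None,
    "created_at": None,
    "source": None,
    "favorited": None,
    "retweet_count": None,
    "retweeted_status": {
        "text": None,
        "user": {"screen_name": None, "name": None},
        "retweet_count": None,
    },
    "entities": {
        "urls": [{"expanded_url": None}],
        "user_mentions": [{"screen_name": None, "name": None}],
        "hashtags": [{"text": None}],
    },
    "text": None,
    "user": {
        "screen_name": None,
        "name": None,
        "friends_count": None,
        "followers_count": None,
        "statuses_count": None,
        "verified": None,
        "utc_offset": None,
        "time_zone": None,
    },
    "in_reply_to_screen_name": None,
}


def missing_paths(record, schema=SCHEMA, prefix=""):
    """List dotted paths declared in the schema whose keys are absent from a parsed tweet.

    A key that is present with a JSON null value counts as present and is not descended into.
    """
    missing = []
    for key, sub in schema.items():
        path = prefix + key
        if key not in record:
            missing.append(path)
            continue
        value = record[key]
        if isinstance(sub, dict) and isinstance(value, dict):
            missing.extend(missing_paths(value, sub, path + "."))
        elif isinstance(sub, list) and isinstance(value, list):
            # check every array element that is an object against the element STRUCT
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    missing.extend(missing_paths(item, sub[0], f"{path}[{i}]."))
    return missing
```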

But when I run something as simple as

select * from tweets limit 1;

it fails with this error:

hive> select * from tweets limit 1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.json.JSONObject cannot be cast to [Ljava.lang.Object;

I have also tried everything I could find on Google and Stack Overflow, including the question "Loading Linkedin JSON response into HIVE", but nothing works. Please help.

I think the problem is the JAR. Download the hive-serdes JAR from the link below:

files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar

Add the JAR file:

hive> ADD JAR hive-serdes-1.0-SNAPSHOT.jar;
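Before you run CREATE TABLE, you can confirm that the downloaded JAR really contains the class named in ROW FORMAT SERDE. Hive can only use the SerDe if that class is on its classpath. A JAR is a zip archive, so a short Python sketch can check it. The JDK command `jar tf hive-serdes-1.0-SNAPSHOT.jar` lists the same entries. The helper name `jar_has_class` is hypothetical.

```python
import zipfile

# Class named in the answer's ROW FORMAT SERDE clause.
SERDE_CLASS = "com.cloudera.hive.serde.JSONSerDe"


def jar_has_class(jar_path, class_name=SERDE_CLASS):
    """Return True if the JAR (a zip archive) contains the compiled .class file for class_name."""
    # com.cloudera.hive.serde.JSONSerDe -> com/cloudera/hive/serde/JSONSerDe.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

If `jar_has_class("hive-serdes-1.0-SNAPSHOT.jar")` returns False, the class name in the DDL does not match anything in that JAR.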

Then create the table with ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe', like this:

CREATE EXTERNAL TABLE tweetsjson (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweet_count INT,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>,
  in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/hastimal/tweets';

Let me know if it does not work.

You need to add the JAR before you create the external table:

ADD JAR (path to the JAR file);


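Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.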

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM