How to load Twitter JSON data into Hive using a SerDe?
I am loading Twitter data into Hive and then running some queries on it. My raw tweet data looks like this (one format only):
{"created_at":"Tue Apr 28 23:28:15 +0000 2015","id":593195048306610176,"id_str":"593195048306610176","text":"Apple watch now has Tinder integration, now you can swipe on the go. This is revolutionary.","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":56632588,"id_str":"56632588","name":"Farmer Mike South","screen_name":"HunterPachell","location":"Bowling Green","url":"http:\/\/pornhub.com","description":"\u0394T\u0394 Bowling Green State University '16 BGSU Lax #2 See my latest highlights on http:\/\/pornhub.com","protected":false,"verified":false,"followers_count":439,"friends_count":997,"listed_count":1,"favourites_count":4548,"statuses_count":3702,"created_at":"Tue Jul 14 07:05:51 +0000 2009","utc_offset":-25200,"time_zone":"Pacific Time (US & 
Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"050005","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/344918034410158087\/38851478822519fa3c9f5d50284b00d4.jpeg","profile_background_tile":false,"profile_link_color":"000000","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"95E8EC","profile_text_color":"3C3940","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/593108136317300738\/tf4W1APu_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/56632588\/1420260655","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1430263695204"}
I am using this external table schema in Hive:
CREATE External TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/user/hastimal/tweets';
hive> select * from tweets limit 1;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.json.JSONObject cannot be cast to [Ljava.lang.Object;
I have also tried everything I could find on Google and Stack Overflow, including the question "Loading Linkedin JSON response into HIVE", but nothing works. Please help.
I think the problem is with the jar. Download the hive-serdes jar from the link below:
files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
Add the jar file:
hive> add jar hive-serdes-1.0-SNAPSHOT.jar;
Then create the table with ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe', like this:
CREATE External TABLE tweetsjson (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/hastimal/tweets';
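Once the table exists, a quick query can confirm that the SerDe parses the nested fields. This is only a sketch: it assumes the jar was added in the same session and that the files under /user/hastimal/tweets hold one JSON tweet per line. The backticks around user are a precaution, because user is a reserved word in some newer Hive versions; they may be unnecessary on older releases.

```sql
-- Sanity check: read a single row through the JSON SerDe.
SELECT id, created_at, text
FROM tweetsjson
LIMIT 1;

-- Nested STRUCT fields are reached with dot notation.
-- Backticks guard against `user` being treated as a keyword.
SELECT `user`.screen_name, `user`.followers_count, text
FROM tweetsjson
LIMIT 10;
```

If the first query returns a row instead of the ClassCastException, the SerDe class is being picked up correctly.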
Let me know if this does not work.
You need to add the jar when you create the external table:
ADD JAR (directory)
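As a rough illustration, the command might look like the sketch below. The path is a hypothetical placeholder, not the real location of the jar. ADD JAR applies only to the current session, so it has to be repeated in each new Hive session before the table is created or queried.

```sql
-- Hypothetical path: replace it with wherever the jar was actually saved.
ADD JAR /path/to/hive-serdes-1.0-SNAPSHOT.jar;

-- List the jars registered in the current session to confirm it was added.
LIST JARS;
```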