How do I INSERT OVERWRITE with a struct in HIVE?

I have a Hive table, tweets, stored as text, and I am trying to write it to another table, tweetsORC, which is stored as ORC. Both have the same structure:

col_name    data_type   comment
racist                  boolean                 from deserializer   
contributors            string                  from deserializer   
coordinates             string                  from deserializer   
created_at              string                  from deserializer   
entities                struct<hashtags:array<string>,symbols:array<string>,urls:array<struct<display_url:string,expanded_url:string,indices:array<tinyint>,url:string>>,user_mentions:array<string>>   from deserializer   
favorite_count          tinyint                 from deserializer   
favorited               boolean                 from deserializer   
filter_level            string                  from deserializer   
geo                     string                  from deserializer   
id                      bigint                  from deserializer   
id_str                  string                  from deserializer   
in_reply_to_screen_name string                  from deserializer   
in_reply_to_status_id   string                  from deserializer   
in_reply_to_status_id_str   string                  from deserializer   
in_reply_to_user_id     string                  from deserializer   
in_reply_to_user_id_str string                  from deserializer   
is_quote_status         boolean                 from deserializer   
lang                    string                  from deserializer   
place                   string                  from deserializer   
possibly_sensitive      boolean                 from deserializer   
retweet_count           tinyint                 from deserializer   
retweeted               boolean                 from deserializer   
source                  string                  from deserializer   
text                    string                  from deserializer   
timestamp_ms            string                  from deserializer   
truncated               boolean                 from deserializer   
user                    struct<contributors_enabled:boolean,created_at:string,default_profile:boolean,default_profile_image:boolean,description:string,favourites_count:tinyint,follow_request_sent:string,followers_count:tinyint,following:string,friends_count:tinyint,geo_enabled:boolean,id:bigint,id_str:string,is_translator:boolean,lang:string,listed_count:tinyint,location:string,name:string,notifications:string,profile_background_color:string,profile_background_image_url:string,profile_background_image_url_https:string,profile_background_tile:boolean,profile_image_url:string,profile_image_url_https:string,profile_link_color:string,profile_sidebar_border_color:string,profile_sidebar_fill_color:string,profile_text_color:string,profile_use_background_image:boolean,protected:boolean,screen_name:string,statuses_count:smallint,time_zone:string,url:string,utc_offset:string,verified:boolean> from deserializer 

When I try to insert from tweets into tweetsORC, I get:

INSERT OVERWRITE TABLE tweetsORC SELECT * FROM tweets;
FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDFToString with (struct<hashtags:array<string>,symbols:array<string>,urls:array<struct<display_url:string,expanded_url:string,indices:array<tinyint>,url:string>>,user_mentions:array<string>>). Possible choices: _FUNC_(bigint)  _FUNC_(binary)  _FUNC_(boolean)  _FUNC_(date)  _FUNC_(decimal(38,18))  _FUNC_(double)  _FUNC_(float)  _FUNC_(int)  _FUNC_(smallint)  _FUNC_(string)  _FUNC_(timestamp)  _FUNC_(tinyint)  _FUNC_(void) 

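The only help I have found for this kind of problem says to make the UDF use primitive types, but I am not using a UDF! Any help is much appreciated!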

FYI, the Hive version:

Hive 1.2.1000.2.4.2.0-258 Subversion git://u12-slave-5708dfcd-10/grid/0/jenkins/workspace/HDP-build-ubuntu12/bigtop/output/hive/hive-1.2.1000.2.4.2.0 -r 240760457150036e13035cbb82bcda0c65362f3a

Edit: the CREATE TABLE statements and sample data:

create table tweets (
  contributors string,
  coordinates string,
  created_at string,
  entities struct <
    hashtags: array <string>,
    symbols: array <string>,
    urls: array <struct <
        display_url: string,
        expanded_url: string,
        indices: array <tinyint>,
        url: string>>,
    user_mentions: array <string>>,
  favorite_count tinyint,
  favorited boolean,
  filter_level string,
  geo string,
  id bigint,
  id_str string,
  in_reply_to_screen_name string,
  in_reply_to_status_id string,
  in_reply_to_status_id_str string,
  in_reply_to_user_id string,
  in_reply_to_user_id_str string,
  is_quote_status boolean,
  lang string,
  place string,
  possibly_sensitive boolean,
  retweet_count tinyint,
  retweeted boolean,
  source string,
  text string,
  timestamp_ms string,
  truncated boolean,
  `user` struct <
    contributors_enabled: boolean,
    created_at: string,
    default_profile: boolean,
    default_profile_image: boolean,
    description: string,
    favourites_count: tinyint,
    follow_request_sent: string,
    followers_count: tinyint,
    `following`: string,
    friends_count: tinyint,
    geo_enabled: boolean,
    id: bigint,
    id_str: string,
    is_translator: boolean,
    lang: string,
    listed_count: tinyint,
    location: string,
    name: string,
    notifications: string,
    profile_background_color: string,
    profile_background_image_url: string,
    profile_background_image_url_https: string,
    profile_background_tile: boolean,
    profile_image_url: string,
    profile_image_url_https: string,
    profile_link_color: string,
    profile_sidebar_border_color: string,
    profile_sidebar_fill_color: string,
    profile_text_color: string,
    profile_use_background_image: boolean,
    protected: boolean,
    screen_name: string,
    statuses_count: smallint,
    time_zone: string,
    url: string,
    utc_offset: string,
    verified: boolean>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/ed/Downloads/hive-json-master/1abbo.txt' OVERWRITE INTO TABLE tweets;

create table tweetsORC (
  racist boolean,
  contributors string,
  coordinates string,
  created_at string,
  entities struct <
    hashtags: array <string>,
    symbols: array <string>,
    urls: array <struct <
        display_url: string,
        expanded_url: string,
        indices: array <tinyint>,
        url: string>>,
    user_mentions: array <string>>,
  favorite_count tinyint,
  favorited boolean,
  filter_level string,
  geo string,
  id bigint,
  id_str string,
  in_reply_to_screen_name string,
  in_reply_to_status_id string,
  in_reply_to_status_id_str string,
  in_reply_to_user_id string,
  in_reply_to_user_id_str string,
  is_quote_status boolean,
  lang string,
  place string,
  possibly_sensitive boolean,
  retweet_count tinyint,
  retweeted boolean,
  source string,
  text string,
  timestamp_ms string,
  truncated boolean,
  `user` struct <
    contributors_enabled: boolean,
    created_at: string,
    default_profile: boolean,
    default_profile_image: boolean,
    description: string,
    favourites_count: tinyint,
    follow_request_sent: string,
    followers_count: tinyint,
    `following`: string,
    friends_count: tinyint,
    geo_enabled: boolean,
    id: bigint,
    id_str: string,
    is_translator: boolean,
    lang: string,
    listed_count: tinyint,
    location: string,
    name: string,
    notifications: string,
    profile_background_color: string,
    profile_background_image_url: string,
    profile_background_image_url_https: string,
    profile_background_tile: boolean,
    profile_image_url: string,
    profile_image_url_https: string,
    profile_link_color: string,
    profile_sidebar_border_color: string,
    profile_sidebar_fill_color: string,
    profile_text_color: string,
    profile_use_background_image: boolean,
    protected: boolean,
    screen_name: string,
    statuses_count: smallint,
    time_zone: string,
    url: string,
    utc_offset: string,
    verified: boolean>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

The data is here.

Instead of using SELECT *, I listed the fields by name, and the error went away.
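A sketch of what listing the fields by name could look like for these two tables. It assumes tweets is exactly as in the CREATE TABLE statement above, which has no racist column, so a NULL boolean is supplied for it; that placeholder is an assumption, not something stated in the original post. If your tweets table does have a racist column, select it instead.

```sql
-- Columns are listed in the order tweetsORC declares them, because
-- INSERT ... SELECT pairs the select list with the target columns by position.
INSERT OVERWRITE TABLE tweetsORC
SELECT
  CAST(NULL AS BOOLEAN) AS racist,  -- assumption: tweets has no racist column
  contributors,
  coordinates,
  created_at,
  entities,
  favorite_count,
  favorited,
  filter_level,
  geo,
  id,
  id_str,
  in_reply_to_screen_name,
  in_reply_to_status_id,
  in_reply_to_status_id_str,
  in_reply_to_user_id,
  in_reply_to_user_id_str,
  is_quote_status,
  lang,
  place,
  possibly_sensitive,
  retweet_count,
  retweeted,
  source,
  text,
  timestamp_ms,
  truncated,
  `user`                            -- reserved word, so it stays in backticks
FROM tweets;
```

With the select list written out, each value lands in a target column of the same type, so Hive no longer needs to convert the entities struct to a string.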

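Data type mismatch: the type of the data being inserted does not match the type of the corresponding column in the target table. Hive pairs the columns of INSERT ... SELECT by position, not by name. For example, if a column was declared as string when the table was created but the value being inserted into it is actually a complex type such as a list or a struct, this error is thrown.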
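The positional pairing can be seen on a much smaller pair of tables. The sketch below uses hypothetical tables src_demo and dst_demo, which are not part of the original post; they have the same columns in a different order, which is one plausible way to end up with a struct being paired with a string column.

```sql
-- Hypothetical demo tables: same columns, different order.
CREATE TABLE src_demo (
  id   BIGINT,
  info STRUCT<x:INT>,
  note STRING
);

CREATE TABLE dst_demo (
  id   BIGINT,
  note STRING,
  info STRUCT<x:INT>
) STORED AS ORC;

-- Compare the column order of the two tables before inserting.
DESCRIBE src_demo;
DESCRIBE dst_demo;

-- SELECT * pairs columns by position: src_demo.info (a struct) is paired
-- with dst_demo.note (a string), so Hive would have to convert a struct
-- to a string. This statement is expected to be rejected at compile time.
INSERT OVERWRITE TABLE dst_demo SELECT * FROM src_demo;

-- Naming the columns in the target's order pairs each value with a
-- column of the same type.
INSERT OVERWRITE TABLE dst_demo SELECT id, note, info FROM src_demo;
```

In the question, the CREATE TABLE for tweetsORC starts with a racist column that the CREATE TABLE for tweets does not have, so comparing the two DESCRIBE outputs column by column is a reasonable first check.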

