Finding the top ten trending tweets in Hive

Question

I am trying to find the top ten trending tweets in Hive based on retweet_count, i.e., the tweet with the highest retweet_count should rank first.

Here are the details of the election table:

id                      bigint                  from deserializer   
created_at              string                  from deserializer   
source                  string                  from deserializer   
favorited               boolean                 from deserializer   
retweeted_status        struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int>   from deserializer   
entities                struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> from deserializer   
text                    string                  from deserializer   
user                    struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string,location:string>    from deserializer   
in_reply_to_screen_name string                  from deserializer

My query:

select text 
from election 
where retweeted_status.retweet_count IN  
     (select  retweeted_status.retweet_count as zz 
      from election  
      order by zz desc  
      limit 10);

It gives me the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ... TWEET-ABC).

So I broke the nested query apart and ran the inner query on its own:

select  retweeted_status.retweet_count as zz 
from election  
order by zz desc  
limit 10

It returns 10 distinct values (1210, 1209, 1208, 1207, 1206, ..., 1201).

After that, I ran the outer query with those values:

select text 
from election  
where retweeted_status.retweet_count 
      IN  (1210,1209,1208,1207,1206,....1201 );

The result is again the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ... TWEET-ABC).

What is wrong with the logic of my query?
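One way to see what is going on is to check, for each of the highest retweet counts, how many rows carry that count and how many different tweet ids those rows have. This is a sketch against the election table above; the aliases rt_count, n_rows and n_ids are illustrative names, not part of the original question.

```sql
-- For each of the ten highest retweet counts, count the rows that carry it
-- and how many distinct tweet ids appear among those rows.
-- Rows that are not retweets (NULL retweeted_status) are excluded.
SELECT retweeted_status.retweet_count AS rt_count,
       COUNT(*)                       AS n_rows,
       COUNT(DISTINCT id)             AS n_ids
FROM election
WHERE retweeted_status IS NOT NULL
GROUP BY retweeted_status.retweet_count
ORDER BY rt_count DESC
LIMIT 10;
```

If n_rows is larger than n_ids, the same tweet was loaded more than once; if n_ids is 1 for each count, every top count belongs to a different tweet, and identical text across those tweets points at the data rather than at the IN logic.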

Answer 1

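Instead of the count, you should use the id. The outer IN matches every row whose count equals one of the ten values, so if you have 100 tweets with the same count, the LIMIT 10 in the subquery does not limit the result and you will get 100 records.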

select text 
from election 
where id  IN  
     (select  id as zz 
      from election  
      order by retweeted_status.retweet_count desc  
      limit 10);
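If all that is needed is the ten rows with the highest counts, the IN subquery can likely be dropped altogether: ordering the table and limiting it returns exactly ten rows even when counts tie, whereas matching on counts through IN can return more. A minimal sketch; the alias rt_count is illustrative.

```sql
-- Rank every retweet by the retweet count of the tweet it retweets
-- and keep only the top ten rows; no IN subquery is involved.
SELECT id,
       text,
       retweeted_status.retweet_count AS rt_count
FROM election
WHERE retweeted_status IS NOT NULL
ORDER BY rt_count DESC
LIMIT 10;
```

This still shows the same text repeatedly if the top rows are all retweets of one original tweet.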

That said, I'm still not sure why you get the wrong results.

Edit (after my comment):

If my comment is correct, you have the same id ten times. In that case, change the subquery to:

     (select distinct id as zz 
      from election  
      order by retweeted_status.retweet_count desc  
      limit 10);
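DISTINCT id only helps if the same tweet was loaded several times. Another possible explanation for ten distinct counts (1210 down to 1201) with identical text is that the rows are different retweets of one original tweet: each retweet embeds the original tweet in retweeted_status together with its retweet_count at the moment of retweeting. In that case the rows need to be grouped by the original tweet. The schema has no id for the original tweet, so this sketch assumes that the pair (retweeted_status.user.screen_name, retweeted_status.text) identifies it; the aliases are illustrative.

```sql
-- Collapse all retweets of the same original tweet into one row and rank
-- the originals by the highest retweet_count observed for them.
-- ASSUMPTION: screen_name + text identifies the original tweet,
-- because retweeted_status carries no id in this schema.
SELECT retweeted_status.user.screen_name   AS original_author,
       retweeted_status.text               AS original_text,
       MAX(retweeted_status.retweet_count) AS max_rt_count
FROM election
WHERE retweeted_status IS NOT NULL
GROUP BY retweeted_status.user.screen_name, retweeted_status.text
ORDER BY max_rt_count DESC
LIMIT 10;
```

Taking MAX keeps the most recent snapshot of each original's retweet count among the collected retweets.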

Answer 1 (score: 0, 2016-07-29 04:48:12)