
How to write a SQL query for multiple Inner Join?

Sample record:

    Row(user_id='KxGeqg5ccByhaZfQRI4Nnw', gender='male', year='2015', month='September', day='20', 
hour='16', weekday='Sunday', reviewClass='place love back', business_id='S75Lf-Q3bCCckQ3w7mSN2g', 
business_name='Notorious Burgers', city='Scottsdale', categories='Nightlife, American (New), Burgers, 
Comfort Food, Cocktail Bars, Restaurants, Food, Bars, American (Traditional)', user_funny='1', 
review_sentiment='Positive', friend_id='my4q3Sy6Ei45V58N2l8VGw')

The table has more than 100 million records. My SQL query should do the following:

Select the most occurring review_sentiment among the friends (friend_id) and the most occurring gender among friends of a particular user visiting a specific business

friend_id is eventually a user_id

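Example scenario:

  • One user
  • Has visited 4 businesses
  • Has 10 friends
  • 5 of those friends visited businesses 1 and 2, the other 5 visited only business 3, and none visited business 4
  • For businesses 1 and 2, those 5 friends have more positive than negative sentiment for B1 and more -ve than +ve sentiment for B2; for B3 all sentiment is -ve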

I want the following output:

**user_id | business_id | friend_common_sentiment | mostCommonGender | .... otherCols**

user_id_1 | business_id_1 | positive | male | .... otherCols
user_id_1 | business_id_2 | negative | female | .... otherCols
user_id_1 | business_id_3 | negative | female | .... otherCols

Here is a simple query I wrote for this in pyspark:

SELECT user_id, gender, year, month, day, hour, weekday, reviewClass, business_id, business_name, city, 
categories, user_funny, review_sentiment FROM events1 GROUP BY user_id, friend_id, business_id ORDER BY 
COUNT(review_sentiment DESC LIMIT 1

This query does not give the expected result, and I'm not sure how to fit an INNER JOIN into it.

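Man, does that data structure make things difficult. But let's break it down into steps:

  1. You need to self join to get the data for the friends
  2. Once you have the friends' data, use aggregate functions to count each possible value, grouped by user and business
  3. Wrap the above in a subquery so you can decide between the values based on the counts.

I'm just going to call your table "tags", so the join looks like the following. Sadly, just like in real life, we can't assume everyone has friends, and since you didn't say to exclude the forever-alone crowd, we need a left join to keep users who have no friends.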

From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
    and friends.business_id = user.business_id
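
To see what the left self join keeps, here is a minimal sketch on a toy table. It assumes Python's built-in sqlite3 as a stand-in for Spark SQL so it runs without a cluster, shortens the aliases to u and f, and uses made-up ids (u1, f1, b1, ...) and only a few of the columns from the sample record.

```python
import sqlite3

# Toy stand-in for the "tags" table: one row per (user, business, friend).
# All ids and values below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, business_id TEXT, friend_id TEXT, "
    "gender TEXT, review_sentiment TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        # u1 visited b1 and b2 and has two friends, f1 and f2
        ("u1", "b1", "f1", "male", "Positive"),
        ("u1", "b1", "f2", "male", "Positive"),
        ("u1", "b2", "f1", "male", "Negative"),
        ("u1", "b2", "f2", "male", "Negative"),
        # f1 visited only b1; f2 never reviewed anything, so has no rows
        ("f1", "b1", "u1", "female", "Positive"),
    ],
)

# Same join shape as above: match each friend_id to that friend's own
# review of the same business. Unmatched rows survive with NULLs.
joined = conn.execute(
    """
    SELECT u.user_id, u.business_id, u.friend_id, f.review_sentiment
    FROM tags AS u
    LEFT OUTER JOIN tags AS f
      ON u.friend_id = f.user_id
     AND f.business_id = u.business_id
    WHERE u.user_id = ?
    ORDER BY u.business_id, u.friend_id
    """,
    ("u1",),
).fetchall()

for row in joined:
    print(row)
# ('u1', 'b1', 'f1', 'Positive')
# ('u1', 'b1', 'f2', None)
# ('u1', 'b2', 'f1', None)
# ('u1', 'b2', 'f2', None)
```

With an inner join only the first row would come back, and business b2 would disappear from the result for u1.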

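Next you have to figure out the most common gender/review for a given user and business combination. This is where the data structure really bites us. We could do this in one step with some clever window functions, but I want this answer to be easy to follow, so I'm going to use a subquery and case statements. For simplicity I assumed gender is binary, but depending on your application you can follow the same pattern for other gender values.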

select user.user_id, user.business_id
-- lower() because the sample record stores gender as 'male'
, sum(case when lower(friends.gender) = 'male' then 1 else 0 end) as MaleFriends
, sum(case when lower(friends.gender) = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
  and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id
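
As a rough check of the counting step, the sketch below runs the same conditional aggregation on a toy table. It assumes Python's built-in sqlite3 in place of Spark SQL, shortened aliases (u, f), and invented ids. It also shows a caveat that seems to follow from the sample record's layout: if a review is stored once per friend of the reviewer, the friends side of the join repeats, and the counts inflate unless that side is de-duplicated first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, business_id TEXT, friend_id TEXT, "
    "gender TEXT, review_sentiment TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        # u1 reviewed b1; the row is repeated once per friend (f1, f2, f3)
        ("u1", "b1", "f1", "female", "Positive"),
        ("u1", "b1", "f2", "female", "Positive"),
        ("u1", "b1", "f3", "female", "Positive"),
        # each friend reviewed b1 and has a single friend, u1
        ("f1", "b1", "u1", "male", "Positive"),
        ("f2", "b1", "u1", "female", "Positive"),
        ("f3", "b1", "u1", "male", "Negative"),
    ],
)

rows = conn.execute(
    """
    SELECT u.user_id, u.business_id
    , SUM(CASE WHEN lower(f.gender) = 'male' THEN 1 ELSE 0 END) AS MaleFriends
    , SUM(CASE WHEN lower(f.gender) = 'female' THEN 1 ELSE 0 END) AS FemaleFriends
    , SUM(CASE WHEN f.review_sentiment = 'Positive' THEN 1 ELSE 0 END) AS FriendsPositive
    , SUM(CASE WHEN f.review_sentiment = 'Negative' THEN 1 ELSE 0 END) AS FriendsNegative
    FROM tags AS u
    LEFT OUTER JOIN tags AS f
      ON u.friend_id = f.user_id
     AND f.business_id = u.business_id
    WHERE u.business_id = ?
    GROUP BY u.user_id, u.business_id
    """,
    ("b1",),
).fetchall()

# Map user_id -> (MaleFriends, FemaleFriends, FriendsPositive, FriendsNegative)
counts = {row[0]: row[2:] for row in rows}

print(counts["u1"])  # (2, 1, 2, 1): two male friends, one female, as expected
print(counts["f1"])  # (0, 3, 3, 0): f1 has one friend, but u1's review is
                     # stored three times, so it is counted three times
```

For u1 the counts are right because each friend appears once per business in this toy data. On the real table it may be worth joining against a de-duplicated view of the friends' reviews instead.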

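Now we just need to pull the data from the subquery and make some decisions. You may want to add some extra options, for example for the case where there are no friends, or where friends are split evenly between genders/sentiments. That follows the same pattern as below, just with extra values to choose from.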

select user_id
, business_id
, case when MaleFriends > FemaleFriends then 'Male' else 'Female' end as MostCommonGender
, case when FriendsPositive > FriendsNegative then 'Positive' else 'Negative' end as MostCommonSentiment
from (    select user.user_id, user.business_id
-- lower() because the sample record stores gender as 'male'
, sum(case when lower(friends.gender) = 'male' then 1 else 0 end) as MaleFriends
, sum(case when lower(friends.gender) = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
  and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id) as a
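
The extra options mentioned above (no friends, even splits) could look something like the sketch below. It is only an illustration under assumptions: Python's built-in sqlite3 stands in for Spark SQL, the aliases are shortened to u and f, the ids are invented, and the labels 'None' and 'Tie' are hypothetical names chosen here, not part of the original query. It also filters on a user id rather than a business id so that all three cases show up at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, business_id TEXT, friend_id TEXT, "
    "gender TEXT, review_sentiment TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        # u1 visited b1, b2, b3; each visit is repeated per friend (f1, f2)
        ("u1", "b1", "f1", "male", "Positive"),
        ("u1", "b1", "f2", "male", "Positive"),
        ("u1", "b2", "f1", "male", "Positive"),
        ("u1", "b2", "f2", "male", "Positive"),
        ("u1", "b3", "f1", "male", "Positive"),
        ("u1", "b3", "f2", "male", "Positive"),
        # f1 visited b1 and b3, f2 visited b1 only; nobody visited b2
        ("f1", "b1", "u1", "male", "Positive"),
        ("f1", "b3", "u1", "male", "Positive"),
        ("f2", "b1", "u1", "female", "Negative"),
    ],
)

result = conn.execute(
    """
    SELECT user_id, business_id
    , CASE WHEN MaleFriends + FemaleFriends = 0 THEN 'None'  -- no friend visited
           WHEN MaleFriends > FemaleFriends THEN 'Male'
           WHEN FemaleFriends > MaleFriends THEN 'Female'
           ELSE 'Tie' END AS MostCommonGender
    , CASE WHEN FriendsPositive + FriendsNegative = 0 THEN 'None'
           WHEN FriendsPositive > FriendsNegative THEN 'Positive'
           WHEN FriendsNegative > FriendsPositive THEN 'Negative'
           ELSE 'Tie' END AS MostCommonSentiment
    FROM (
        SELECT u.user_id, u.business_id
        , SUM(CASE WHEN lower(f.gender) = 'male' THEN 1 ELSE 0 END) AS MaleFriends
        , SUM(CASE WHEN lower(f.gender) = 'female' THEN 1 ELSE 0 END) AS FemaleFriends
        , SUM(CASE WHEN f.review_sentiment = 'Positive' THEN 1 ELSE 0 END) AS FriendsPositive
        , SUM(CASE WHEN f.review_sentiment = 'Negative' THEN 1 ELSE 0 END) AS FriendsNegative
        FROM tags AS u
        LEFT OUTER JOIN tags AS f
          ON u.friend_id = f.user_id
         AND f.business_id = u.business_id
        WHERE u.user_id = ?
        GROUP BY u.user_id, u.business_id
    ) AS a
    ORDER BY business_id
    """,
    ("u1",),
).fetchall()

for row in result:
    print(row)
# ('u1', 'b1', 'Tie', 'Tie')
# ('u1', 'b2', 'None', 'None')
# ('u1', 'b3', 'Male', 'Positive')
```

The zero check has to come first in each case expression; otherwise a business that no friend visited would fall through to the tie branch.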

This gives you the steps to follow, and hopefully a clear explanation of how they work. Good luck!

