How to write a SQL query for multiple Inner Join?
Sample record:
Row(user_id='KxGeqg5ccByhaZfQRI4Nnw', gender='male', year='2015', month='September', day='20',
hour='16', weekday='Sunday', reviewClass='place love back', business_id='S75Lf-Q3bCCckQ3w7mSN2g',
business_name='Notorious Burgers', city='Scottsdale', categories='Nightlife, American (New), Burgers,
Comfort Food, Cocktail Bars, Restaurants, Food, Bars, American (Traditional)', user_funny='1',
review_sentiment='Positive', friend_id='my4q3Sy6Ei45V58N2l8VGw')
The table has more than 100 million records. My SQL query should do the following:
Select the most occurring review_sentiment among the friends (friend_id) and the most occurring gender among friends of a particular user visiting a specific business
Each friend_id is itself a user_id.
Example scenario:
I want the following output:
**user_id | business_id | friend_common_sentiment | mostCommonGender | .... otherCols**
user_id_1 | business_id_1 | positive | male | .... otherCols
user_id_1 | business_id_2 | negative | female | .... otherCols
user_id_1 | business_id_3 | negative | female | .... otherCols
Here is a simple query I wrote for this in pyspark:
SELECT user_id, gender, year, month, day, hour, weekday, reviewClass, business_id, business_name, city,
categories, user_funny, review_sentiment FROM events1 GROUP BY user_id, friend_id, business_id ORDER BY
COUNT(review_sentiment) DESC LIMIT 1
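This query does not give the expected result, and I am not sure how to fit an INNER JOIN into it.
That data structure really does make things difficult, but let's break it down into a few steps.
I'm just going to call your table "tags", so the join goes as follows. Sadly, just like in real life, we can't assume everyone has friends, and since you didn't say to exclude the forever-alone crowd, we need a left join to keep the users who have no friends.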
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
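To see what the left join keeps, here is a minimal sketch that runs the same join on a toy table. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the two rows, the shortened column list, and the aliases u and f are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real "tags" table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, business_id TEXT, friend_id TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?)",
    [
        ("u1", "b1", "u2", "male"),    # u1 lists u2 as a friend
        ("u2", "b1", None, "female"),  # u2 has no friend recorded
    ],
)

# Self-join: each user row is matched to the friend's own rows for the same business.
join_rows = conn.execute(
    """
    SELECT u.user_id, f.user_id AS friend_user_id, f.gender AS friend_gender
    FROM tags AS u
    LEFT OUTER JOIN tags AS f
      ON u.friend_id = f.user_id
     AND f.business_id = u.business_id
    ORDER BY u.user_id
    """
).fetchall()

for row in join_rows:
    print(row)
```

The row for u2 survives with NULL (None in Python) in the friend columns. With an inner join it would be dropped.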
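Next, you have to work out the most common gender and review sentiment for a given user and business combination. This is where the data structure really bites us. We could do this in one step with some clever window functions, but I want this answer to be easy to follow, so I'm going to use a subquery and case statements. For simplicity I've assumed gender is binary, but you can follow the same pattern for other gender values if the application records them.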
select user.user_id, user.business_id
, sum(case when lower(friends.gender) = 'male' then 1 else 0 end) as MaleFriends
, sum(case when lower(friends.gender) = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id
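One thing to watch for, assuming the table really holds one row per (user, friend) pair as the sample record suggests: a friend who lists several friends of their own appears in several rows, so the self-join counts that friend more than once. The sketch below, again using sqlite3 as a stand-in for Spark SQL with made-up rows, compares the raw join against a join on a de-duplicated friend list. The DISTINCT subquery is my suggestion, not part of the steps above, and lower() is there only because the sample record stores gender in lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, business_id TEXT, friend_id TEXT,"
    " gender TEXT, review_sentiment TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", "b1", "u2", "male", "Positive"),
        ("u1", "b1", "u3", "male", "Positive"),
        # u2 lists two friends, so u2 appears in two rows for b1
        ("u2", "b1", "u1", "female", "Negative"),
        ("u2", "b1", "u3", "female", "Negative"),
        ("u3", "b1", "u1", "male", "Positive"),
    ],
)

# Conditional aggregation; {friends} is filled in with the friend-side source.
QUERY = """
SELECT u.user_id, u.business_id,
       SUM(CASE WHEN LOWER(f.gender) = 'male' THEN 1 ELSE 0 END) AS MaleFriends,
       SUM(CASE WHEN LOWER(f.gender) = 'female' THEN 1 ELSE 0 END) AS FemaleFriends
FROM tags AS u
LEFT OUTER JOIN {friends} AS f
  ON u.friend_id = f.user_id AND f.business_id = u.business_id
WHERE u.business_id = ?
GROUP BY u.user_id, u.business_id
ORDER BY u.user_id
"""

# Raw self-join: friends with several rows are counted several times.
raw_counts = conn.execute(QUERY.format(friends="tags"), ("b1",)).fetchall()

# De-duplicated friend side: one row per distinct friend review at a business.
DEDUPED = "(SELECT DISTINCT user_id, business_id, gender, review_sentiment FROM tags)"
dedup_counts = conn.execute(QUERY.format(friends=DEDUPED), ("b1",)).fetchall()

print("raw:    ", raw_counts)
print("deduped:", dedup_counts)
```

For u1 the raw join reports two female friends even though u1 has only one (u2), while the de-duplicated version reports one. Whether DISTINCT is the right fix depends on what the duplicate rows mean in the real data.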
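Now we just need to take the data from the subquery and make some decisions. You may want to add some extra options, for example for the case where a user has no friends, or where the friends are evenly split between genders or sentiments. That follows the same pattern as below, just with extra values to choose from.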
select user_id
, business_id
, case when MaleFriends > FemaleFriends then 'Male' else 'Female' end as MostCommonGender
, case when FriendsPositive > FriendsNegative then 'Positive' else 'Negative' end as MostCommonSentiment
from ( select user.user_id, user.business_id
, sum(case when lower(friends.gender) = 'male' then 1 else 0 end) as MaleFriends
, sum(case when lower(friends.gender) = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id) as a
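The extra options mentioned above (no friends, or an even split) can be sketched as additional CASE branches. This is only an illustration under assumptions: it runs on sqlite3 rather than Spark SQL, the rows are made up, and the labels 'No data' and 'Tie' are placeholder names of my own choosing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (user_id TEXT, business_id TEXT, friend_id TEXT,"
    " gender TEXT, review_sentiment TEXT)"
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", "b1", "u2", "male", "Positive"),
        ("u2", "b1", "u1", "female", "Negative"),
        ("u2", "b1", "u3", "female", "Negative"),
        ("u3", "b1", None, "male", "Positive"),    # no friends
        ("u4", "b1", "u1", "male", "Positive"),    # one male friend ...
        ("u4", "b1", "u5", "male", "Positive"),    # ... and one female friend
        ("u5", "b1", None, "female", "Negative"),  # no friends
    ],
)

rows = conn.execute(
    """
    SELECT user_id, business_id,
           CASE WHEN MaleFriends + FemaleFriends = 0 THEN 'No data'
                WHEN MaleFriends > FemaleFriends THEN 'Male'
                WHEN MaleFriends < FemaleFriends THEN 'Female'
                ELSE 'Tie' END AS MostCommonGender,
           CASE WHEN FriendsPositive + FriendsNegative = 0 THEN 'No data'
                WHEN FriendsPositive > FriendsNegative THEN 'Positive'
                WHEN FriendsPositive < FriendsNegative THEN 'Negative'
                ELSE 'Tie' END AS MostCommonSentiment
    FROM (
        SELECT u.user_id, u.business_id,
               SUM(CASE WHEN LOWER(f.gender) = 'male' THEN 1 ELSE 0 END) AS MaleFriends,
               SUM(CASE WHEN LOWER(f.gender) = 'female' THEN 1 ELSE 0 END) AS FemaleFriends,
               SUM(CASE WHEN f.review_sentiment = 'Positive' THEN 1 ELSE 0 END) AS FriendsPositive,
               SUM(CASE WHEN f.review_sentiment = 'Negative' THEN 1 ELSE 0 END) AS FriendsNegative
        FROM tags AS u
        LEFT OUTER JOIN tags AS f
          ON u.friend_id = f.user_id AND f.business_id = u.business_id
        WHERE u.business_id = ?
        GROUP BY u.user_id, u.business_id
    ) AS a
    ORDER BY user_id
    """,
    ("b1",),
).fetchall()

# Map each user to (MostCommonGender, MostCommonSentiment) for easy inspection.
result = {user_id: (gender, sentiment) for user_id, _, gender, sentiment in rows}
print(result)
```

Here 'No data' covers both users with no friends and users whose friends never reviewed this business, since the left join produces no matching friend row in either case.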
That gives you the steps to follow, and hopefully a clear explanation of how they work. Good luck!