
In a self join, why does one side return NULL values?

I'm using Databricks Community Edition. I created a temporary view:

%python
df.createOrReplaceTempView("athlete_events_csv")

[Image: schema of the table view]

The query I'm writing:

with medal_count_by_country as
(SELECT NOC, Year, count(*) as medal_count, row_number() over( partition by NOC order by Year) as year_count
FROM athlete_events_csv
WHERE Medal in ('Gold', 'Silver', 'Bronze')
GROUP BY NOC, Year)

SELECT m1.NOC, m1.Year, m1.medal_count, m1.year_count, m2.year_count, ((m1.medal_count - m2.medal_count)/m1.medal_count)*100 as percentage_increase
FROM medal_count_by_country m1 left join medal_count_by_country m2
ON m1.NOC = m2.NOC AND m1.Year = m2.Year and m1.year_count-1 = m2.year_count

[Image: query output]

Can anyone please explain why m2.year_count is showing as NULL?

I have a dataset of athlete event details by country, year, and so on. I'm trying to get the year-over-year percentage increase in medal winners.

This looks like data from the Summer Olympics? They're only held every 4 years, you probably need your join to be m1.year_count-4 = m2.year_count

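The condition in the join clause is wrong in this case.

The part that causes the issue is m1.Year = m2.Year.

You are joining the CTE to itself on Year and also on a row_number that is ordered by that same Year column. Within one NOC, rows with the same Year always have the same year_count, so m1.year_count-1 = m2.year_count can never hold together with m1.Year = m2.Year. No row of m2 ever matches, and the left join fills every m2 column with NULL.

Remove that part and you will get proper results.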
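
As a rough illustration, the sketch below runs the query with both join conditions against an in-memory SQLite database, used here only as a stand-in for the Databricks view. The sample rows are hypothetical and not taken from the real dataset, and it assumes a Python build whose bundled SQLite supports window functions (3.25 or later). The `100.0 *` factor is there only because SQLite would otherwise do integer division.

```python
import sqlite3

# Hypothetical sample rows (NOC, Year, Medal); not from the real dataset.
rows = [
    ("USA", 1996, "Gold"), ("USA", 1996, "Silver"),
    ("USA", 2000, "Gold"), ("USA", 2000, "Gold"),
    ("USA", 2000, "Silver"), ("USA", 2000, "Bronze"),
    ("USA", 2004, "Gold"), ("USA", 2004, "Silver"), ("USA", 2004, "Bronze"),
    ("USA", 2004, "NA"),  # no medal; dropped by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_events_csv (NOC TEXT, Year INTEGER, Medal TEXT)")
conn.executemany("INSERT INTO athlete_events_csv VALUES (?, ?, ?)", rows)

# Same query as in the question; only the join condition is swapped in.
QUERY = """
WITH medal_count_by_country AS (
    SELECT NOC, Year, count(*) AS medal_count,
           row_number() OVER (PARTITION BY NOC ORDER BY Year) AS year_count
    FROM athlete_events_csv
    WHERE Medal IN ('Gold', 'Silver', 'Bronze')
    GROUP BY NOC, Year
)
SELECT m1.NOC, m1.Year, m1.medal_count, m1.year_count, m2.year_count,
       100.0 * (m1.medal_count - m2.medal_count) / m1.medal_count
           AS percentage_increase
FROM medal_count_by_country m1
LEFT JOIN medal_count_by_country m2
  ON {join_condition}
ORDER BY m1.NOC, m1.Year
"""

# Condition from the question: equal Year implies equal year_count,
# so "year_count - 1 = year_count" never matches.
original_condition = (
    "m1.NOC = m2.NOC AND m1.Year = m2.Year AND m1.year_count - 1 = m2.year_count"
)
# Condition from the answer: the Year equality is removed.
fixed_condition = "m1.NOC = m2.NOC AND m1.year_count - 1 = m2.year_count"

original = conn.execute(QUERY.format(join_condition=original_condition)).fetchall()
fixed = conn.execute(QUERY.format(join_condition=fixed_condition)).fetchall()

print("original:", original)
print("fixed:   ", fixed)
```

With the original condition every m2 column comes back as `None`. With the fixed condition only the first year of each country has no previous row, which is expected for a left join.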
