Pivot by group and merge in Pandas dataframe
I have two dataframes, company_df and car_df. A company can have multiple cars and a car can only have one company.
Company_DF
Company_ID Company_Name
0 1 Ford
1 2 Holden
2 3 Kia
Car_DF
Company_ID Car_ID Car_Name
0 1 1 Falcon
1 1 2 Focus
2 2 1 Commodore
3 3 1 Sorento
4 3 2 Rio
5 3 2 Sportage
The Rio and Sportage have the same Car_ID on purpose. About 1 percent of my rows have this issue, and it is not something I can change in my data source.
I would like to pivot each group of cars, by company, so that the cars are all on one line. For example:
Company_ID Company_Name Car_ID_1 Car_Name_1 Car_ID_2 Car_Name_2 Car_ID_3 Car_Name_3
0 1 Ford 1 Falcon 2 Focus NaN NaN
1 2 Holden 1 Commodore NaN NaN NaN NaN
2 3 Kia 1 Sorento 2 Rio 2 Sportage
What I have at the moment works for 99 percent of the rows, but it is slow and messy. I'm not sure how to improve on it.
import pandas as pd
company_df = pd.DataFrame([[1, 'Ford'], [2, 'Holden'], [3, 'Kia']], columns=['Company_ID', 'Company_Name'])
car_df = pd.DataFrame([[1, 1, 'Falcon'], [1, 2, 'Focus'], [2, 1, 'Commodore'], [3, 1, 'Sorento'], [3, 2, 'Rio'], [3, 2, 'Sportage']], columns=['Company_ID', 'Car_ID', 'Car_Name'])
for i in range(1, 3):  # looping through car ids up to maximum, I don't want to do this though
    car_by_id_df = car_df[car_df.Car_ID==i]  # select cars with current loop iterator/index
    car_by_id_df.columns = map(lambda col: '{}_{}'.format(col, i), car_by_id_df.columns)  # rename all columns with ID as suffix
    car_by_id_df.rename(columns={'Company_ID_{}'.format(i): 'Company_ID'}, inplace=True)  # rename joining column back to original
    company_df = company_df.merge(right=car_by_id_df, on='Company_ID', how='left')  # merge
print(company_df)
This returns the following. Note that Kia is duplicated because Rio and Sportage have the same id. I can't change the data in the Car_ID column, and I'm not sure how else to pivot the dataframe.
Company_ID Company_Name Car_ID_1 Car_Name_1 Car_ID_2 Car_Name_2
0 1 Ford 1 Falcon 2 Focus
1 2 Holden 1 Commodore NaN NaN
2 3 Kia 1 Sorento 2 Rio
3 3 Kia 1 Sorento 2 Sportage
How can I pivot my car_df by group and merge onto company_id?
This will do the trick:
res=Car_DF.set_index("Company_ID").stack().to_frame()
res["sub_no"]=res.groupby(level=[0,1]).cumcount().add(1).astype(str)
res=res.reset_index(level=1)
res["level_1"]=res["level_1"].str.cat(res["sub_no"], sep="_")
res=res.drop("sub_no", axis=1).set_index("level_1", append=True).unstack("level_1")
res.columns=map(lambda x: x[1], res.columns)
res=res[sorted(res.columns, key=lambda x: int(x.split("_")[-1]))]  # numeric sort so that suffix 10 comes after 9
res=Company_DF.merge(res, on="Company_ID", how="left")
Outputs:
Company_ID Company_Name ... Car_ID_3 Car_Name_3
0 1 Ford ... NaN NaN
1 2 Holden ... NaN NaN
2 3 Kia ... 2 Sportage
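The step that makes this robust to the repeated Car_ID is the running count per company (sub_no in the code above). As a minimal sketch of that idea on its own, assuming the same sample data and using position as a purely illustrative column name:

```python
import pandas as pd

car_df = pd.DataFrame(
    [[1, 1, "Falcon"], [1, 2, "Focus"], [2, 1, "Commodore"],
     [3, 1, "Sorento"], [3, 2, "Rio"], [3, 2, "Sportage"]],
    columns=["Company_ID", "Car_ID", "Car_Name"],
)

# cumcount() numbers the rows inside each Company_ID group starting at 0,
# so adding 1 gives a per-company position that ignores Car_ID entirely.
# "position" is an illustrative name, not part of the original data.
position = car_df.groupby("Company_ID").cumcount() + 1

print(car_df.assign(position=position))
```

Because the position comes from row order within the company rather than from Car_ID, Rio and Sportage receive different numbers even though they share an id.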
Found a solution. I don't like the use of the for loop, but it does work and is relatively fast.
import pandas as pd
Company_DF = pd.DataFrame([[1, 'Ford'], [2, 'Holden'], [3, 'Kia']], columns=['Company_ID', 'Company_Name'])
Car_DF = pd.DataFrame([[1, 1, 'Falcon'], [1, 2, 'Focus'], [2, 1, 'Commodore'], [3, 1, 'Sorento'], [3, 2, 'Rio'], [3, 2, 'Sportage']], columns=['Company_ID', 'Car_ID', 'Car_Name'])
Car_DF['rank'] = Car_DF.groupby(['Company_ID']).cumcount() + 1
# range() excludes its stop value, so add 1 to include the highest rank
for ranking_number in range(Car_DF['rank'].min(), Car_DF['rank'].max() + 1):
    Ranked_Car_DF = Car_DF[Car_DF['rank']==ranking_number].copy()  # cars at this position within their company
    Ranked_Car_DF.columns = map(lambda col: '{}_{}'.format(col, ranking_number), Ranked_Car_DF.columns)  # suffix every column with the rank
    Ranked_Car_DF.rename(columns={'Company_ID_{}'.format(ranking_number): 'Company_ID'}, inplace=True)  # restore the join key name
    Company_DF = Company_DF.merge(right=Ranked_Car_DF, on='Company_ID', how='left')
print(Company_DF)
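If the loop is the main objection, the same per-company ranking can probably be pivoted in one pass with unstack. The following is only a sketch under the assumption that each (Company_ID, rank) pair is unique, which cumcount guarantees; the names numbered, wide and result are illustrative.

```python
import pandas as pd

company_df = pd.DataFrame(
    [[1, "Ford"], [2, "Holden"], [3, "Kia"]],
    columns=["Company_ID", "Company_Name"],
)
car_df = pd.DataFrame(
    [[1, 1, "Falcon"], [1, 2, "Focus"], [2, 1, "Commodore"],
     [3, 1, "Sorento"], [3, 2, "Rio"], [3, 2, "Sportage"]],
    columns=["Company_ID", "Car_ID", "Car_Name"],
)

# Number the cars within each company (1, 2, 3, ...), independent of Car_ID.
numbered = car_df.assign(rank=car_df.groupby("Company_ID").cumcount() + 1)

# Move the rank into the columns: one row per company,
# with a (field, rank) column for every car slot.
wide = numbered.set_index(["Company_ID", "rank"]).unstack("rank")

# Put the rank first and sort, so columns run Car_ID_1, Car_Name_1, Car_ID_2, ...
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Flatten the two-level column index into names such as "Car_Name_2".
wide.columns = [f"{field}_{n}" for n, field in wide.columns]

result = company_df.merge(wide.reset_index(), on="Company_ID", how="left")
print(result)
```

One side effect to be aware of: the Car_ID columns that contain missing values come back as floats, because NaN cannot be stored in a plain integer column.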