
Left join tables (1:n) using Pandas, keeping the number of rows the same as the left table

How do I left join tables with a 1:n relationship while keeping the number of rows the same as the left table, concatenating any duplicate data with a character/string such as ';'?

Example:
Country table

CountryID      Country      Area
1              UK           1029
2              Russia       8374

Cities table

CountryID      City     
1              London           
1              Manchester       
2              Moscow          
2              Ufa   
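To reproduce the examples, the two tables can be built as DataFrames. This is a minimal sketch; the variable names df1 (countries) and df2 (cities) are assumed to match the answers below, while the question itself calls them country and city.

```python
import pandas as pd

# Left table: one row per country (named df1 to match the answers)
df1 = pd.DataFrame({
    "CountryID": [1, 2],
    "Country": ["UK", "Russia"],
    "Area": [1029, 8374],
})

# Right table: several cities per country (the "n" side of the 1:n relationship)
df2 = pd.DataFrame({
    "CountryID": [1, 1, 2, 2],
    "City": ["London", "Manchester", "Moscow", "Ufa"],
})
```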

I want:

CountryID      Country      Area      Cities
1              UK           1029      London;Manchester
2              Russia       8374      Moscow;Ufa

I know how to perform a normal left join:

country.merge(city, how='left', on='CountryID')

which gives me four rows instead of two:

Area      Country      CountryID      City
1029      UK           1              London
1029      UK           1              Manchester
8374      Russia       2              Moscow
8374      Russia       2              Ufa
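The row count grows because merge repeats each left row once per matching right row. A minimal sketch confirming this with the example data; validate="one_to_many" is a real merge option, but it only checks that CountryID is unique in the left table and does not collapse the repeated rows.

```python
import pandas as pd

df1 = pd.DataFrame({"CountryID": [1, 2], "Country": ["UK", "Russia"], "Area": [1029, 8374]})
df2 = pd.DataFrame({"CountryID": [1, 1, 2, 2], "City": ["London", "Manchester", "Moscow", "Ufa"]})

# Raises MergeError only if CountryID were duplicated in df1;
# each country is still repeated once per matching city.
merged = df1.merge(df2, how="left", on="CountryID", validate="one_to_many")
print(len(merged))  # 4: one row per (country, city) pair
print(merged["CountryID"].value_counts().sort_index().tolist())  # [2, 2]
```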

If performance is important, use map with a Series created by groupby + join to add a new column to df1:

df1['Cities'] = df1['CountryID'].map(df2.groupby('CountryID')['City'].apply(';'.join))
print (df1)
   CountryID Country  Area             Cities
0          1      UK  1029  London;Manchester
1          2  Russia  8374         Moscow;Ufa
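With map, any CountryID that has no entry in the grouped Series comes back as NaN, so every row of df1 is kept, which is the left-join behaviour asked for. A small sketch, assuming a hypothetical third country ("Nowhere") with no cities; fillna("") is one possible choice for the empty case, and the NaN can also be left in place.

```python
import pandas as pd

# Hypothetical third country with no rows in df2, to show the left-join behaviour
df1 = pd.DataFrame({
    "CountryID": [1, 2, 3],
    "Country": ["UK", "Russia", "Nowhere"],
    "Area": [1029, 8374, 0],
})
df2 = pd.DataFrame({"CountryID": [1, 1, 2, 2], "City": ["London", "Manchester", "Moscow", "Ufa"]})

cities = df2.groupby("CountryID")["City"].apply(";".join)
# CountryID 3 is missing from cities, so map yields NaN; replace it with an empty string
df1["Cities"] = df1["CountryID"].map(cities).fillna("")
print(df1)
```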

Detail (the intermediate Series produced by groupby + join):

print (df2.groupby('CountryID')['City'].apply(';'.join))
CountryID
1    London;Manchester
2           Moscow;Ufa
Name: City, dtype: object
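The grouped Series acts as a lookup table for map: its index holds the CountryID values and its values hold the joined strings. A short sketch of that lookup on its own, using an arbitrary Series of IDs chosen for illustration:

```python
import pandas as pd

df2 = pd.DataFrame({"CountryID": [1, 1, 2, 2], "City": ["London", "Manchester", "Moscow", "Ufa"]})

# Index: CountryID; values: cities joined in their original row order
cities = df2.groupby("CountryID")["City"].apply(";".join)
print(cities.index.tolist())  # [1, 2]

# map() looks up each value in cities' index, so order and repeats in the input don't matter
ids = pd.Series([2, 1, 2])
print(ids.map(cities).tolist())  # ['Moscow;Ufa', 'London;Manchester', 'Moscow;Ufa']
```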

Another solution, using DataFrame.join:

df = df1.join(df2.groupby('CountryID')['City'].apply(';'.join), on='CountryID')
print (df)
   CountryID Country  Area               City
0          1      UK  1029  London;Manchester
1          2  Russia  8374         Moscow;Ufa
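DataFrame.join adds the aggregated Series under its own name, which is why the new column above is City rather than Cities. Renaming the Series first gives the header from the question; join already defaults to how='left', so all rows of df1 are kept. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({"CountryID": [1, 2], "Country": ["UK", "Russia"], "Area": [1029, 8374]})
df2 = pd.DataFrame({"CountryID": [1, 1, 2, 2], "City": ["London", "Manchester", "Moscow", "Ufa"]})

# Name the aggregated Series "Cities" so join() creates a column with that header
cities = df2.groupby("CountryID")["City"].apply(";".join).rename("Cities")
df = df1.join(cities, on="CountryID")
print(df.columns.tolist())  # ['CountryID', 'Country', 'Area', 'Cities']
```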

This will give you the desired result:

df1.merge(df2, on='CountryID').groupby(['CountryID', 'Country', 'Area']).agg({'City': lambda x: ';'.join(x)}).reset_index()

#   CountryID Country  Area               City
#0          1      UK  1029  London;Manchester
#1          2  Russia  8374         Moscow;Ufa
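Two caveats apply to this approach: merge defaults to an inner join, so a country with no cities would be dropped, and with how='left' such a country gets NaN in City, which ';'.join cannot handle. A sketch that keeps every country by using a left merge and dropping NaN inside the aggregation; the third country is hypothetical, and the named aggregation syntax (Cities=(...)) requires pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical third country with no cities, to exercise the left-join case
df1 = pd.DataFrame({
    "CountryID": [1, 2, 3],
    "Country": ["UK", "Russia", "Nowhere"],
    "Area": [1029, 8374, 0],
})
df2 = pd.DataFrame({"CountryID": [1, 1, 2, 2], "City": ["London", "Manchester", "Moscow", "Ufa"]})

result = (
    df1.merge(df2, how="left", on="CountryID")  # keep countries without cities
    .groupby(["CountryID", "Country", "Area"], as_index=False)
    # dropna() removes the NaN City of unmatched countries, giving an empty string
    .agg(Cities=("City", lambda s: ";".join(s.dropna())))
)
print(result)
```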


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM