How to concatenate / join these three dataframes
I have three dataframes: df_Male, df_Female and df_Transgender.
Sample dataframes:
df_Male
continent avg_count_country avg_age
Asia 55 5
Africa 65 10
Europe 75 8
df_Female
continent avg_count_country avg_age
Asia 50 7
Africa 60 12
Europe 70 0
df_Transgender
continent avg_count_country avg_age
Asia 30 6
Africa 40 11
America 80 10
Right now I am concatenating them like this:
frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=['Male', 'Female', 'Transgender'])
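As you can see, America is present only in df_Transgender, and likewise Europe is present only in df_Male and df_Female. So I have to merge them in a way that looks like the output below, but not manually, because there could be a large number of rows.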
              continent  avg_count_country  avg_age
Male        0      Asia                 55        5
            1    Africa                 65       10
            2    Europe                 75        8
            3   America                  0        0
Female      0      Asia                 50        7
            1    Africa                 60       12
            2    Europe                 70        0
            3   America                  0        0
Transgender 0      Asia                 30        6
            1    Africa                 40       11
            2   America                 80       10
            3    Europe                  0        0
So for any continent value that is missing from a dataframe, avg_count_country and avg_age should be 0.
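You can add a "gender" column before concatenating.
We then use categorical data with groupby to compute the Cartesian product of gender and continent. This also tends to give a performance benefit.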
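df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])
# Categorical keys let groupby emit every gender/continent combination
for col in ['gender', 'continent']:
    df[col] = df[col].astype('category')
# observed=False keeps combinations that never occur in the data
res = df.groupby(['gender', 'continent'], observed=False).first().fillna(0).astype(int)
print(res)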
                       avg_count_country  avg_age
gender      continent
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
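For anyone who wants to run this end to end, here is a minimal self-contained sketch of the categorical groupby approach. The dataframes are rebuilt from the sample data in the question, and passing observed=False explicitly is my assumption about what newer pandas versions need in order to keep the unobserved combinations.

```python
import pandas as pd

# Sample data copied from the question
df_Male = pd.DataFrame({"continent": ["Asia", "Africa", "Europe"],
                        "avg_count_country": [55, 65, 75],
                        "avg_age": [5, 10, 8]})
df_Female = pd.DataFrame({"continent": ["Asia", "Africa", "Europe"],
                          "avg_count_country": [50, 60, 70],
                          "avg_age": [7, 12, 0]})
df_Transgender = pd.DataFrame({"continent": ["Asia", "Africa", "America"],
                               "avg_count_country": [30, 40, 80],
                               "avg_age": [6, 11, 10]})

# Tag each frame with its gender, then stack them into one frame
df = pd.concat([df_Male.assign(gender="Male"),
                df_Female.assign(gender="Female"),
                df_Transgender.assign(gender="Transgender")],
               ignore_index=True)

# Categorical keys allow groupby to report combinations with no rows
for col in ["gender", "continent"]:
    df[col] = df[col].astype("category")

# Missing combinations come back as NaN, which we replace with 0
res = (df.groupby(["gender", "continent"], observed=False)
         .first()
         .fillna(0)
         .astype(int))
print(res)
```

The result has one row per gender/continent pair, with zeros where a continent was absent from a given dataframe.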
You could reindex instead, starting from the df you built with pd.concat(frames, keys=...).
from itertools import product
# Get rid of that number in the index, not sure why you'd need it
df.index = df.index.droplevel(-1)
# Add continents to the index
df = df.set_index('continent', append=True)
# Determine product of indices
ids = list(product(df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()))
# Reindex and fill missing with 0
df = df.reindex(ids).fillna(0).reset_index(level=-1)
df
The result is now:
continent avg_count_country avg_age
Male Asia 55.0 5.0
Male Africa 65.0 10.0
Male Europe 75.0 8.0
Male America 0.0 0.0
Female Asia 50.0 7.0
Female Africa 60.0 12.0
Female Europe 70.0 0.0
Female America 0.0 0.0
Transgender Asia 30.0 6.0
Transgender Africa 40.0 11.0
Transgender Europe 0.0 0.0
Transgender America 80.0 10.0
If you do need the extra numeric index, you can number the rows within each group with df.groupby(df.index).cumcount().
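As a rough illustration of the cumcount idea, the sketch below numbers rows within each gender and appends that counter as a second index level. The small frame and the names position and numbered are hypothetical, chosen only to mimic the shape of the reindexed result.

```python
import pandas as pd

# Hypothetical frame shaped like the reindexed result (gender as the index)
df = pd.DataFrame(
    {"continent": ["Asia", "Africa", "Asia", "Africa"],
     "avg_count_country": [55, 65, 50, 60]},
    index=["Male", "Male", "Female", "Female"],
)

# Running counter that restarts at 0 for each distinct index label
position = df.groupby(df.index).cumcount()

# Append the counter as an extra index level; to_numpy() avoids any
# label alignment on the duplicated gender index
numbered = df.set_index(position.to_numpy(), append=True)
print(numbered)
```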
Using DataFrame.pivot, a slight modification of @jpp's answer, you can avoid manipulating the index by hand:
df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])
# pivot arguments are keyword-only in pandas 2.0 and later
df.pivot(index='gender', columns='continent').fillna(0).stack().astype(int)
                       avg_count_country  avg_age
gender      continent
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
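Here is a runnable sketch of the pivot route on the sample data from the question, in case it helps to try it directly. The index and columns arguments are passed by keyword, and the row order of the printed result may vary between pandas versions, so treat the layout as indicative only.

```python
import pandas as pd

# Sample data copied from the question
df_Male = pd.DataFrame({"continent": ["Asia", "Africa", "Europe"],
                        "avg_count_country": [55, 65, 75],
                        "avg_age": [5, 10, 8]})
df_Female = pd.DataFrame({"continent": ["Asia", "Africa", "Europe"],
                          "avg_count_country": [50, 60, 70],
                          "avg_age": [7, 12, 0]})
df_Transgender = pd.DataFrame({"continent": ["Asia", "Africa", "America"],
                               "avg_count_country": [30, 40, 80],
                               "avg_age": [6, 11, 10]})

df = pd.concat([df_Male.assign(gender="Male"),
                df_Female.assign(gender="Female"),
                df_Transgender.assign(gender="Transgender")],
               ignore_index=True)

# Wide table: one row per gender, one column per (value, continent) pair;
# absent pairs appear as NaN and are filled with 0 before stacking
wide = df.pivot(index="gender", columns="continent").fillna(0)

# Move the continent level back into the row index
res = wide.stack().astype(int)
print(res)
```

Filling the gaps before stack() matters: it guarantees that every gender/continent pair survives as a row regardless of how stack treats missing values.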