
How to concatenate / join these three dataframes

I have three dataframes: df_Male, df_Female, and df_Transgender.

Sample dataframes:

df_Male

continent   avg_count_country   avg_age
  Asia          55                5
  Africa        65                10
  Europe        75                8

df_Female

continent   avg_count_country   avg_age
  Asia          50                7
  Africa        60                12
  Europe        70                0

df_Transgender

continent   avg_count_country   avg_age
  Asia          30                6
  Africa        40                11
  America       80                10

Right now I concatenate them like this:

frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=['Male', 'Female', 'Transgender'])

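As you can see, America is present only in df_Transgender, and likewise Europe is present only in df_Male and df_Female.

So I need to combine them so that the result looks like the table below, but not by hand, because there may be a large number of rows.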

              continent  avg_count_country  avg_age
Male        0      Asia                 55        5
            1    Africa                 65       10
            2    Europe                 75        8
            3   America                  0        0
Female      0      Asia                 50        7
            1    Africa                 60       12
            2    Europe                 70        0
            3   America                  0        0
Transgender 0      Asia                 30        6
            1    Africa                 40       11
            2   America                 80       10
            3    Europe                  0        0

So for any continent value that is missing from a dataframe, avg_count_country and avg_age should be 0.
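
To make the question reproducible, the sample frames can be built as follows. This is only a sketch that reuses the values from the tables above; the helper list `cols` is not part of the original post.

```python
import pandas as pd

# Column names shared by all three sample frames (helper name is illustrative)
cols = ["continent", "avg_count_country", "avg_age"]

df_Male = pd.DataFrame(
    [["Asia", 55, 5], ["Africa", 65, 10], ["Europe", 75, 8]], columns=cols)
df_Female = pd.DataFrame(
    [["Asia", 50, 7], ["Africa", 60, 12], ["Europe", 70, 0]], columns=cols)
df_Transgender = pd.DataFrame(
    [["Asia", 30, 6], ["Africa", 40, 11], ["America", 80, 10]], columns=cols)

# The concatenation from the question: one outer index level per frame
frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=["Male", "Female", "Transgender"])
print(df)
```

The result has nine rows, and the missing continent in each group is simply absent rather than filled with 0.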

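You can add a "gender" column before concatenating.

This uses groupby on categorical data to compute the Cartesian product of gender and continent, which should also bring a performance benefit.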

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

for col in ['gender', 'continent']:
    df[col] = df[col].astype('category')

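# observed=False keeps every gender/continent combination, including unobserved ones
res = df.groupby(['gender', 'continent'], observed=False).first().fillna(0).astype(int)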

print(res)

                       avg_count_country  avg_age
gender      continent                            
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
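
The reason this produces the full Cartesian product is the `observed` parameter of `groupby`: with categorical keys and `observed=False`, pandas emits a row for every combination of categories, not just the ones present in the data. A small sketch on made-up rows (the toy frame below is illustrative, not the data from the question):

```python
import pandas as pd

# Toy frame: 2 genders x 3 continents, but only 3 combinations actually occur
df = pd.DataFrame({
    "gender": ["Male", "Male", "Transgender"],
    "continent": ["Asia", "Europe", "America"],
    "avg_age": [5, 8, 10],
})
for col in ["gender", "continent"]:
    df[col] = df[col].astype("category")

# observed=False: one row per category combination, NaN where nothing was seen
full = df.groupby(["gender", "continent"], observed=False).first()
# observed=True: only the combinations that appear in the data
seen = df.groupby(["gender", "continent"], observed=True).first()

print(len(full), len(seen))
```

Passing `observed` explicitly is worthwhile because its default differs between pandas versions.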

Alternatively, you can reindex.

from itertools import product

# Get rid of that number in the index, not sure why you'd need it
df.index = df.index.droplevel(-1)
# Add continents to the index
df = df.set_index('continent', append=True)

# Determine product of indices
ids = list(product(df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()))

# Reindex and fill missing with 0
df = df.reindex(ids).fillna(0).reset_index(level=-1)

df is now:

            continent  avg_count_country  avg_age
Male             Asia               55.0      5.0
Male           Africa               65.0     10.0
Male           Europe               75.0      8.0
Male          America                0.0      0.0
Female           Asia               50.0      7.0
Female         Africa               60.0     12.0
Female         Europe               70.0      0.0
Female        America                0.0      0.0
Transgender      Asia               30.0      6.0
Transgender    Africa               40.0     11.0
Transgender    Europe                0.0      0.0
Transgender   America               80.0     10.0
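
A variation on the same idea, sketched here under the assumption that the sample frames from the question are used: `pd.MultiIndex.from_product` can stand in for `itertools.product`, and passing `fill_value=0` to `reindex` should keep the integer columns as integers instead of converting them to floats.

```python
import pandas as pd

cols = ["continent", "avg_count_country", "avg_age"]
df_Male = pd.DataFrame(
    [["Asia", 55, 5], ["Africa", 65, 10], ["Europe", 75, 8]], columns=cols)
df_Female = pd.DataFrame(
    [["Asia", 50, 7], ["Africa", 60, 12], ["Europe", 70, 0]], columns=cols)
df_Transgender = pd.DataFrame(
    [["Asia", 30, 6], ["Africa", 40, 11], ["America", 80, 10]], columns=cols)

df = pd.concat([df_Male, df_Female, df_Transgender],
               keys=["Male", "Female", "Transgender"])

# Drop the numeric level and move continent into the index
df.index = df.index.droplevel(-1)
df = df.set_index("continent", append=True)

# Every (gender, continent) pair, built without itertools
full_index = pd.MultiIndex.from_product(
    [df.index.get_level_values(0).unique(),
     df.index.get_level_values(1).unique()],
    names=df.index.names,
)

# fill_value=0 fills the missing pairs directly, so no NaN is introduced
out = df.reindex(full_index, fill_value=0).reset_index(level=-1)
print(out)
```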

If you do need the extra numeric index, you can use something like df.groupby(df.index).cumcount() to number the rows within each group.
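
A minimal sketch of that numbering step on a toy frame (the data and the column name `n` are illustrative, not from the original answer):

```python
import pandas as pd

# Toy frame with a repeated, non-unique index like the reindexed result
df = pd.DataFrame(
    {"continent": ["Asia", "Africa", "Asia", "Africa"],
     "avg_age": [5, 10, 7, 12]},
    index=["Male", "Male", "Female", "Female"],
)

# Number the rows within each index group: 0, 1, ... per gender
df["n"] = df.groupby(df.index).cumcount().to_numpy()

# Append the counter as a second index level
out = df.set_index("n", append=True)
print(out)
```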

Using DataFrame.pivot, a slight modification of @jpp's answer avoids manipulating the index by hand:

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

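# pivot takes keyword arguments; positional arguments are rejected by pandas 2.0 and later
df.pivot(index='gender', columns='continent').fillna(0).stack().astype(int)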

                       avg_count_country  avg_age
gender      continent
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
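
A related sketch, not taken from any of the answers: `unstack` accepts a `fill_value`, so the missing combinations can be filled while reshaping, which should make the `fillna(0)` and `astype(int)` steps unnecessary. It assumes the sample frames from the question.

```python
import pandas as pd

cols = ["continent", "avg_count_country", "avg_age"]
df_Male = pd.DataFrame(
    [["Asia", 55, 5], ["Africa", 65, 10], ["Europe", 75, 8]], columns=cols)
df_Female = pd.DataFrame(
    [["Asia", 50, 7], ["Africa", 60, 12], ["Europe", 70, 0]], columns=cols)
df_Transgender = pd.DataFrame(
    [["Asia", 30, 6], ["Africa", 40, 11], ["America", 80, 10]], columns=cols)

df = pd.concat([df_Male.assign(gender="Male"),
                df_Female.assign(gender="Female"),
                df_Transgender.assign(gender="Transgender")])

# Spread continents into columns, filling gaps with 0, then fold them back
res = (df.set_index(["gender", "continent"])
         .unstack("continent", fill_value=0)
         .stack("continent"))
print(res)
```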

