How to concatenate/join these three dataframes

Question

I have three dataframes: df_Male, df_Female, df_Transgender.

Sample dataframes:

df_Male

continent   avg_count_country   avg_age
  Asia          55                5
  Africa        65                10
  Europe        75                8

df_Female

continent   avg_count_country   avg_age
  Asia          50                7
  Africa        60                12
  Europe        70                0

df_Transgender

continent   avg_count_country   avg_age
  Asia          30                6
  Africa        40                11
  America       80                10
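To make the question reproducible, the three sample frames can be built directly in pandas. This is a minimal sketch that only copies the values from the tables above; nothing else about the real data is assumed.

```python
import pandas as pd

# Sample data copied from the tables in the question
df_Male = pd.DataFrame({
    'continent': ['Asia', 'Africa', 'Europe'],
    'avg_count_country': [55, 65, 75],
    'avg_age': [5, 10, 8],
})
df_Female = pd.DataFrame({
    'continent': ['Asia', 'Africa', 'Europe'],
    'avg_count_country': [50, 60, 70],
    'avg_age': [7, 12, 0],
})
df_Transgender = pd.DataFrame({
    'continent': ['Asia', 'Africa', 'America'],
    'avg_count_country': [30, 40, 80],
    'avg_age': [6, 11, 10],
})

print(df_Transgender)
```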

Currently I concatenate them like this:

frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=['Male', 'Female', 'Transgender'])

As you can see, America is present only in df_Transgender, and likewise Europe is present only in df_Male and df_Female.

So I need to concatenate them so that the result looks like the table below, but not manually, since there can be a huge number of rows:

              continent  avg_count_country  avg_age
Male        0      Asia                 55        5
            1    Africa                 65       10
            2    Europe                 75        8
            3   America                  0        0
Female      0      Asia                 50        7
            1    Africa                 60       12
            2    Europe                 70        0
            3   America                  0        0
Transgender 0      Asia                 30        6
            1    Africa                 40       11
            2   America                 80       10
            3    Europe                  0        0

That is, for any continent missing from a given dataframe, avg_count_country and avg_age should be 0.

Answer 1

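You can add a gender column to each dataframe before concatenating.

We then use categorical data with groupby to calculate the Cartesian product: when the grouping keys are categorical, groupby can emit a group for every combination of categories, including combinations that never occur in the data. This should also yield performance benefits.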

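import pandas as pd

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

# Make both grouping keys categorical so groupby knows every possible value
for col in ['gender', 'continent']:
    df[col] = df[col].astype('category')

# observed=False keeps every gender x continent combination; empty ones become NaN
res = df.groupby(['gender', 'continent'], observed=False).first().fillna(0).astype(int)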

print(res)

                       avg_count_country  avg_age
gender      continent                            
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
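The Cartesian-product behaviour this answer relies on can be seen in isolation on a tiny frame. This is a sketch with made-up rows; the names all_pairs and seen_pairs are illustrative. Recent pandas releases have been moving the default of groupby's observed parameter from False to True, which is why the sketch (like the code above) passes it explicitly.

```python
import pandas as pd

# Tiny illustrative frame: the pairs ('Male', 'America') and
# ('Transgender', 'Europe') never occur
df = pd.DataFrame({
    'gender': ['Male', 'Transgender'],
    'continent': ['Europe', 'America'],
    'avg_age': [8, 10],
})

# Convert both grouping keys to categoricals
for col in ['gender', 'continent']:
    df[col] = df[col].astype('category')

# observed=False: one group per combination of categories (Cartesian product);
# first() returns NaN for the empty combinations
all_pairs = df.groupby(['gender', 'continent'], observed=False)['avg_age'].first()

# observed=True: only the combinations that actually appear in the data
seen_pairs = df.groupby(['gender', 'continent'], observed=True)['avg_age'].first()

print(len(all_pairs))   # 4 (2 genders x 2 continents)
print(len(seen_pairs))  # 2
```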

Answer 2

You can reindex a bit.

from itertools import product

# Get rid of that number in the index, not sure why you'd need it
df.index = df.index.droplevel(-1)
# Add continents to the index
df = df.set_index('continent', append=True)

# Determine product of indices
ids = list(product(df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()))

# Reindex and fill missing with 0
df = df.reindex(ids).fillna(0).reset_index(level=-1)

df is now:

            continent  avg_count_country  avg_age
Male             Asia               55.0      5.0
Male           Africa               65.0     10.0
Male           Europe               75.0      8.0
Male          America                0.0      0.0
Female           Asia               50.0      7.0
Female         Africa               60.0     12.0
Female         Europe               70.0      0.0
Female        America                0.0      0.0
Transgender      Asia               30.0      6.0
Transgender    Africa               40.0     11.0
Transgender    Europe                0.0      0.0
Transgender   America               80.0     10.0
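A variation on the same reindexing idea, sketched under the assumption that integer output is wanted: it swaps itertools.product for pandas' own pd.MultiIndex.from_product and passes fill_value=0 to reindex, so the missing rows are filled directly and the value columns stay integers instead of turning into floats. The sample data is rebuilt from the question; full_index and res are illustrative names.

```python
import pandas as pd

# Sample data from the question
df_Male = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                        'avg_count_country': [55, 65, 75], 'avg_age': [5, 10, 8]})
df_Female = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                          'avg_count_country': [50, 60, 70], 'avg_age': [7, 12, 0]})
df_Transgender = pd.DataFrame({'continent': ['Asia', 'Africa', 'America'],
                               'avg_count_country': [30, 40, 80], 'avg_age': [6, 11, 10]})

df = pd.concat([df_Male, df_Female, df_Transgender],
               keys=['Male', 'Female', 'Transgender'])

# Drop the inner integer level, then move continent into the index
df = df.droplevel(-1).set_index('continent', append=True)

# Every (gender, continent) pair, in first-seen order
full_index = pd.MultiIndex.from_product(
    [df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()],
    names=['gender', 'continent'],
)

# fill_value=0 fills the missing pairs directly, so no float conversion happens
res = df.reindex(full_index, fill_value=0).reset_index(level='continent')
print(res)
```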

If you want the inner numeric index from the question's layout back, df.groupby(df.index).cumcount() numbers the rows within each group.
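A sketch of that last step, rebuilding the sample data and repeating the reindex steps above before adding the counter back as an inner index level. The astype call at the end is an addition (not part of the answer) that undoes the float conversion caused by fillna. Note that rows follow first-seen continent order, so for Transgender the missing Europe row lands before America rather than after it as in the question's target table.

```python
from itertools import product

import pandas as pd

# Sample data from the question
df_Male = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                        'avg_count_country': [55, 65, 75], 'avg_age': [5, 10, 8]})
df_Female = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                          'avg_count_country': [50, 60, 70], 'avg_age': [7, 12, 0]})
df_Transgender = pd.DataFrame({'continent': ['Asia', 'Africa', 'America'],
                               'avg_count_country': [30, 40, 80], 'avg_age': [6, 11, 10]})

df = pd.concat([df_Male, df_Female, df_Transgender],
               keys=['Male', 'Female', 'Transgender'])

# Reindexing steps from the answer above
df.index = df.index.droplevel(-1)
df = df.set_index('continent', append=True)
ids = list(product(df.index.get_level_values(0).unique(),
                   df.index.get_level_values(1).unique()))
df = df.reindex(ids).fillna(0).reset_index(level=-1)

# Number the rows inside each gender group (0, 1, 2, 3) and add that
# counter back as the inner index level
inner = df.groupby(df.index).cumcount()
df.index = pd.MultiIndex.from_arrays([df.index, inner.to_numpy()])

# fillna(0) left floats behind; cast the value columns back to integers
df = df.astype({'avg_count_country': int, 'avg_age': int})
print(df)
```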

Answer 3

Making use of DataFrame.pivot, a slight modification to @jpp's answer lets you avoid manipulating the indices manually:

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

df.pivot(index='gender', columns='continent').fillna(0).stack().astype(int)

                       avg_count_country  avg_age
gender      continent
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
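To see what pivot does here, a sketch that keeps the intermediate wide table before stacking. It uses keyword arguments for pivot because recent pandas versions no longer accept index and columns positionally. It also illustrates an assumption this answer makes implicitly: pivot needs each (gender, continent) pair to occur at most once and raises ValueError otherwise, whereas the groupby-based answer just takes the first row. The duplicated row is contrived for illustration, and wide, res, dup and pivot_failed are illustrative names.

```python
import pandas as pd

# Sample data from the question
df_Male = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                        'avg_count_country': [55, 65, 75], 'avg_age': [5, 10, 8]})
df_Female = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                          'avg_count_country': [50, 60, 70], 'avg_age': [7, 12, 0]})
df_Transgender = pd.DataFrame({'continent': ['Asia', 'Africa', 'America'],
                               'avg_count_country': [30, 40, 80], 'avg_age': [6, 11, 10]})

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

# Wide table: one row per gender, one column per (value column, continent);
# pairs missing from the data show up as NaN
wide = df.pivot(index='gender', columns='continent')
print(wide)

# Fill the holes, then stack the continent level back into the rows
res = wide.fillna(0).stack().astype(int)

# pivot needs unique (gender, continent) pairs: append a copy of one row
# and the same call raises ValueError
dup = pd.concat([df, df.iloc[[0]]])
try:
    dup.pivot(index='gender', columns='continent')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)
```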

3 answers

Answer 1
Score 2, accepted, 2018-07-13 14:51:21

Answer 2
Score 1, 2018-07-13 14:56:05

Answer 3
Score 1, 2018-07-13 15:32:41
