get_dummies() for multiple Pandas DataFrames

I have a list of DataFrames and I would like to one-hot encode some of the columns in place. For example, given:

In[1]:  import numpy as np
        import pandas as pd

        df1 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                           columns=['col_1', 'col_2'])
        df2 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                           columns=['col_1', 'col_2'])

        combined = [df1, df2]
        df1


Out[1]:    col_1  col_2
        0      a      a
        1      b      b
        2      c      c

I'm currently using the following approach:

In[2]:  for df in combined:
            one_hot = pd.get_dummies(df["col_2"])

            df[one_hot.columns] = one_hot
            df.drop("col_2", axis=1, inplace=True)

        
        df1

Out[2]:      col_1   a   b   c
          0      a   1   0   0
          1      b   0   1   0 
          2      c   0   0   1

Am I missing a more concise solution?


Edit

An important requirement is that I need to modify the original DataFrames.

I think you can use concat with keys, which adds a new index level identifying each DataFrame, and then call get_dummies:

stacked = pd.concat(combined, keys=range(len(combined)))
s = stacked['col_2'].str.get_dummies()
s['col_1'] = stacked['col_1'].values

s
Out[20]: 
     a  b  c col_1
0 0  1  0  0     a
  1  0  1  0     b
  2  0  0  1     c
1 0  1  0  0     a
  1  0  1  0     b
  2  0  0  1     c

If you would like to keep a separate result for each DataFrame, you can groupby on the outer index level and store the pieces in a dict:

d = {x: y.reset_index(level=0, drop=True) for x, y in s.groupby(level=0)}
d
Out[16]: 
{0:    a  b  c
 0  1  0  0
 1  0  1  0
 2  0  0  1, 1:    a  b  c
 0  1  0  0
 1  0  1  0
 2  0  0  1}
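
Since the question asks for the original DataFrames to be modified, here is a minimal sketch of how the concat result could be written back into each frame instead of into a dict. The sample data is illustrative (df2 deliberately lacks the value 'c'), and the variable names stacked, dummies and part are my own, not from the answer above.

```python
import pandas as pd

# Illustrative data: df2 deliberately has no 'c' in col_2.
df1 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": ["a", "b", "c"]})
df2 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": ["a", "b", "b"]})
combined = [df1, df2]

# Stack all frames; the outer index level records which frame a row came from.
keys = range(len(combined))
stacked = pd.concat(combined, keys=keys)
dummies = stacked["col_2"].str.get_dummies()

for key, df in zip(keys, combined):
    # Select the rows belonging to this frame and drop the outer level,
    # so the index lines up with the original frame again.
    part = dummies.xs(key, level=0)
    df[part.columns] = part
    df.drop(columns="col_2", inplace=True)

print(list(df2.columns))  # ['col_1', 'a', 'b', 'c']
```

Because the dummies are computed on the concatenated column, every frame ends up with the same set of indicator columns, even when a value is absent from one of them (here df2 gets a 'c' column of zeros).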

OP's method is just fine:

for df in combined:
    one_hot = pd.get_dummies(df["col_2"])

    df[one_hot.columns] = one_hot
    df.drop("col_2", axis=1, inplace=True)
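
If the loop is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name one_hot_in_place is hypothetical, and dtype=int is passed so that the result shows 0/1 as in the question (from pandas 2.0 on, get_dummies returns booleans by default).

```python
import pandas as pd


def one_hot_in_place(frames, column):
    """Replace `column` with indicator columns in every frame, mutating each one.

    Hypothetical helper name; it wraps the loop from the question.
    """
    for df in frames:
        # dtype=int keeps the 0/1 output regardless of the pandas version.
        one_hot = pd.get_dummies(df[column], dtype=int)
        df[one_hot.columns] = one_hot
        df.drop(columns=column, inplace=True)


df1 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": ["a", "b", "c"]})
df2 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": ["a", "b", "c"]})
combined = [df1, df2]

one_hot_in_place(combined, "col_2")
print(list(df1.columns))  # ['col_1', 'a', 'b', 'c']
```

The helper returns nothing on purpose: like the original loop, it changes the objects that df1 and df2 already refer to.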

Alternatively, reassign to all names:

df1, df2 = [df.join(pd.get_dummies(df['col_2'])).drop(columns='col_2') for df in combined]
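
One caveat with reassignment, given the requirement to modify the original DataFrames: join and drop return new objects, so the names df1 and df2 are rebound while the frames stored in combined stay unchanged. A small sketch that makes this visible (dtype=int is my addition, used only to keep 0/1 output on recent pandas versions):

```python
import pandas as pd

df1 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": ["a", "b", "c"]})
df2 = pd.DataFrame({"col_1": ["a", "b", "c"], "col_2": ["a", "b", "c"]})
combined = [df1, df2]

# join() and drop() both return new DataFrames; nothing is mutated here.
df1, df2 = [
    df.join(pd.get_dummies(df["col_2"], dtype=int)).drop(columns="col_2")
    for df in combined
]

print(list(df1.columns))          # ['col_1', 'a', 'b', 'c']
print(list(combined[0].columns))  # ['col_1', 'col_2']
```

If other code holds references to the original frames (for example through the list), the in-place loop is the safer choice.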
