get_dummies() for multiple Pandas DataFrames
I have a list of DataFrames and I would like to one-hot encode some of the columns in place. For example, if:
In[1]: import numpy as np
       import pandas as pd

       df1 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                          columns=['col_1', 'col_2'])
       df2 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                          columns=['col_1', 'col_2'])
       combined = [df1, df2]
       df1
Out[1]:
  col_1 col_2
0     a     a
1     b     b
2     c     c
I'm currently using the following approach:
In[2]: for df in combined:
           one_hot = pd.get_dummies(df["col_2"])
           df[one_hot.columns] = one_hot
           df.drop("col_2", axis=1, inplace=True)
       df1
Out[2]:
  col_1  a  b  c
0     a  1  0  0
1     b  0  1  0
2     c  0  0  1
Am I missing a more concise solution?
An important requirement is that I need to modify the original DataFrames.
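The loop above can be wrapped in a small helper that encodes several columns and mutates each DataFrame in the list, so that df1 and df2 themselves change. This is a minimal sketch. The name encode_inplace is hypothetical. Passing prefix=col is a choice made here to avoid name clashes when two encoded columns share a value. Passing dtype=int keeps 0/1 output, because recent pandas versions return bool dummies by default.

```python
import numpy as np
import pandas as pd


def encode_inplace(frames, cols):
    """Hypothetical helper: one-hot encode `cols` in every DataFrame of
    `frames`, mutating each DataFrame object instead of returning copies."""
    for df in frames:
        for col in cols:
            # prefix=col gives names like "col_2_a"; dtype=int yields 0/1
            one_hot = pd.get_dummies(df[col], prefix=col, dtype=int)
            for name in one_hot.columns:
                df[name] = one_hot[name]        # add each indicator column in place
            df.drop(columns=col, inplace=True)  # remove the original column


df1 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                   columns=['col_1', 'col_2'])
df2 = df1.copy()
combined = [df1, df2]

encode_inplace(combined, ['col_2'])
print(df1)
```

Because the helper only adds and drops columns on the objects it receives, the list and the names df1/df2 keep referring to the same, now encoded, DataFrames.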
I think you can use concat with keys, which adds a new outer index level, and then call get_dummies:
c = pd.concat(combined, keys=range(len(combined)))
s = c['col_2'].str.get_dummies()
s['col_1'] = c['col_1'].values
s
Out[20]:
     a  b  c col_1
0 0  1  0  0     a
  1  0  1  0     b
  2  0  0  1     c
1 0  1  0  0     a
  1  0  1  0     b
  2  0  0  1     c
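Here is a small check of what keys= adds. It is a sketch on the question's example data, and the names stacked and dummies are illustrative. The outer index level records which list element each row came from, and that level is what allows the stacked result to be split apart again.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                   columns=['col_1', 'col_2'])
df2 = df1.copy()
combined = [df1, df2]

# keys= adds an outer index level holding each frame's position in the list
stacked = pd.concat(combined, keys=range(len(combined)))
print(stacked.index.nlevels)    # 2
print(stacked.index.tolist())   # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# Series.str.get_dummies() builds one 0/1 indicator column per distinct value
dummies = stacked['col_2'].str.get_dummies()
print(dummies.loc[1])           # only the rows that came from df2
```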
If you want the results back as separate DataFrames, one per original frame, you can groupby the outer index level and store the pieces in a dict:
d = {x: y.reset_index(level=0, drop=True) for x, y in s.groupby(level=0)}
d
Out[16]:
{0:    a  b  c
0  1  0  0
1  0  1  0
2  0  0  1,
 1:    a  b  c
0  1  0  0
1  0  1  0
2  0  0  1}
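The pieces in d are new objects, so df1 and df2 themselves are not changed. The question requires modifying the originals, so one possible follow-up is to copy each piece's indicator columns back into the matching DataFrame and drop col_2 there. This is a sketch, not part of the answer above, and the names stacked and piece are illustrative. Encoding the concatenated data first has a side benefit: every frame gets the same set of indicator columns, even if one frame lacks some value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                   columns=['col_1', 'col_2'])
df2 = df1.copy()
combined = [df1, df2]

# Encode once on the stacked frame so every piece shares the same columns
stacked = pd.concat(combined, keys=range(len(combined)))
dummies = stacked['col_2'].str.get_dummies()

for key, df in enumerate(combined):
    piece = dummies.loc[key]            # this frame's rows, original row index
    for name in piece.columns:
        df[name] = piece[name]          # aligned on the row index; mutates df
    df.drop(columns='col_2', inplace=True)

print(df1)
```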
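A more concise alternative is a list comprehension. Note that it builds new DataFrames and rebinds the names df1 and df2; the objects stored in combined are not changed.
df1, df2 = [df.join(pd.get_dummies(df['col_2'])).drop(columns='col_2') for df in combined]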
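pd.get_dummies also accepts a whole DataFrame together with a columns= list. It encodes only those columns and drops the originals in one call. The sketch below uses the same example data. By default the new columns are prefixed with the source column name (col_2_a, col_2_b, col_2_c), and dtype=int is passed here to keep 0/1 values. Like the list comprehension, it returns new DataFrames instead of modifying the originals.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 'a'], ['b', 'b'], ['c', 'c']]),
                   columns=['col_1', 'col_2'])
df2 = df1.copy()
combined = [df1, df2]

# columns= restricts encoding to the listed columns and drops the originals;
# dtype=int keeps 0/1 values (newer pandas defaults to bool)
df1, df2 = [pd.get_dummies(df, columns=['col_2'], dtype=int) for df in combined]
print(df1.columns.tolist())   # ['col_1', 'col_2_a', 'col_2_b', 'col_2_c']
```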