Create a new column from another column in Python

Question

I have a pandas DataFrame in Python; let's call it df.

In this DataFrame, I create a new column based on an existing column, as follows:

df.loc[:, 'new_col'] = df['col']

Then I perform the following operation:

df[df['new_col']=='Above Average'] = 'Good'

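However, I noticed that this operation also changes df['col'].

What should I do so that the values in df['col'] are not affected by the operations I perform on df['new_col']?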

Answer 1

Use DataFrame.loc with boolean indexing:

df.loc[df['new_col']=='Above Average', 'new_col'] = 'Good'

If no column is specified, every column in the rows that match the condition is set to 'Good'.
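To make the difference concrete, here is a minimal sketch on a small made-up DataFrame (the sample values and the names `overwritten` and `targeted` are assumptions for illustration only). It contrasts row-only boolean indexing with row-and-column selection through `.loc`:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"col": ["Above Average", "Below Average", "Above Average"]})
df["new_col"] = df["col"]

# Row-only boolean indexing: every column in the matching rows is overwritten.
overwritten = df.copy()
overwritten[overwritten["new_col"] == "Above Average"] = "Good"

# Row and column selection with .loc: only new_col changes.
targeted = df.copy()
targeted.loc[targeted["new_col"] == "Above Average", "new_col"] = "Good"

print(overwritten)
print(targeted)
```

In the first result both `col` and `new_col` contain 'Good' in the matching rows, which is the behaviour described in the question; in the second only `new_col` is modified.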

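Alternatively, both lines of code (creating the column and replacing the value) can be replaced by a single call to numpy.where or Series.mask. The condition is tested on df['col'], because df['new_col'] does not exist yet at that point:

import numpy as np

df['new_col'] = np.where(df['col']=='Above Average', 'Good', df['col'])

df['new_col'] = df['col'].mask(df['col']=='Above Average', 'Good')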
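As a rough sketch of how the two one-liners behave, the following uses a small made-up column (the sample values and the names `is_above`, `via_where` and `via_mask` are assumptions). `np.where` returns a NumPy array, while `Series.mask` returns a Series; either can be assigned as the new column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["Above Average", "Average", "Above Average"]})
is_above = df["col"] == "Above Average"

# np.where picks 'Good' where the condition holds, else the value from col.
via_where = np.where(is_above, "Good", df["col"])

# Series.mask replaces values where the condition holds and keeps the rest.
via_mask = df["col"].mask(is_above, "Good")

df["new_col"] = via_mask
print(df)
```

Neither call modifies `df['col']`, since both build a new object from it.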

EDIT: To change many values, use Series.replace or Series.map with a dictionary that specifies the values:

d = {'Good':['Above average','effective'], 'Very Good':['Really effective']}

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}

df['new_col'] = df['col'].replace(d1)
#map is usually faster on large data; fillna restores values that are not keys in d1
df['new_col'] = df['col'].map(d1).fillna(df['col'])
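The `fillna` step matters because `Series.map` yields NaN for any value that is not a key in the dictionary, whereas `Series.replace` leaves such values alone. A small sketch, using made-up data with one extra value ('Poor') that is deliberately absent from `d1`:

```python
import pandas as pd

# Hypothetical sample data; 'Poor' is intentionally missing from d1.
df = pd.DataFrame({"col": ["Above average", "effective", "Really effective", "Poor"]})
d1 = {"Above average": "Good", "effective": "Good", "Really effective": "Very Good"}

# replace: unmapped values pass through unchanged.
replaced = df["col"].replace(d1)

# map: unmapped values become NaN ...
mapped_raw = df["col"].map(d1)
# ... so fall back to the original value for those rows.
mapped = mapped_raw.fillna(df["col"])

print(replaced.tolist())
print(mapped.tolist())
```

Both approaches end with the same values here; which one is faster depends on the data, so it is worth timing on your own DataFrame.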

Answer 2

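Another option is the Series.where method, which keeps values where the condition is True and replaces the rest. Assigning the result is more reliable than calling where with inplace=True on df['new_col'], because that call may operate on a temporary copy:

df['new_col'] = df['col'].where(df['col']!='Above Average', other='Good')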
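Note that `Series.where` and `Series.mask` are complements: `where` keeps values where the condition is True, `mask` replaces them. A short sketch on made-up data (the sample values and the names `m`, `kept` and `masked` are assumptions):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["Above Average", "Average", "Poor"]})
m = df["col"] == "Above Average"

# where: keep values where the condition (~m) is True, replace the others.
kept = df["col"].where(~m, other="Good")

# mask: replace values where the condition (m) is True, keep the others.
masked = df["col"].mask(m, other="Good")

print(kept.equals(masked))
```

Which of the two reads better is mostly a matter of whether the condition is easier to state positively or negatively.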

That said, np.where is usually the fastest approach:

m = df['col'] == 'Above Average'
df['new_column'] = np.where(m, 'Good', df['col'])

df['new_column'] is the new column. Where the mask m is True, the new column is assigned 'Good'; everywhere else it takes the value from df['col'].

+----+---------------+
|    | col           |
|----+---------------|
|  0 | Nan           |
|  1 | Above Average |
|  2 | 1.0           |
+----+---------------+
+----+---------------+--------------+
|    | col           | new_column   |
|----+---------------+--------------|
|  0 | Nan           | Nan          |
|  1 | Above Average | Good         |
|  2 | 1.0           | 1.0          |
+----+---------------+--------------+

I also include a note here about how the mask behaves when using df.loc:

m = df['col']=='Above Average'
print(m)
df.loc[m, 'new_column'] = 'Good'

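As you can see, the result is the same, but note what the mask m does here: it only selects the rows to overwrite. Rows where m is False keep whatever value new_column already holds, and if new_column does not exist yet they are filled with NaN.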
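The following sketch illustrates that point with made-up data (the sample values and the names `fresh` and `copied` are assumptions). It compares assigning through `df.loc` when the target column does not exist yet with assigning after the column has been copied from `col`:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["Average", "Above Average", "Poor"]})
m = df["col"] == "Above Average"

# Column does not exist yet: .loc creates it and fills only the masked rows.
fresh = df.copy()
fresh.loc[m, "new_column"] = "Good"

# Column copied first: unmasked rows keep the copied value.
copied = df.copy()
copied["new_column"] = copied["col"]
copied.loc[m, "new_column"] = "Good"

print(fresh)
print(copied)
```

So with `df.loc` the column should be copied first if the unmasked rows are meant to carry over the values from `col`.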
