
Create a new column from another column in Python

I have a pandas DataFrame in Python; let's call it df.

In this DataFrame I create a new column based on an existing column as follows:

df.loc[:, 'new_col'] = df['col']

Then I do the following:

df[df['new_col']=='Above Average'] = 'Good'

However, I noticed that this operation also changes the values in df['col'].

What should I do so that the values in df['col'] are not affected by operations I perform on df['new_col']?

Use DataFrame.loc with boolean indexing:

df.loc[df['new_col']=='Above Average', 'new_col'] = 'Good'

If no column is specified, every column is set to 'Good' in the rows that match the condition, which is why df['col'] changed as well.
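To make the difference concrete, here is a minimal sketch on a made-up two-row frame (the sample values and the names broken and fixed are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question
df = pd.DataFrame({'col': ['Above Average', 'Average']})
df['new_col'] = df['col']

# Row-only boolean indexing: every column in the matching rows is overwritten
broken = df.copy()
broken[broken['new_col'] == 'Above Average'] = 'Good'

# Row mask plus column label: only new_col is overwritten
fixed = df.copy()
fixed.loc[fixed['new_col'] == 'Above Average', 'new_col'] = 'Good'

print(broken)
print(fixed)
```

In broken, both col and new_col read 'Good' in the first row; in fixed, col still holds 'Above Average'.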


Alternatively, both lines of code can be combined into one with numpy.where or Series.mask:

# build the condition from df['col'], since new_col does not exist yet
df['new_col'] = np.where(df['col']=='Above Average', 'Good', df['col'])

df['new_col'] = df['col'].mask(df['col']=='Above Average', 'Good')
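Both one-liners should give the same values. A minimal sketch on assumed sample data, with the condition built from df['col'] so that it does not depend on new_col existing beforehand (the names is_above, via_where and via_mask are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col': ['Above Average', 'Average', 'Below Average']})
is_above = df['col'] == 'Above Average'

# np.where returns a NumPy array: 'Good' where the mask is True, else the original value
via_where = np.where(is_above, 'Good', df['col'])

# Series.mask returns a Series: values are replaced where the mask is True
via_mask = df['col'].mask(is_above, 'Good')

df['new_col'] = via_mask
print(df)
```

Neither call modifies df['col']; each builds a new object that is then assigned to new_col.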

EDIT: To change many values, use Series.replace or Series.map with a dictionary that maps each old value to its replacement:

d = {'Good':['Above average','effective'], 'Very Good':['Really effective']}

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)
# {'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}

df['new_col'] = df['col'].replace(d1)
# on large data, map + fillna is usually faster than replace
df['new_col'] = df['col'].map(d1).fillna(df['col'])
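As a rough check that the two approaches agree, the sketch below runs both on a small made-up column that includes one value ('Poor') missing from the dictionary. The sample data and the names replaced and mapped are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data; 'Poor' is deliberately absent from the mapping
df = pd.DataFrame({'col': ['Above average', 'effective', 'Really effective', 'Poor']})
d1 = {'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}

# replace leaves values that are not dictionary keys untouched
replaced = df['col'].replace(d1)

# map turns unmatched values into NaN, so fillna restores the originals
mapped = df['col'].map(d1).fillna(df['col'])

print(replaced.tolist())
print(mapped.tolist())
```

One caveat with the map variant: fillna also fills any NaN that was already present in df['col'] with that same NaN, so pre-existing missing values stay missing in both versions.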

There is also the option of using the Series.where method:

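# assign the result instead of calling where(..., inplace=True) on df['new_col'],
# which may act on a temporary copy and leave df unchanged
df['new_col'] = df['col'].where(df['col']!='Above Average', other='Good')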
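Series.where keeps the values where the condition is True and replaces the rest, so it is the mirror image of Series.mask. A small sketch on assumed data (cond, via_where and via_mask are illustrative names):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col': ['Above Average', 'Average']})
cond = df['col'] == 'Above Average'

# where: keep rows where the condition holds, so the condition must be negated
via_where = df['col'].where(~cond, other='Good')

# mask: replace rows where the condition holds
via_mask = df['col'].mask(cond, other='Good')

print(via_where.equals(via_mask))
```

Assigning the returned Series to a new column avoids relying on inplace=True against a column selection.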

But to be clear, np.where is typically the fastest way to go:

m = df['col'] == 'Above Average'
df['new_column'] = np.where(m, 'Good', df['col'])

df['new_column'] is the new column. Where the mask m is True, 'Good' is assigned; otherwise the value from df['col'] is kept.


+----+---------------+
|    | col           |
|----+---------------|
|  0 | Nan           |
|  1 | Above Average |
|  2 | 1.0           |
+----+---------------+
+----+---------------+--------------+
|    | col           | new_column   |
|----+---------------+--------------|
|  0 | Nan           | Nan          |
|  1 | Above Average | Good         |
|  2 | 1.0           | 1.0          |
+----+---------------+--------------+

Here are also some notes on masking when using df.loc:

m = df['col']=='Above Average'
print(m)
df.loc[m, 'new_column'] = 'Good'

As you can see, the result is the same, but note that with df.loc the mask m only selects which rows are overwritten; where m is False, whatever is already stored in new_column is left untouched.
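What the unmasked rows end up holding depends on whether the target column already exists. The sketch below contrasts the two cases on made-up data (the column names fresh and copied are hypothetical):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'col': ['Above Average', 'Average']})
m = df['col'] == 'Above Average'

# Target column does not exist yet: rows where m is False are filled with NaN
df.loc[m, 'fresh'] = 'Good'

# Target column copied from col first: rows where m is False keep the copied value
df['copied'] = df['col']
df.loc[m, 'copied'] = 'Good'

print(df)
```

So df.loc matches the np.where result only when the new column has been initialised from the source column beforehand.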


