Create a new column from another column in Python
I have a pandas DataFrame in Python; let's call it df.
In this DataFrame, I create a new column based on an existing column, as follows:
df.loc[:, 'new_col'] = df['col']
Then I run the following:
df[df['new_col']=='Above Average'] = 'Good'
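However, I noticed that this operation also changes df['col'].
What should I do so that the values in df['col'] are not affected by the operations I perform on df['new_col']?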
Use DataFrame.loc with boolean indexing:
df.loc[df['new_col']=='Above Average', 'new_col'] = 'Good'
If no column is specified, every column in the rows that match the condition is set to 'Good'.
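To make the difference concrete, here is a minimal sketch on a made-up three-row frame (the sample values are assumptions, not data from the question). It compares the row-only assignment from the question with the .loc form that also names the column.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"col": ["Below Average", "Above Average", "Average"]})
df["new_col"] = df["col"]

# Row-only boolean assignment: every column of the matching rows is overwritten.
overwritten = df.copy()
overwritten[overwritten["new_col"] == "Above Average"] = "Good"

# Row mask plus column label: only 'new_col' is touched.
targeted = df.copy()
targeted.loc[targeted["new_col"] == "Above Average", "new_col"] = "Good"

print(overwritten)
print(targeted)
```

In the first frame 'col' is changed along with 'new_col'; in the second, 'col' keeps its original values.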
Alternatively, both lines of code can be replaced by a single call to numpy.where or Series.mask:
import numpy as np
# build the condition from df['col'], since new_col does not exist yet
df['new_col'] = np.where(df['col']=='Above Average', 'Good', df['col'])
df['new_col'] = df['col'].mask(df['col']=='Above Average', 'Good')
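As a quick check that the two forms agree and leave the source column alone, here is a small self-contained sketch. The sample values and the names is_above, via_where and via_mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"col": ["Below Average", "Above Average", "Average"]})
is_above = df["col"] == "Above Average"

# np.where picks 'Good' where the condition holds, else the value from 'col'.
via_where = np.where(is_above, "Good", df["col"])

# Series.mask replaces values where the condition holds and keeps the rest.
via_mask = df["col"].mask(is_above, "Good")

df["new_col"] = via_mask
print(df)
```

Both calls build a new object from df['col'] instead of writing into it, which is why the original column is unaffected.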
Edit: to change many values at once, use Series.replace or Series.map with a dictionary that specifies the replacements:
d = {'Good':['Above average','effective'], 'Very Good':['Really effective']}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'Above average': 'Good', 'effective': 'Good', 'Really effective': 'Very Good'}
df['new_col'] = df['col'].replace(d1)
#for large data, map + fillna usually performs better
df['new_col'] = df['col'].map(d1).fillna(df['col'])
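The following sketch, on made-up values, shows why the map variant needs fillna and that dictionary keys are matched exactly, including case. The row 'Above Average' (capital A) is added on purpose as an assumption to show a non-matching value.

```python
import pandas as pd

# Hypothetical sample data; 'Above Average' differs in case from the dict key.
df = pd.DataFrame(
    {"col": ["Above average", "Above Average", "effective", "Really effective"]}
)
d1 = {"Above average": "Good", "effective": "Good", "Really effective": "Very Good"}

# replace leaves values without a matching key unchanged.
replaced = df["col"].replace(d1)

# map returns NaN for values without a matching key ...
mapped_raw = df["col"].map(d1)
# ... so the gaps are filled from the original column afterwards.
mapped = mapped_raw.fillna(df["col"])

print(replaced.tolist())
print(mapped.tolist())
```

If the real data contains 'Above Average' with a capital A, the dictionary key has to be spelled the same way for either method to pick it up.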
Another option is the pandas where method:
df['new_col'] = df['col']
# assign the result back; inplace=True on a selected column may not update df
df['new_col'] = df['new_col'].where(df['new_col']!='Above Average', other='Good')
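Note that where keeps values for which the condition is True and replaces the rest, so the condition is the negation of the one used with mask. A minimal sketch on assumed sample data (same_as_mask is a name introduced here for comparison only):

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"col": ["Below Average", "Above Average", "Average"]})

# where: keep the value when the condition is True, otherwise use `other`.
df["new_col"] = df["col"].where(df["col"] != "Above Average", other="Good")

# mask with the opposite condition gives the same values.
same_as_mask = df["col"].mask(df["col"] == "Above Average", "Good")

print(df)
```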
To be clear, though, np.where is the fastest of these approaches:
m = df['col'] == 'Above Average'
df['new_column'] = np.where(m, 'Good', df['col'])
df['new_column'] is the name of the new column. Where the mask m is True, the new column is assigned 'Good'; elsewhere it takes the value from df['col'].
+----+---------------+
| | col |
|----+---------------|
| 0 | Nan |
| 1 | Above Average |
| 2 | 1.0 |
+----+---------------+
+----+---------------+--------------+
| | col | new_column |
|----+---------------+--------------|
| 0 | Nan | Nan |
| 1 | Above Average | Good |
| 2 | 1.0 | 1.0 |
+----+---------------+--------------+
I am also adding a note here about the mask when using df.loc:
m = df['col']=='Above Average'
print(m)
df.loc[m, 'new_column'] = 'Good'
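As you can see, the result is the same, but note that the mask m only selects the rows to overwrite: where m is False, df.loc writes nothing, so those rows keep whatever new_column already holds.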
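A short sketch of that caveat, using assumed sample data and the hypothetical names fresh and seeded: if new_column does not exist before the df.loc assignment, the rows where m is False have no value to fall back on and end up as NaN.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"col": ["Below Average", "Above Average", "Average"]})
m = df["col"] == "Above Average"

# Column does not exist yet: rows where m is False are left as NaN.
fresh = df.copy()
fresh.loc[m, "new_column"] = "Good"

# Column copied from 'col' first: rows where m is False keep the copied value.
seeded = df.copy()
seeded["new_column"] = seeded["col"]
seeded.loc[m, "new_column"] = "Good"

print(fresh)
print(seeded)
```

So the df.loc form matches the np.where result only when new_column has been created from col beforehand.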