How to generate new column with values based on condition in another column in pandas
I have a DataFrame like the one below, and I need to generate a new column named "Comment" that says "Fail" for the specified values.
Input:
Tel MC WT
AAA Rubber 9999
BBB Tree 0
CCC Rub 12
AAA Other 20
BBB Same 999
DDD Other-Same 70
Code I tried:
df.loc[(df[WT] == 0 | df[WT] == 999 | df[WT] == 9999 | df[WT] == 99999),'Comment'] = 'Fail'
Error:
AttributeError: 'str' object has no attribute 'loc'
Expected output:
Tel MC WT Comment
AAA Rubber 9999 Fail
BBB Tree 0 Fail
CCC Rub 12
AAA Other 20
BBB Same 999 Fail
DDD Other-Same 70
Use Series.isin to test membership; rows that do not match are left as NaN:
df.loc[df['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
print (df)
Tel MC WT Comment
0 AAA Rubber 9999 Fail
1 BBB Tree 0 Fail
2 CCC Rub 12 NaN
3 AAA Other 20 NaN
4 BBB Same 999 Fail
5 DDD Other-Same 70 NaN
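For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the names fail_values and mask are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

# Illustrative name: the WT values that should be flagged.
fail_values = [0, 999, 9999, 99999]

# Boolean mask: True where WT is one of the flagged values.
mask = df["WT"].isin(fail_values)

# Assigning through .loc creates the column; unmatched rows stay NaN.
df.loc[mask, "Comment"] = "Fail"
print(df)
```

Because only the masked rows are written, the remaining rows hold missing values rather than empty strings.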
If you need to assign Fail to the matching rows and an empty string to the rest, use numpy.where:
df['Comment'] = np.where(df['WT'].isin([0, 999,9999,99999]), 'Fail', '')
print (df)
Tel MC WT Comment
0 AAA Rubber 9999 Fail
1 BBB Tree 0 Fail
2 CCC Rub 12
3 AAA Other 20
4 BBB Same 999 Fail
5 DDD Other-Same 70
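Instead of chaining multiple conditions, you can use isin for this:
df.loc[df.WT.isin([0, 999, 9999, 99999]), 'Comment'] = 'Fail'
# assign the result back rather than calling fillna(inplace=True) on the attribute
df['Comment'] = df['Comment'].fillna(' ')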
Tel MC WT Comment
0 AAA Rubber 9999 Fail
1 BBB Tree 0 Fail
2 CCC Rub 12
3 AAA Other 20
4 BBB Same 999 Fail
5 DDD Other-Same 70
Or a numpy-based version:
import numpy as np
df['Comment'] = np.where(np.in1d(df.WT.values, [0, 999, 9999, 99999]), 'Fail', '')
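One note on np.in1d: it is deprecated as of NumPy 2.0, and np.isin is the recommended replacement. Here is a minimal sketch of the same idea with np.isin, using only the WT column from the question (the variable mask is illustrative):

```python
import numpy as np
import pandas as pd

# Only the WT column from the question is needed here.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# np.isin returns a boolean array with the same shape as its first argument.
mask = np.isin(df["WT"].to_numpy(), [0, 999, 9999, 99999])

# np.where picks "Fail" where the mask is True and "" elsewhere.
df["Comment"] = np.where(mask, "Fail", "")
print(df)
```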
Using a list comprehension:
df['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in df['WT']]
Tel MC WT Comment
0 AAA Rubber 9999 Fail
1 BBB Tree 0 Fail
2 CCC Rub 12
3 AAA Other 20
4 BBB Same 999 Fail
5 DDD Other-Same 70
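A small variation on the list comprehension, as a sketch: storing the flagged values in a set (the name fail_values is illustrative) means each membership test is a hash lookup instead of a scan through a list. With only four values the difference is probably negligible, and it was not measured here.

```python
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# A set gives constant-time membership tests on average.
fail_values = {0, 999, 9999, 99999}

df["Comment"] = ["Fail" if x in fail_values else "" for x in df["WT"]]
print(df)
```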
Timings
dfbig = pd.concat([df]*1000000, ignore_index=True)
print(dfbig.shape)
(6000000, 3)
list comprehension
%%timeit
dfbig['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in dfbig['WT']]
1.15 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
loc + isin + fillna
%%timeit
dfbig.loc[dfbig['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
dfbig.Comment.fillna(' ', inplace=True)
431 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.where
%%timeit
dfbig['Comment'] = np.where(dfbig['WT'].isin([0, 999,9999,99999]), 'Fail', '')
531 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
apply
%%timeit
dfbig['Comment'] = dfbig['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')
1.03 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.where + np.in1d
%%timeit
dfbig['comment'] = np.where(np.in1d(dfbig.WT, [0,99,999,9999]), 'Fail', '')
538 ms ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
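The %%timeit cells above only run inside IPython or Jupyter. As a rough sketch, a similar comparison can be run as a plain script with the standard timeit module. The frame here is deliberately much smaller than the one timed above, the function names are illustrative, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})
# Much smaller than the 6,000,000-row frame used above, to keep the run short.
dfbig = pd.concat([df] * 10_000, ignore_index=True)
fail_values = [0, 999, 9999, 99999]


def with_np_where(frame):
    return np.where(frame["WT"].isin(fail_values), "Fail", "")


def with_list_comprehension(frame):
    return ["Fail" if x in fail_values else "" for x in frame["WT"]]


def with_apply(frame):
    return frame["WT"].apply(lambda x: "Fail" if x in fail_values else "")


candidates = {
    "np.where + isin": with_np_where,
    "list comprehension": with_list_comprehension,
    "apply": with_apply,
}

for name, func in candidates.items():
    # Best of 3 single runs, reported in milliseconds.
    seconds = min(timeit.repeat(lambda: func(dfbig), number=1, repeat=3))
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Taking the minimum of a few repeats is a common way to reduce noise from other processes. It is not the same statistic as the mean ± std. dev. that %%timeit reports.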
Use apply on the target column:
df['Comment'] = df['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')
Output:
Tel MC WT Comment
0 AAA Rubber 9999 Fail
1 BBB Tree 0 Fail
2 CCC Rub 12
3 AAA Other 20
4 BBB Same 999 Fail
5 DDD Other-Same 70
Going by your coding style, the easiest (and most readable) way is to use numpy.where, which is also faster than df.apply():
df["Comment"] = np.where((df["WT"] == 0) | (df["WT"] == 999) | (df["WT"] == 9999) | (df["WT"] == 99999), "Fail", "")
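np.where() evaluates the condition for each entry of the given array or DataFrame column. For more information, see the numpy.where documentation.
Hope this helps.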
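A note on the parentheses, since the code in the question omits them: in Python, | binds more tightly than ==, so each comparison has to be wrapped in its own parentheses. The sketch below (variable names are illustrative) checks that the chained form matches isin and shows what happens without the parentheses.

```python
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# Each comparison is parenthesized so that | combines boolean Series.
chained = (
    (df["WT"] == 0)
    | (df["WT"] == 999)
    | (df["WT"] == 9999)
    | (df["WT"] == 99999)
)
compact = df["WT"].isin([0, 999, 9999, 99999])
print(chained.equals(compact))

# Without parentheses, `df["WT"] == 0 | df["WT"] == 999` is parsed as the
# chained comparison `df["WT"] == (0 | df["WT"]) == 999`, which needs the
# truth value of a whole Series and therefore raises ValueError.
try:
    df["WT"] == 0 | df["WT"] == 999
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False
print(unparenthesized_ok)
```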