
How to generate new column with values based on condition in another column in pandas

I have a dataframe like the one below, and I need to generate a new column named "Comment" that says "Fail" for the specified values.

Input:

        Tel    MC             WT

        AAA    Rubber         9999
        BBB    Tree           0
        CCC    Rub            12
        AAA    Other          20
        BBB    Same           999
        DDD    Other-Same     70 

Code tried:

          df.loc[(df[WT] == 0 | df[WT] == 999 | df[WT] == 9999 | df[WT] == 99999),'Comment'] = 'Fail'

Error:

         AttributeError: 'str' object has no attribute 'loc'

Expected output:

       Tel    MC             WT      Comment
       AAA    Rubber         9999    Fail
       BBB    Tree           0       Fail
       CCC    Rub            12
       AAA    Other          20
       BBB    Same           999     Fail
       DDD    Other-Same     70

Use Series.isin to test membership; rows that do not match are left as NaN:

df.loc[df['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12     NaN
3  AAA       Other    20     NaN
4  BBB        Same   999    Fail
5  DDD  Other-Same    70     NaN
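
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same isin mask. The names fail_values and mask are only illustrative and do not come from the original post.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

# Illustrative name (an assumption): the WT values that should be flagged.
fail_values = [0, 999, 9999, 99999]

# Boolean Series: True where WT is one of the flagged values.
mask = df["WT"].isin(fail_values)

# Assigning through .loc creates the column; rows outside the mask stay NaN.
df.loc[mask, "Comment"] = "Fail"
print(df)
```

Keeping the mask in its own variable makes it easy to inspect which rows matched before writing anything to the frame.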

If you need to assign Fail and an empty string instead, use numpy.where:

df['Comment'] = np.where(df['WT'].isin([0, 999,9999,99999]), 'Fail', '')
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        
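
A self-contained sketch of the numpy.where variant, assuming the same sample data as in the question. The variable name is_fail is illustrative. Because np.where produces a value for every row, the new column ends up with no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

# Illustrative name (an assumption): True where WT is a flagged value.
is_fail = df["WT"].isin([0, 999, 9999, 99999])

# np.where picks "Fail" where the condition is True and "" elsewhere,
# so every row gets a string and nothing is left as NaN.
df["Comment"] = np.where(is_fail, "Fail", "")
print(df)
```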

Instead of chaining multiple conditions, you have isin for this:

df.loc[df['WT'].isin([0, 999, 9999, 99999]), 'Comment'] = 'Fail'
# Assign the result back instead of calling fillna(inplace=True) on the attribute
df['Comment'] = df['Comment'].fillna(' ')


  Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        

Or a numpy-based one:

import numpy as np

# Same values as in the question; column name matches the requested 'Comment'
df['Comment'] = np.where(np.in1d(df['WT'].values, [0, 999, 9999, 99999]), 'Fail', '')
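
As a side note, recent NumPy releases mark np.in1d as deprecated in favour of np.isin, which has been available since NumPy 1.13. A minimal sketch of the same idea with np.isin, assuming only the WT column from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# np.isin tests each element of the first array for membership in the second.
mask = np.isin(df["WT"].to_numpy(), [0, 999, 9999, 99999])

df["Comment"] = np.where(mask, "Fail", "")
print(df)
```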

Using a list comprehension:

df['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in df['WT']]

   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        
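
A small variation on the list comprehension, offered as a sketch rather than a measured improvement: keeping the flagged values in a set (the name fail_values is illustrative) avoids rebuilding the list for every row and gives constant-time membership checks, which may matter if the list of values grows.

```python
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# Illustrative name (an assumption): a set gives O(1) membership tests.
fail_values = {0, 999, 9999, 99999}

df["Comment"] = ["Fail" if x in fail_values else "" for x in df["WT"]]
print(df)
```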

Timings

dfbig = pd.concat([df]*1000000, ignore_index=True)

print(dfbig.shape)
(6000000, 3)
  1. list comprehension
%%timeit
dfbig['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in dfbig['WT']]

1.15 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  2. loc + isin + fillna
%%timeit
dfbig.loc[dfbig['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
dfbig.Comment.fillna(' ', inplace=True)

431 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  3. np.where
%%timeit
dfbig['Comment'] = np.where(dfbig['WT'].isin([0, 999,9999,99999]), 'Fail', '')

531 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  4. apply
%%timeit
dfbig['Comment'] = dfbig['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

1.03 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  5. np.where + np.in1d
%%timeit
dfbig['comment'] = np.where(np.in1d(dfbig.WT, [0,99,999,9999]), 'Fail', '')

538 ms ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
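
The %%timeit cells above only work in IPython or Jupyter. A rough sketch of how a similar comparison could be run as a plain script with the standard timeit module follows. The function names and the reduced row count are assumptions chosen for illustration, and the numbers will differ from those quoted above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# A much smaller frame than the 6,000,000 rows above, to keep the sketch quick.
dfbig = pd.concat([df] * 10_000, ignore_index=True)
fail_values = [0, 999, 9999, 99999]


def with_list_comprehension():
    # Pure-Python loop over the column values.
    return ["Fail" if x in fail_values else "" for x in dfbig["WT"]]


def with_np_where():
    # Vectorized membership test followed by an element-wise choice.
    return np.where(dfbig["WT"].isin(fail_values), "Fail", "")


for func in (with_list_comprehension, with_np_where):
    # Average wall-clock time over 5 calls.
    seconds = timeit.timeit(func, number=5) / 5
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```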

Use apply on the target column:

df['Comment'] = df['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

Output:

  Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70        

The easiest (and most readable) way that stays close to your coding style is to use numpy.where(), which is faster than df.apply():

df["Comment"] = np.where((df["WT"] == 0) | (df["WT"] == 999) | (df["WT"] == 9999) | (df["WT"] == 99999), "Fail", "")

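np.where() evaluates the condition element-wise over the given array or DataFrame column, returning the first value where it is True and the second where it is False. For more information, see the documentation for numpy.where.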
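
The parentheses around each comparison matter. In Python, | binds more tightly than ==, so the unparenthesized condition from the question is parsed as a chained comparison and cannot work as intended (the column name also needs quotes, df["WT"] rather than df[WT]). A minimal sketch that shows the difference, using only the WT column and an illustrative flag named raised:

```python
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# Without parentheses, `|` is evaluated before `==`, so this is parsed as the
# chained comparison  df["WT"] == (0 | df["WT"]) == 999,  which needs the
# truth value of a whole Series and therefore raises ValueError.
try:
    df["WT"] == 0 | df["WT"] == 999
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison yields a boolean Series first,
# and `|` then combines them element-wise.
mask = (df["WT"] == 0) | (df["WT"] == 999)
print(raised)
print(mask.tolist())
```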

Hope this helps.

