How to generate a new column with values based on a condition in another column in pandas

Question

I have a DataFrame like the one below, and I need to generate a new column named "Comment" that says "Fail" for the specified values.

Input:

        Tel    MC             WT

        AAA    Rubber         9999
        BBB    Tree           0
        CCC    Rub            12
        AAA    Other          20
        BBB    Same           999
        DDD    Other-Same     70

Code I tried:

          df.loc[(df[WT] == 0 | df[WT] == 999 | df[WT] == 9999 | df[WT] == 99999),'Comment'] = 'Fail'

Error:

         AttributeError: 'str' object has no attribute 'loc'

Expected output:

       Tel    MC             WT      Comment
       AAA    Rubber         9999    Fail
       BBB    Tree           0       Fail
       CCC    Rub            12
       AAA    Other          20
       BBB    Same           999     Fail
       DDD    Other-Same     70

Answer 1

Use Series.isin to test membership; rows that do not match are left as NaN:

df.loc[df['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12     NaN
3  AAA       Other    20     NaN
4  BBB        Same   999    Fail
5  DDD  Other-Same    70     NaN
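
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's input table, and the names fail_values and mask are only illustrative, not part of the original answer:

```python
import pandas as pd

# Sample data rebuilt from the question's input table.
df = pd.DataFrame({
    "Tel": ["AAA", "BBB", "CCC", "AAA", "BBB", "DDD"],
    "MC": ["Rubber", "Tree", "Rub", "Other", "Same", "Other-Same"],
    "WT": [9999, 0, 12, 20, 999, 70],
})

# Values that should be flagged, as listed in the question (illustrative name).
fail_values = [0, 999, 9999, 99999]

# Boolean mask: True where WT is one of the flagged values.
mask = df["WT"].isin(fail_values)

# Assigning through .loc creates the column; unmatched rows are left as NaN.
df.loc[mask, "Comment"] = "Fail"
print(df)
```

Keeping the mask in its own variable makes it easy to reuse, for example to count the failing rows with mask.sum().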

If you need to assign Fail and an empty string instead of NaN, use numpy.where:

df['Comment'] = np.where(df['WT'].isin([0, 999,9999,99999]), 'Fail', '')
print (df)
   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

Answer 2

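Instead of chaining multiple conditions, you can use isin for this:

df.loc[df.WT.isin([0, 999, 9999, 99999]), 'Comment'] = 'Fail'
# Assign the result back; calling fillna(..., inplace=True) on df.Comment
# is chained assignment and may not update df in recent pandas versions.
df['Comment'] = df['Comment'].fillna(' ')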


  Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

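Or a NumPy-based alternative:

import numpy as np

# Write to 'Comment' (capitalised) so it matches the column name in the question.
df['Comment'] = np.where(np.in1d(df.WT.values, [0, 999, 9999, 99999]), 'Fail', '')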
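
One caveat: np.in1d is deprecated as of NumPy 2.0, and np.isin is the documented replacement. A minimal sketch of the same idea with np.isin, assuming only the WT column from the question (the variable name mask is illustrative):

```python
import numpy as np
import pandas as pd

# Only the WT column from the question is needed for this sketch.
df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

# np.isin returns a boolean array with the same shape as its first argument.
mask = np.isin(df["WT"].to_numpy(), [0, 999, 9999, 99999])

# Pick 'Fail' where the mask is True and an empty string elsewhere.
df["Comment"] = np.where(mask, "Fail", "")
print(df)
```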

Answer 3

Use a list comprehension:

df['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in df['WT']]

   Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

Timings

dfbig = pd.concat([df]*1000000, ignore_index=True)

print(dfbig.shape)
(6000000, 3)

list comprehension

%%timeit 
dfbig['Comment'] = ['Fail' if x in [0, 999, 9999, 99999] else '' for x in dfbig['WT']]

1.15 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

loc + isin + fillna

%%timeit
dfbig.loc[dfbig['WT'].isin([0, 999,9999,99999]),'Comment'] = 'Fail'
dfbig.Comment.fillna(' ', inplace=True)

431 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np.where

%%timeit
dfbig['Comment'] = np.where(dfbig['WT'].isin([0, 999,9999,99999]), 'Fail', '')

531 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

apply

%%timeit
dfbig['Comment'] = dfbig['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

1.03 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np.where + np.in1d

%%timeit
dfbig['comment'] = np.where(np.in1d(dfbig.WT, [0,99,999,9999]), 'Fail', '')

538 ms ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
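
The %%timeit cells above only work in IPython/Jupyter. If you want to rerun the comparison as a plain script, something like the following rough sketch with the standard timeit module should do. The function names and the smaller frame size (60,000 rows instead of 6,000,000) are my own choices, and the absolute numbers will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})
# A smaller frame than the 6,000,000 rows above, so the script finishes quickly.
dfbig = pd.concat([df] * 10_000, ignore_index=True)
fail_values = [0, 999, 9999, 99999]


def with_list_comprehension():
    return ["Fail" if x in fail_values else "" for x in dfbig["WT"]]


def with_np_where():
    return np.where(dfbig["WT"].isin(fail_values), "Fail", "")


def with_apply():
    return dfbig["WT"].apply(lambda x: "Fail" if x in fail_values else "")


for func in (with_list_comprehension, with_np_where, with_apply):
    # Best of 3 repeats with 5 calls each, reported per call.
    best = min(timeit.repeat(func, number=5, repeat=3))
    print(f"{func.__name__}: {best / 5 * 1000:.2f} ms per call")
```

Taking the minimum over the repeats is the usual way to reduce noise from other processes on the machine.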

Answer 4

Use Series.apply on the target column:

df['Comment'] = df['WT'].apply(lambda x: 'Fail' if x in [0, 999, 9999, 99999] else ' ')

Output:

  Tel          MC    WT Comment
0  AAA      Rubber  9999    Fail
1  BBB        Tree     0    Fail
2  CCC         Rub    12        
3  AAA       Other    20        
4  BBB        Same   999    Fail
5  DDD  Other-Same    70

Answer 5

Given your coding style, the easiest (and most readable) approach is numpy.where, which is also faster than df.apply():

df["Comment"] = np.where((df["WT"] == 0) | (df["WT"] == 999) | (df["WT"] == 9999) | (df["WT"] == 99999), "Fail", "")

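np.where() evaluates the condition element-wise over the given array or DataFrame column, taking the first value where the condition is True and the second where it is False. For more information, see the numpy.where documentation.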
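
The parentheses around each comparison are not optional. In Python, | binds more tightly than ==, so an unparenthesized expression such as df["WT"] == 0 | df["WT"] == 999 is parsed as a chained comparison and ends up asking for the truth value of a whole Series. A small sketch of the difference, using only the WT column from the question (the names error and mask are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"WT": [9999, 0, 12, 20, 999, 70]})

error = None
try:
    # Parsed as: df["WT"] == (0 | df["WT"]) == 999, a chained comparison
    # that needs bool(Series) and therefore raises ValueError.
    df["WT"] == 0 | df["WT"] == 999
except ValueError as exc:
    error = exc
    print("without parentheses:", exc)

# With parentheses, each comparison yields a boolean Series and | combines them.
mask = (
    (df["WT"] == 0)
    | (df["WT"] == 999)
    | (df["WT"] == 9999)
    | (df["WT"] == 99999)
)
df["Comment"] = np.where(mask, "Fail", "")
print(df)
```

Separately, the AttributeError shown in the question ('str' object has no attribute 'loc') means that the name df was bound to a string rather than a DataFrame at that point, so that has to be sorted out before any of these approaches can run.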

Hope this helps.

5 solutions

Solution 1
3, accepted, 2019-09-16 11:33:19

Solution 2
3, 2019-09-16 11:34:12

Solution 3
2, 2019-09-16 11:37:29

Solution 4
1, 2019-09-16 11:34:51

Solution 5
-1, 2019-09-16 11:36:45
