Apply a Python function to one pandas column and assign the output to multiple columns

Question

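Hello community,

I have read many answers and blog posts, but I cannot figure out what simple thing I am missing. I am using a "conditions" function to define all the conditions and apply it to one dataframe column. When a condition is met, it should create/update two new dataframe columns, "cat" and "subcat".

It would be a great help if you could help me out here!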

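import pandas as pd

# Named `data` rather than `dict` so the built-in dict type is not shadowed
data = {'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
        'desc': ['Present', 'Present', 'NA', 'Present', 'NA']}

df = pd.DataFrame(data)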

The dataframe looks like this:

          remark       desc
0         NA           Present      
1         NA           Present        
2         Category1    NA                   
3         Category2    Present                   
4         Category3    NA

I wrote a function to define the conditions as follows:

def conditions(s):
    # Map a single remark value to a (cat, subcat) pair
    if s == 'Category1':
        x = 'insufficient'
        y = 'resolution'
    elif s == 'Category2':
        x = 'insufficient'
        y = 'information'
    elif s == 'Category3':
        x = 'Duplicate'
        y = 'ID repeated'
    else:
        x = 'NA'
        y = 'NA'

    return (x, y)

I have tried several ways of running the above function on the dataframe column, but with no luck.

df[['cat','subcat']] = df['remark'].apply(lambda x: pd.Series([conditions(df)[0],conditions(df)[1]]))

My expected dataframe should look like this:

          remark       desc        cat           subcat
0         NA           Present     NA            NA      
1         NA           Present     NA            NA
2         Category1    NA          insufficient  resolution         
3         Category2    Present     insufficient  information              
4         Category3    NA          Duplicate     ID repeated

Many thanks.

Answer 1

One way to solve this is with a list comprehension:

df[['cat', 'subcat']] = [("insufficient", "resolution")  if word == "Category1" else 
                         ("insufficient", "information") if word == "Category2" else
                         ("Duplicate", "ID repeated")    if word == "Category3" else 
                         ("NA", "NA")
                         for word in df.remark]

  remark      desc               cat         subcat
0   NA        Present          NA              NA
1   NA        Present          NA              NA
2   Category1   NA          insufficient    resolution
3   Category2   Present     insufficient    information
4   Category3   NA          Duplicate       ID repeated

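@dm2's answer shows how to achieve this with your function. The first apply(conditions) creates a Series of tuples, and the second apply splits them into separate columns, forming a dataframe that you can then assign to cat and subcat.

The reason I suggest a list comprehension is that you are dealing with strings, and in pandas, processing strings in vanilla Python is often faster. Also, with the list comprehension the processing happens once; you do not need to apply the conditions function and then call pd.Series. That should give you more speed. Testing will confirm or debunk this.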
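The speed claim above can be checked with a small timing harness. The sketch below is only an illustration: the test data (the question's five rows repeated 500 times) and the helper names `with_list_comprehension` and `with_apply_series` are assumptions, not part of the original thread. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def conditions(s):
    """Map a remark to a (cat, subcat) tuple, as in the question."""
    if s == 'Category1':
        return ('insufficient', 'resolution')
    elif s == 'Category2':
        return ('insufficient', 'information')
    elif s == 'Category3':
        return ('Duplicate', 'ID repeated')
    return ('NA', 'NA')


# Hypothetical test data: the question's five remarks repeated 500 times.
remarks = ['NA', 'NA', 'Category1', 'Category2', 'Category3'] * 500
df = pd.DataFrame({'remark': remarks})


def with_list_comprehension():
    # One pass in plain Python, then a single DataFrame construction.
    return pd.DataFrame([conditions(word) for word in df.remark],
                        columns=['cat', 'subcat'], index=df.index)


def with_apply_series():
    # apply(conditions) yields tuples; apply(pd.Series) expands them row by row.
    out = df['remark'].apply(conditions).apply(pd.Series)
    out.columns = ['cat', 'subcat']
    return out


# Both approaches should yield the same values.
same_result = (with_list_comprehension().values.tolist()
               == with_apply_series().values.tolist())

t_listcomp = timeit.timeit(with_list_comprehension, number=1)
t_apply = timeit.timeit(with_apply_series, number=1)
print(f'same result: {same_result}')
print(f'list comprehension: {t_listcomp:.4f}s, apply + pd.Series: {t_apply:.4f}s')
```

Running this on data of realistic size is the only reliable way to decide which approach to use.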

Answer 2

You could do this:

df[['cat','subcat']] = df['remark'].apply(conditions).apply(pd.Series)

Output:

  remark      desc               cat         subcat
0   NA        Present          NA              NA
1   NA        Present          NA              NA
2   Category1   NA          insufficient    resolution
3   Category2   Present     insufficient    information
4   Category3   NA          Duplicate       ID repeated

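Edit: this is probably the simpler way to apply the function you already have, but if you have a huge DataFrame, see @sammywemmy's answer, which uses a list comprehension, for faster code.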
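To make the two chained apply calls easier to follow, here is a minimal sketch that keeps the intermediate results in separate variables. The names `tuples` and `expanded` are illustrative only, and `conditions` is a condensed version of the function from the question.

```python
import pandas as pd


def conditions(s):
    """Condensed version of the question's function."""
    if s == 'Category1':
        return ('insufficient', 'resolution')
    elif s == 'Category2':
        return ('insufficient', 'information')
    elif s == 'Category3':
        return ('Duplicate', 'ID repeated')
    return ('NA', 'NA')


df = pd.DataFrame({'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
                   'desc': ['Present', 'Present', 'NA', 'Present', 'NA']})

# Step 1: one tuple per row, stored in a single Series.
tuples = df['remark'].apply(conditions)

# Step 2: each tuple becomes a row of a DataFrame with columns 0 and 1.
expanded = tuples.apply(pd.Series)

# Step 3: assign the two columns to the new names.
df[['cat', 'subcat']] = expanded
print(df)
```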

Answer 3

You are passing the whole dataframe; you only need to pass the lambda variable (x).

df[['cat','subcat']] = df['remark'].apply(lambda x: pd.Series([*conditions(x)]))

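The * operator unpacks iterables, so you do not need to call the same function twice to extract the output. Maybe the compiler would optimize the duplicate call away, but I don't think so...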
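As a small illustration of the unpacking described above, the sketch below shows that `[*result]` builds the same list as `list(result)`. It also shows a variant that passes the tuple to pd.Series directly. That variant is a simplification of my own, not something from the answer, and `conditions` is a condensed version of the question's function.

```python
import pandas as pd


def conditions(s):
    """Condensed version of the question's function."""
    if s == 'Category1':
        return ('insufficient', 'resolution')
    elif s == 'Category2':
        return ('insufficient', 'information')
    elif s == 'Category3':
        return ('Duplicate', 'ID repeated')
    return ('NA', 'NA')


result = conditions('Category3')   # called once, returns a 2-tuple
unpacked = [*result]               # star-unpacking into a new list
cat, subcat = result               # plain tuple unpacking into two names

df = pd.DataFrame({'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
                   'desc': ['Present', 'Present', 'NA', 'Present', 'NA']})

# pd.Series accepts the tuple as-is, so the function still runs once per row.
df[['cat', 'subcat']] = df['remark'].apply(lambda x: pd.Series(conditions(x)))
print(df)
```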

Answer 4

You can use Series.replace with mapping dictionaries:

df['cat'] = df.remark.replace({'Category1': 'insufficient',
    'Category2': 'insufficient', 'Category3': 'Duplicate'})
df['subcat'] = df.remark.replace({'Category1': 'resolution',
    'Category2': 'information', 'Category3': 'ID repeated'})

print(df)
      remark     desc           cat       subcat
0         NA  Present            NA           NA
1         NA  Present            NA           NA
2  Category1       NA  insufficient   resolution
3  Category2  Present  insufficient  information
4  Category3       NA     Duplicate  ID repeated
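Note that replace leaves unmatched values untouched, so the 'NA' rows above stay 'NA' only because the desired default happens to equal the original value. A possible variant, sketched here under that observation, keeps both outputs in one lookup dictionary with an explicit default. The names `mapping`, `default`, and `pairs` are hypothetical and not from the answer.

```python
import pandas as pd

# Hypothetical lookup table: remark -> (cat, subcat)
mapping = {
    'Category1': ('insufficient', 'resolution'),
    'Category2': ('insufficient', 'information'),
    'Category3': ('Duplicate', 'ID repeated'),
}
default = ('NA', 'NA')  # used for any remark missing from the table

df = pd.DataFrame({'remark': ['NA', 'NA', 'Category1', 'Category2', 'Category3'],
                   'desc': ['Present', 'Present', 'NA', 'Present', 'NA']})

# One dictionary lookup per row; unknown remarks fall back to the default pair.
pairs = [mapping.get(remark, default) for remark in df['remark']]
df[['cat', 'subcat']] = pairs
print(df)
```

With this layout, adding a new category means adding one dictionary entry rather than editing two separate replace calls.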

Answer 1
2 accepted 2020-08-19 21:10:20

Answer 2
2 2020-08-19 21:12:54

Answer 3
1 2020-08-19 21:13:56

Answer 4
0 2020-08-19 21:11:29
