How do I correctly call a function and return an updated DataFrame?

Question

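I am trying to process and update the rows of a DataFrame through a function and return the DataFrame so that I can keep using it. When I try to return the DataFrame to the original function call, I get a Series instead of the expected column updates. A simple example follows: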

import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data     # create new columns
    comb['AnotherNewfield'] = 'y'
    return pd.DataFrame(comb)

Calling the function with apply:

>>> newdf = df['A'].apply(get_item)

>>> newdf
a          A   Newfield AnotherNewfield
a  adam  st...
b          A   Newfield AnotherNewfield
e   sed  st...
c          A   Newfield AnotherNewfield
d  dave  st...
d          A   Newfield AnotherNewfield
d  dave  st...
e          A   Newfield AnotherNewfield
s   NaN  st...
f         A   Newfield AnotherNewfield
m  NaN  str(...
Name: A, dtype: object
>>> type(newdf)
<class 'pandas.core.series.Series'>

I suspect apply() is not the right tool here, but I am not sure how I should update this DataFrame through a function.

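Edit: Apologies, I seem to have accidentally deleted the example function while editing. I have added it back here while I try a few other things I found in other posts.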

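Testing with a single variable in a slightly different way, returning multiple Series values, seems to work, so I will see whether I can do this in my actual case and post an update.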

def get_item(data):

    value = data     #create new columns
    AnotherNewfield = 'y'
    return pd.Series(value),pd.Series(AnotherNewfield)
df['B'], df['C'] = zip(*df['A'].apply(get_item))

Answer 1

You can use groupby and apply to get a DataFrame back from the apply call, like this:

import pandas as pd

# add new column B for groupby - we need single group only to do the trick
df = pd.DataFrame(
    {'A':['adam', 'ed', 'dra','dave','sed','mike'], 'B': [1,1,1,1,1,1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # build the column directly rather than with Series.append, which pandas 2.0 removed;
    # reset_index(drop=True) renumbers the rows 0..n-1
    comb = pd.DataFrame({'Newfield': data['A'].reset_index(drop=True)})
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb

# group by column B so that apply receives the whole dataframe as a single group
newdf = df.groupby('B').apply(get_item)
# after processing, newdf has a MultiIndex - remove level 0 (index col B with value 1 gained from the groupby operation)
# droplevel returns a new object, so assign the result
newdf = newdf.droplevel(0)

Output:

    Newfield    AnotherNewfield
0   adam        y
1   ed          y
2   dra         y
3   dave        y
4   sed         y
5   mike        y
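To make the index handling above more concrete, here is a minimal sketch of the same single-group trick that inspects the index before and after droplevel. It assumes a recent pandas; selecting `[['A']]` before `apply` is my own addition so that the function only sees column A, and the variable name `flat` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["adam", "ed", "dra", "dave", "sed", "mike"], "B": [1, 1, 1, 1, 1, 1]},
    index=["a", "b", "c", "d", "e", "f"],
)

def get_item(data):
    # data is the sub-frame for one group (here: every row, since B is constant)
    comb = pd.DataFrame({"Newfield": data["A"].reset_index(drop=True)})
    comb["AnotherNewfield"] = "y"
    return comb

# apply runs once per group and stitches the returned frames together,
# prefixing each with its group key, which yields a two-level index
newdf = df.groupby("B")[["A"]].apply(get_item)
print(newdf.index.nlevels)

# droplevel does not modify newdf in place; keep the returned object
flat = newdf.droplevel(0)
print(flat)
```

The group key level carries no information here because there is only one group, which is why dropping it is safe.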

Answer 2

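For anyone looking for a potential answer: I got the result I wanted by running this code, which I found in another post. I would post that person's name to give them credit, but essentially this lets me edit the function and, by applying it, get the data created in separate columns: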

def get_item(data):
    
    value = data     #create new columns using variables
    AnotherNewfield = 'y'
    return pd.Series(value),pd.Series(AnotherNewfield)

>>> df['B'], df['C'] = zip(*df['A'].apply(get_item))
>>> df
      A        B     C
a  adam  (adam,)  (y,)
b    ed    (ed,)  (y,)
c   dra   (dra,)  (y,)
d  dave  (dave,)  (y,)
e   sed   (sed,)  (y,)
f  mike  (mike,)  (y,)
>>>

The only problem is that the parentheses and commas come along with the data. I plan to strip them in code outside the function. Perhaps this:

>>> import re
>>> df['B'] = df['B'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B     C
a  adam   adam   (y,)
b    ed     ed   (y,)
c   dra    dra   (y,)
d  dave   dave   (y,)
e   sed    sed   (y,)
f  mike   mike   (y,)
>>> df['C'] = df['C'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B    C
a  adam   adam    y 
b    ed     ed    y 
c   dra    dra    y 
d  dave   dave    y 
e   sed    sed    y 
f  mike   mike    y
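The wrappers seem to come from returning pd.Series objects rather than plain values. A minimal sketch, assuming the function only needs to hand back scalars: return an ordinary tuple, and the zip idiom then fills the columns with the raw values, so no regex cleanup is needed. The name `get_item_plain` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["adam", "ed", "dra", "dave", "sed", "mike"]},
    index=["a", "b", "c", "d", "e", "f"],
)

def get_item_plain(value):
    # return plain scalars instead of wrapping each one in a pd.Series
    another_new_field = "y"
    return value, another_new_field

# apply yields one (value, 'y') tuple per row; zip(*...) transposes them
# into one sequence per new column
df["B"], df["C"] = zip(*df["A"].apply(get_item_plain))
print(df)
```

This also avoids the side effect of the regex approach, which leaves stray spaces around each value.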

Answer 3

This will work:

df = pd.DataFrame(['adam', 'ed', 'dra','dave','sed','mike'], index =['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])
def get_item(data):
    comb=pd.DataFrame()
    comb['Newfield'] = data     #create new columns
    comb['AnotherNewfield'] = 'y'
    return comb
new_df = get_item(df)
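Calling the function directly works because it receives the whole DataFrame once, whereas Series.apply calls it once per element. If a per-element function is still wanted, one option is to return a pd.Series with named entries, in which case Series.apply assembles the results into a DataFrame. The following is a sketch under that assumption; `get_item_row`, `new_cols`, and `updated` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["adam", "ed", "dra", "dave", "sed", "mike"]},
    index=["a", "b", "c", "d", "e", "f"],
)

def get_item_row(value):
    # value is a single element of column A (a str), not the whole column
    return pd.Series({"Newfield": value, "AnotherNewfield": "y"})

# when the function returns a Series, apply builds a DataFrame:
# one row per element, one column per entry in the returned Series
new_cols = df["A"].apply(get_item_row)

# attach the new columns to the original frame; join aligns on the index
updated = df.join(new_cols)
print(updated)
```

Building a pd.Series for every element is slow on large frames, so the direct call is likely the better choice when the logic can operate on whole columns.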


3 solutions

Solution 1
1 accepted 2021-09-03 12:49:43

Solution 2
0 2021-09-03 11:40:16

Solution 3
0 2022-09-21 19:50:22
