How to properly call a function and return an updated dataframe?

Question

I am trying to process and update rows in a dataframe through a function, and then return the dataframe so I can keep using it. When I try to return the dataframe to the original function call, it returns a Series instead of the expected updated columns. A simple example is below:

import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data     # create new columns
    comb['AnotherNewfield'] = 'y'

    return pd.DataFrame(comb)

Calling the function using apply:

>>> newdf = df['A'].apply(get_item)

>>> newdf
a          A   Newfield AnotherNewfield
a  adam  st...
b          A   Newfield AnotherNewfield
e   sed  st...
c          A   Newfield AnotherNewfield
d  dave  st...
d          A   Newfield AnotherNewfield
d  dave  st...
e          A   Newfield AnotherNewfield
s   NaN  st...
f         A   Newfield AnotherNewfield
m  NaN  str(...
Name: A, dtype: object
>>> type(newdf)
<class 'pandas.core.series.Series'>

I assume apply() is the wrong tool here, but I am not sure how I "should" be updating this dataframe via a function otherwise.
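
Why this happens: `Series.apply` calls the function once per element, passing a single string such as 'adam', and collects whatever each call returns into a new Series with the original index. Because each call returns a DataFrame, the result is an object-dtype Series whose cells are DataFrames. A minimal sketch reproducing this with the function from the question:

```python
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    # data is a single string such as 'adam', not a DataFrame
    comb = pd.DataFrame()
    comb['Newfield'] = data
    comb['AnotherNewfield'] = 'y'
    return comb

result = df['A'].apply(get_item)
print(type(result))          # a Series, one entry per row of df
print(result.dtype)          # object: each cell holds a whole DataFrame
print(type(result.iloc[0]))  # the DataFrame returned for 'adam'
```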

Edit: Apologies, it seems I accidentally deleted the sample function in an earlier edit. I have added it back here while I try a few other things I found in other posts.

Testing in a slightly different way, with individual variables and returning multiple Series, seems to work, so I will see whether I can do this in my actual case and update.

import pandas as pd

def get_item(data):
    value = data     # create new columns
    AnotherNewfield = 'y'
    return pd.Series(value), pd.Series(AnotherNewfield)

df['B'], df['C'] = zip(*df['A'].apply(get_item))

Answer 1

You could use groupby together with apply to get a DataFrame back from the apply call, like this:

import pandas as pd

# add new column B for groupby - we only need a single group to do the trick
df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # data is the whole group (here: the whole DataFrame), not a single value
    # create an empty dataframe to be returned
    comb = pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # append the series data (or any data) to the dataframe's column;
    # pd.concat replaces Series.append, which was removed in pandas 2.0
    comb['Newfield'] = pd.concat([comb['Newfield'], data['A']], ignore_index=True)
    comb['AnotherNewfield'] = 'y'
    # return the complete dataframe
    return comb

# group by column B so that apply() receives a DataFrame instead of single values
newdf = df.groupby('B').apply(get_item)
# newdf now has a MultiIndex - remove level 0 (the value 1 of column B added by groupby)
newdf = newdf.droplevel(0)

Output:

    Newfield    AnotherNewfield
0   adam        y
1   ed          y
2   dra         y
3   dave        y
4   sed         y
5   mike        y
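
A sketch of a common alternative that needs no dummy grouping column, assuming the logic really has to run row by row (this is not part of the original answer): `DataFrame.apply` with `axis=1` passes each row to the function as a Series, and when the function returns a `pd.Series`, the results are expanded into the columns of a new DataFrame. The column names are taken from the question.

```python
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(row):
    # row is one row of df as a Series; the index of the returned
    # Series becomes the column names of the result
    return pd.Series({'Newfield': row['A'], 'AnotherNewfield': 'y'})

# axis=1 calls get_item once per row; since it returns a Series,
# DataFrame.apply assembles the results into a DataFrame
new_cols = df.apply(get_item, axis=1)

# attach the new columns to the original frame (aligned on the index)
df = df.join(new_cols)
print(df)
```

For simple column-wise logic like this, assigning the columns directly is usually faster than a row-wise apply.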

Answer 2

For anyone looking for a potential answer to this: I got the desired result by running this code, which I found in another post. I will post the author's name to credit them, but essentially it allowed me to edit the function and get the data created in the different columns via the apply function:

import pandas as pd

def get_item(data):
    value = data     # create new columns using variables
    AnotherNewfield = 'y'
    return pd.Series(value), pd.Series(AnotherNewfield)

>>> df['B'], df['C'] = zip(*df['A'].apply(get_item))
>>> df
      A        B     C
a  adam  (adam,)  (y,)
b    ed    (ed,)  (y,)
c   dra   (dra,)  (y,)
d  dave  (dave,)  (y,)
e   sed   (sed,)  (y,)
f  mike  (mike,)  (y,)
>>>

The only problem is that the parentheses and comma come along with the data. I intend to strip them in code outside the function, perhaps like this:

>>> import re
>>> df['B'] = df['B'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B     C
a  adam   adam   (y,)
b    ed     ed   (y,)
c   dra    dra   (y,)
d  dave   dave   (y,)
e   sed    sed   (y,)
f  mike   mike   (y,)
>>> df['C'] = df['C'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B    C
a  adam   adam    y 
b    ed     ed    y 
c   dra    dra    y 
d  dave   dave    y 
e   sed    sed    y 
f  mike   mike    y
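
The parentheses and comma appear because `get_item` wraps each value in a one-element `pd.Series`. A minimal sketch, assuming the plain values are all that is needed, returns them directly instead, so the regex cleanup becomes unnecessary (`another_new_field` is only a renamed local variable):

```python
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    value = data              # value for the first new column
    another_new_field = 'y'   # value for the second new column
    # return plain values, not pd.Series(...) wrappers
    return value, another_new_field

# apply gives a Series of 2-tuples; zip(*...) splits it into two sequences
df['B'], df['C'] = zip(*df['A'].apply(get_item))
print(df)
```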

Answer 3

This will work:

import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

# pass the whole DataFrame to the function instead of using apply()
def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data['A']     # create new columns
    comb['AnotherNewfield'] = 'y'
    return comb

new_df = get_item(df)
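
This returns a new frame that contains only the new columns. If the original columns should be kept, a minimal sketch (not from the answer) is to have the function add columns to a copy of the whole DataFrame and return it; `DataFrame.pipe` then calls it like any other method, which is handy in method chains:

```python
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    # work on a copy so the caller's DataFrame is not modified in place
    out = data.copy()
    out['Newfield'] = out['A']       # column-wise (vectorised) assignment
    out['AnotherNewfield'] = 'y'     # the scalar is broadcast to every row
    return out

# df.pipe(get_item) is equivalent to get_item(df)
new_df = df.pipe(get_item)
print(new_df)
```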

Question

3 answers

Answer 1: score 1, accepted, 2021-09-03 12:49:43
Answer 2: score 0, 2021-09-03 11:40:16
Answer 3: score 0, 2022-09-21 19:50:22
