How do I properly call a function and return an updated dataframe?
I am trying to process and update rows in a dataframe through a function, and return the dataframe so I can keep using it. When I try to return the dataframe to the original function call, I get a series and not the expected column updates. A simple example is below:
import pandas as pd

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data  # create new columns
    comb['AnotherNewfield'] = 'y'
    return pd.DataFrame(comb)
Calling the function using apply:
>>> newdf = df['A'].apply(get_item)
>>> newdf
a A Newfield AnotherNewfield
a adam st...
b A Newfield AnotherNewfield
e sed st...
c A Newfield AnotherNewfield
d dave st...
d A Newfield AnotherNewfield
d dave st...
e A Newfield AnotherNewfield
s NaN st...
f A Newfield AnotherNewfield
m NaN str(...
Name: A, dtype: object
>>> type(newdf)
<class 'pandas.core.series.Series'>
I assume that apply() is the wrong tool here, but I am not quite sure how I 'should' be updating this dataframe via a function otherwise.
Edit: I apologize, it seems I accidentally deleted the sample function in an earlier edit. I have added it back here as I attempt a few other things I found in other posts.
Testing in a slightly different manner with individual variables, and returning multiple series variables, seems to work, so I will see whether this is something I can do in my actual case and update.
def get_item(data):
    value = data  # create new columns
    AnotherNewfield = 'y'
    return pd.Series(value), pd.Series(AnotherNewfield)

df['B'], df['C'] = zip(*df['A'].apply(get_item))
You could use groupby with apply to get a dataframe back from the apply call, like this:
import pandas as pd

# add a constant column B for groupby - a single group is all we need for the trick
df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # create the empty dataframe to be returned
    comb = pd.DataFrame(columns=['Newfield', 'AnotherNewfield'], data=None)
    # copy the series data (or any data) into the column with a fresh 0..n-1 index;
    # Series.append is not available in pandas 2.0 and later
    comb['Newfield'] = data['A'].reset_index(drop=True)
    comb['AnotherNewfield'] = 'y'
    # return the complete dataframe
    return comb

# group on column B so that apply hands the whole group to get_item as a dataframe
newdf = df.groupby('B').apply(get_item)
# newdf now has a MultiIndex - simply remove level 0 (the index level B with value 1 gained from the groupby operation)
newdf.droplevel(0)
Output:
Newfield AnotherNewfield
0 adam y
1 ed y
2 dra y
3 dave y
4 sed y
5 mike y
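As an alternative sketch that avoids the helper group column: when the function passed to Series.apply returns a pd.Series, pandas expands the results into a DataFrame with one column per label. The column names come from the question; joining the result back onto df is an assumption about what the final frame should look like.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike']},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(value):
    # return one labelled Series per row; apply stacks them into a DataFrame
    return pd.Series({'Newfield': value, 'AnotherNewfield': 'y'})

newdf = df['A'].apply(get_item)  # a DataFrame indexed like df
result = df.join(newdf)          # original column plus the new ones
print(result)
```

This keeps the original index (a to f) rather than resetting it to 0..5, which may or may not be what you want.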
For anyone looking for a potential answer to this, I got the desired result with the code below, which I found in another post. I will post the author's name to credit him, but this essentially allowed me to edit the function and get the data it creates into different columns via apply:
def get_item(data):
    value = data  # create new columns using variables
    AnotherNewfield = 'y'
    return pd.Series(value), pd.Series(AnotherNewfield)
>>> df['B'], df['C'] = zip(*df['A'].apply(get_item))
>>> df
A B C
a adam (adam,) (y,)
b ed (ed,) (y,)
c dra (dra,) (y,)
d dave (dave,) (y,)
e sed (sed,) (y,)
f mike (mike,) (y,)
>>>
The only problem it brings is that the parentheses and comma come along with the data. I intend to get rid of them in the code outside of the function. Perhaps like this:
>>> import re
>>> df['B'] = df['B'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
A B C
a adam adam (y,)
b ed ed (y,)
c dra dra (y,)
d dave dave (y,)
e sed sed (y,)
f mike mike (y,)
>>> df['C'] = df['C'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
A B C
a adam adam y
b ed ed y
c dra dra y
d dave dave y
e sed sed y
f mike mike y
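The extra wrapping most likely comes from returning each value inside its own pd.Series. A minimal sketch, assuming the function only needs to hand back scalar values: return a plain tuple instead, so no regex clean-up is needed afterwards.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike']},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(value):
    # return plain scalars instead of wrapping each one in pd.Series
    return value, 'y'

# apply yields one tuple per row; zip(*...) regroups them into one tuple per column
df['B'], df['C'] = zip(*df['A'].apply(get_item))
print(df)
```

Unlike the regex approach, this leaves the values untouched, so strings containing spaces or punctuation survive as they are.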
This will work:
df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data  # create new columns
    comb['AnotherNewfield'] = 'y'
    return comb

new_df = get_item(df)
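If the goal is to keep the original columns and add the new ones alongside them, the same idea of passing the whole dataframe to the function can be sketched with DataFrame.assign. The function name add_fields is hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike']},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def add_fields(data):
    # assign returns a new DataFrame; the caller's frame is left unchanged
    return data.assign(Newfield=data['A'], AnotherNewfield='y')

new_df = add_fields(df)
print(new_df)
```

Because the function works on whole columns, no apply call is involved at all.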