How to create a function and apply it to each row in pandas?

I have a somewhat complex function that I am having difficulty writing. Essentially, I have a df that stores medical records, and I need to identify the first site that a person goes to after their discharge date (I wish it were as simple as choosing the first location after the initial stay, but it's not). The df is grouped by ID.

There are 3 options:

(1) Within a group, if any row has a begin_date that matches the first row's end_date, return that row's location as the first site (if two rows meet this condition, either is correct).
(2) If the first option does not apply and the patient has any row with location 'Health', return 'Health'.
(3) Otherwise, if neither condition 1 nor 2 applies, return 'Home'.

df:

ID    color  begin_date    end_date     location
1     red    2017-01-01    2017-01-07   initial
1     green  2017-01-05    2017-01-07   nursing
1     blue   2017-01-07    2017-01-15   rehab
1     red    2017-01-11    2017-01-22   Health
2     red    2017-02-22    2017-02-26   initial
2     green  2017-02-26    2017-02-28   nursing
2     blue   2017-02-26    2017-02-28   rehab
3     red    2017-03-11    2017-03-22   initial
4     red    2017-04-01    2017-04-07   initial
4     green  2017-04-05    2017-04-07   nursing
4     blue   2017-04-10    2017-04-15   Health
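
For anyone who wants to reproduce the examples, the table above can be rebuilt roughly as follows. This is a minimal sketch: it assumes the two date columns should be parsed as datetimes, which the question does not state explicitly.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    "color": ["red", "green", "blue", "red", "red", "green", "blue",
              "red", "red", "green", "blue"],
    "begin_date": ["2017-01-01", "2017-01-05", "2017-01-07", "2017-01-11",
                   "2017-02-22", "2017-02-26", "2017-02-26", "2017-03-11",
                   "2017-04-01", "2017-04-05", "2017-04-10"],
    "end_date": ["2017-01-07", "2017-01-07", "2017-01-15", "2017-01-22",
                 "2017-02-26", "2017-02-28", "2017-02-28", "2017-03-22",
                 "2017-04-07", "2017-04-07", "2017-04-15"],
    "location": ["initial", "nursing", "rehab", "Health", "initial",
                 "nursing", "rehab", "initial", "initial", "nursing", "Health"],
})

# Assumption: treat both date columns as datetimes so comparisons are by date
df["begin_date"] = pd.to_datetime(df["begin_date"])
df["end_date"] = pd.to_datetime(df["end_date"])
```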

Final result, which I am appending to a different df:

ID    first_site
1     rehab
2     nursing
3     Home
4     Health

My approach is to write a function with these conditions, then use apply() to iterate over each row.

def conditions(x):
    if x['begin_date'].isin(x['end_date'].iloc[[0]]).any():
        return x['location'] 
    elif df[df['Health']] == True:
        return 'Health'
    else:
        return 'Home'

final = pd.DataFrame()
final['first'] = df.groupby('ID').apply(lambda x: conditions(x))

I am getting an error:

TypeError: incompatible index of inserted column with frame index

I think you need:

def conditions(x):
    # locations of rows whose begin_date equals the group's first end_date
    val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
    # if at least one row matches, return the first matching location
    if not val.empty:
        return val.iloc[0]
    # otherwise check whether the group has any 'Health' location
    elif (x['location'] == 'Health').any():
        return 'Health'
    else:
        return 'Home'

final = df.groupby('ID').apply(conditions).reset_index(name='first_site')
print (final)
   ID first_site
0   1      rehab
1   2    nursing
2   3       Home
3   4     Health

If you need a new column on the original df instead, remove reset_index and add map, or use the solution from the comment (thank you, @Oriol Mirosa):

final = df.groupby('ID').apply(conditions)
df['first_site'] = df['ID'].map(final)
print (df)
    ID  color begin_date   end_date location first_site
0    1    red 2017-01-01 2017-01-07  initial      rehab
1    1  green 2017-01-05 2017-01-07  nursing      rehab
2    1   blue 2017-01-07 2017-01-15    rehab      rehab
3    1    red 2017-01-11 2017-01-22   Health      rehab
4    2    red 2017-02-22 2017-02-26  initial    nursing
5    2  green 2017-02-26 2017-02-28  nursing    nursing
6    2   blue 2017-02-26 2017-02-28    rehab    nursing
7    3    red 2017-03-11 2017-03-22  initial       Home
8    4    red 2017-04-01 2017-04-07  initial     Health
9    4  green 2017-04-05 2017-04-07  nursing     Health
10   4   blue 2017-04-10 2017-04-15   Health     Health
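
To illustrate what map does in the snippet above, here is a small sketch with made-up values (site_by_id and ids are hypothetical names, not part of the answer). Each ID is looked up in the index of the per-group Series, so the group result is repeated on every row of that group:

```python
import pandas as pd

# Hypothetical per-group result: index is the ID, value is the first site
site_by_id = pd.Series({1: "rehab", 2: "nursing", 3: "Home"})

# Hypothetical ID column with repeated IDs, as in a long-format frame
ids = pd.Series([1, 1, 2, 3, 3])

# map looks each ID up in site_by_id's index
broadcast = ids.map(site_by_id)
print(broadcast.tolist())
```

An ID that is missing from the index of the mapping Series would come back as NaN.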

groupby.apply with a Python function is comparatively slow; if performance is important, use:

# keep rows whose begin_date equals the first end_date of their group
end = df.groupby('ID')['end_date'].transform('first')
df1 = df[df['begin_date'] == end]

# keep rows whose location is 'Health'
df2 = df[df['location'] == 'Health']
# concatenate with the date matches first so they take priority, keep one row
# per ID, then reindex by all IDs so that IDs with no match get 'Home'
df3 = (pd.concat([df1, df2])
        .drop_duplicates('ID')
        .set_index('ID')['location']
        .reindex(df['ID'].unique(), fill_value='Home')
        .reset_index(name='first_site'))
print (df3)
   ID first_site
0   1      rehab
1   2    nursing
2   3       Home
3   4     Health
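
As a sanity check, the two approaches can be compared on the sample data. The sketch below is self-contained and makes a few assumptions of its own: the color column is left out, the dates are parsed with pd.to_datetime, and the non-key columns are selected explicitly before apply so that the call does not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Sample data from the question (color column omitted; it is not used)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    "begin_date": pd.to_datetime([
        "2017-01-01", "2017-01-05", "2017-01-07", "2017-01-11",
        "2017-02-22", "2017-02-26", "2017-02-26", "2017-03-11",
        "2017-04-01", "2017-04-05", "2017-04-10"]),
    "end_date": pd.to_datetime([
        "2017-01-07", "2017-01-07", "2017-01-15", "2017-01-22",
        "2017-02-26", "2017-02-28", "2017-02-28", "2017-03-22",
        "2017-04-07", "2017-04-07", "2017-04-15"]),
    "location": ["initial", "nursing", "rehab", "Health", "initial",
                 "nursing", "rehab", "initial", "initial", "nursing", "Health"],
})

def conditions(x):
    # Rule 1: a row whose begin_date equals the group's first end_date
    val = x.loc[x["begin_date"] == x["end_date"].iloc[0], "location"]
    if not val.empty:
        return val.iloc[0]
    # Rule 2: any 'Health' row in the group
    if (x["location"] == "Health").any():
        return "Health"
    # Rule 3: fallback
    return "Home"

# Per-group apply, with the needed columns selected explicitly
cols = ["begin_date", "end_date", "location"]
slow = df.groupby("ID")[cols].apply(conditions)

# Vectorized version: date matches first, then 'Health' rows, then 'Home'
end = df.groupby("ID")["end_date"].transform("first")
fast = (pd.concat([df[df["begin_date"] == end], df[df["location"] == "Health"]])
          .drop_duplicates("ID")
          .set_index("ID")["location"]
          .reindex(df["ID"].unique(), fill_value="Home"))

# Both approaches should give the same first site for every ID
assert slow.to_dict() == fast.to_dict()
print(fast.to_dict())
```

The order inside pd.concat matters here: because drop_duplicates keeps the first row per ID, placing the date matches before the 'Health' rows is what gives rule 1 priority over rule 2.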
