
How to create a function and apply for each row in pandas?

I have a somewhat complex function that I am having difficulty writing. Essentially, I have a df that stores medical records, and I need to identify the first site a person goes to after their discharge date (I wish it were as simple as choosing the first location after the initial stay, but it's not). The df is grouped by ID.

There are 3 options: (1) within a group, if any row has a begin_date that matches the first row's end_date, return that row's location as the first site (if two rows meet this condition, either is correct). (2) If the first option does not apply and the patient has any row with location 'Health', return 'Health'. (3) Otherwise, if neither condition 1 nor condition 2 applies, return 'Home'.

df

ID    color  begin_date    end_date     location
1     red    2017-01-01    2017-01-07   initial
1     green  2017-01-05    2017-01-07   nursing
1     blue   2017-01-07    2017-01-15   rehab
1     red    2017-01-11    2017-01-22   Health
2     red    2017-02-22    2017-02-26   initial
2     green  2017-02-26    2017-02-28   nursing
2     blue   2017-02-26    2017-02-28   rehab
3     red    2017-03-11    2017-03-22   initial
4     red    2017-04-01    2017-04-07   initial
4     green  2017-04-05    2017-04-07   nursing
4     blue   2017-04-10    2017-04-15   Health
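
For anyone who wants to run the code in this thread, the table above can be rebuilt roughly as follows. Parsing the two date columns with pd.to_datetime is an assumption, since the question does not say how they are stored; the equality comparisons used later also work if the dates stay as ISO-formatted strings.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    'color': ['red', 'green', 'blue', 'red', 'red', 'green', 'blue',
              'red', 'red', 'green', 'blue'],
    'begin_date': ['2017-01-01', '2017-01-05', '2017-01-07', '2017-01-11',
                   '2017-02-22', '2017-02-26', '2017-02-26', '2017-03-11',
                   '2017-04-01', '2017-04-05', '2017-04-10'],
    'end_date': ['2017-01-07', '2017-01-07', '2017-01-15', '2017-01-22',
                 '2017-02-26', '2017-02-28', '2017-02-28', '2017-03-22',
                 '2017-04-07', '2017-04-07', '2017-04-15'],
    'location': ['initial', 'nursing', 'rehab', 'Health', 'initial',
                 'nursing', 'rehab', 'initial', 'initial', 'nursing', 'Health'],
})

# Assumption: store the dates as datetime64 rather than plain strings.
date_cols = ['begin_date', 'end_date']
df[date_cols] = df[date_cols].apply(pd.to_datetime)

print(df.dtypes)
```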

The final result, which I am appending to a different df:

ID    first_site
1     rehab
2     nursing
3     Home
4     Health

My approach is to write a function with these conditions, then use apply() to iterate over each row.

def conditions(x):
    if x['begin_date'].isin(x['end_date'].iloc[[0]]).any():
        return x['location'] 
    elif df[df['Health']] == True:
        return 'Health'
    else:
        return 'Home'

final = pd.DataFrame()
final['first'] = df.groupby('ID').apply(lambda x: conditions(x))

I am getting an error:

TypeError: incompatible index of inserted column with frame index

I think you need:

def conditions(x):
    # rows in this group whose begin_date equals the group's first end_date
    val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
    # at least one match (the Series is not empty): return the first matching location
    if not val.empty:
        return val.iloc[0]
    # otherwise check whether any row in the group has location 'Health'
    elif (x['location'] == 'Health').any():
        return 'Health'
    else:
        return 'Home'

final = df.groupby('ID').apply(conditions).reset_index(name='first_site')
print (final)
   ID first_site
0   1      rehab
1   2    nursing
2   3       Home
3   4     Health
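
To see how the branches of conditions decide, it can help to run the same comparisons by hand on single groups. The sketch below does this for ID 1 and ID 4 only; the variable names (g1, g4, mask1, and so on) are made up for illustration, and the dates are kept as ISO strings, which is an assumption about how the data is stored.

```python
import pandas as pd

# Two groups copied from the question's table (dates kept as ISO strings).
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 4, 4, 4],
    'begin_date': ['2017-01-01', '2017-01-05', '2017-01-07', '2017-01-11',
                   '2017-04-01', '2017-04-05', '2017-04-10'],
    'end_date': ['2017-01-07', '2017-01-07', '2017-01-15', '2017-01-22',
                 '2017-04-07', '2017-04-07', '2017-04-15'],
    'location': ['initial', 'nursing', 'rehab', 'Health',
                 'initial', 'nursing', 'Health'],
})

g1 = df[df['ID'] == 1]
g4 = df[df['ID'] == 4]

# ID 1: one begin_date equals the first row's end_date, so option 1 applies.
mask1 = g1['begin_date'] == g1['end_date'].iloc[0]
matches1 = g1.loc[mask1, 'location']
print(mask1.tolist())     # [False, False, True, False]
print(matches1.tolist())  # ['rehab']

# ID 4: no begin_date matches, so the result falls through to the 'Health' check.
mask4 = g4['begin_date'] == g4['end_date'].iloc[0]
matches4 = g4.loc[mask4, 'location']
has_health4 = (g4['location'] == 'Health').any()
print(matches4.empty)     # True
print(has_health4)        # True
```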

If you need a new column, remove reset_index and add map, or use the solution from the comment (thank you, @Oriol Mirosa):

final = df.groupby('ID').apply(conditions)
df['first_site'] = df['ID'].map(final)
print (df)
    ID  color begin_date   end_date location first_site
0    1    red 2017-01-01 2017-01-07  initial      rehab
1    1  green 2017-01-05 2017-01-07  nursing      rehab
2    1   blue 2017-01-07 2017-01-15    rehab      rehab
3    1    red 2017-01-11 2017-01-22   Health      rehab
4    2    red 2017-02-22 2017-02-26  initial    nursing
5    2  green 2017-02-26 2017-02-28  nursing    nursing
6    2   blue 2017-02-26 2017-02-28    rehab    nursing
7    3    red 2017-03-11 2017-03-22  initial       Home
8    4    red 2017-04-01 2017-04-07  initial     Health
9    4  green 2017-04-05 2017-04-07  nursing     Health
10   4   blue 2017-04-10 2017-04-15   Health     Health

apply can be slow on large data; if performance is important, use:

# first, keep the rows whose begin_date equals their group's first end_date
end = df.groupby('ID')['end_date'].transform('first')
df1 = df[(df['begin_date'] == end)]

# keep the rows with location 'Health'
df2 = df[(df['location'] == 'Health')]
# concatenate both filtered frames and keep the first row per ID (date matches come first),
# then reindex by all unique ID values so that IDs with no match get 'Home'
df3 = (pd.concat([df1, df2])
        .drop_duplicates('ID')
        .set_index('ID')['location']
        .reindex(df['ID'].unique(), fill_value='Home')
        .reset_index(name='first_site'))
print (df3)
   ID first_site
0   1      rehab
1   2    nursing
2   3       Home
3   4     Health
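
As a sanity check, the two approaches can be compared on the sample data. This is only a sketch: the names cols, by_apply and vectorized are made up here, the dates are kept as ISO strings, and the columns are selected explicitly before apply so that the function only sees the fields it uses.

```python
import pandas as pd

# Sample data from the question (the color column is omitted because it is not used).
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4],
    'begin_date': ['2017-01-01', '2017-01-05', '2017-01-07', '2017-01-11',
                   '2017-02-22', '2017-02-26', '2017-02-26', '2017-03-11',
                   '2017-04-01', '2017-04-05', '2017-04-10'],
    'end_date': ['2017-01-07', '2017-01-07', '2017-01-15', '2017-01-22',
                 '2017-02-26', '2017-02-28', '2017-02-28', '2017-03-22',
                 '2017-04-07', '2017-04-07', '2017-04-15'],
    'location': ['initial', 'nursing', 'rehab', 'Health', 'initial',
                 'nursing', 'rehab', 'initial', 'initial', 'nursing', 'Health'],
})

def conditions(x):
    # option 1: a begin_date equal to the group's first end_date
    val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
    if not val.empty:
        return val.iloc[0]
    # option 2: any 'Health' row in the group
    elif (x['location'] == 'Health').any():
        return 'Health'
    # option 3: fallback
    return 'Home'

# Group-wise apply, one scalar per ID.
cols = ['begin_date', 'end_date', 'location']
by_apply = df.groupby('ID')[cols].apply(conditions)

# Vectorized version: date matches are concatenated before 'Health' rows,
# so drop_duplicates('ID') keeps a date match whenever one exists.
end = df.groupby('ID')['end_date'].transform('first')
df1 = df[df['begin_date'] == end]
df2 = df[df['location'] == 'Health']
vectorized = (pd.concat([df1, df2])
                .drop_duplicates('ID')
                .set_index('ID')['location']
                .reindex(df['ID'].unique(), fill_value='Home'))

print(by_apply.tolist() == vectorized.tolist())  # True
```

One difference to keep in mind: transform('first') picks the first non-missing end_date in each group, while .iloc[0] takes the first row's value even if it is missing, so the two versions can disagree when the first row of a group has no end_date.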

