
How do you iterate through groups in a pandas DataFrame, operate on each group, then assign values to the original DataFrame?

    yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]

    yearGroups = yearCount.groupby('order_date')

    for year in yearGroups:
        yearCount['antiYearCount'] = year.groupby('antibiotic')['antibiotic'].transform(pd.Series.value_counts)

In this case, yearCount is a DataFrame containing the columns 'order_date', 'antibiotic', and 'antiYearCount'. I have cleaned 'order_date' so that it contains only the year of the order. I want to group yearCount by the years in 'order_date', count the number of times each 'antibiotic' appears in each "year group", then assign that value to yearCount's 'antiYearCount' column.

I think you need to add the order_date column to the groupby keys, so that you group by both order_date and antibiotic. It is also possible to use size instead of pd.Series.value_counts for the same output:

import pandas as pd

df = pd.DataFrame({'antibiotic':list('accbbb'),
                   'antiYearCount':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'order_date': pd.to_datetime(['2012-01-01']*3+['2012-01-02']*3)})

print (df)
   C  D  E  antiYearCount antibiotic order_date
0  7  1  5              4          a 2012-01-01
1  8  3  3              5          c 2012-01-01
2  9  5  6              4          c 2012-01-01
3  4  7  9              5          b 2012-01-02
4  2  1  2              5          b 2012-01-02
5  3  0  4              4          b 2012-01-02

# take a copy to avoid SettingWithCopyWarning
#https://stackoverflow.com/a/45035966/2901002
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform('size')
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
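
The sample above groups by full dates, while the question groups by year. If 'order_date' still holds full datetimes, one option is to pass the year as a grouping key directly. This is a minimal sketch under that assumption; the dates and the `years` variable are made up for illustration.

```python
import pandas as pd

# Hypothetical data: full datetimes spread over two years
df = pd.DataFrame({
    'antibiotic': list('accbbb'),
    'order_date': pd.to_datetime(['2012-03-01', '2012-07-15', '2012-11-30',
                                  '2013-01-05', '2013-02-10', '2013-09-09']),
})

# Extract the year and use it as a grouping key next to 'antibiotic'
years = df['order_date'].dt.year
df['antiYearCount'] = (df.groupby([years, 'antibiotic'])['antibiotic']
                         .transform('size'))
print(df)
```

Because transform returns a result aligned with the original index, the counts can be assigned straight back as a column without any loop.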

yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform(pd.Series.value_counts)
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
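
If you really do want to iterate over the groups as in the question, note that iterating over a groupby object yields (key, sub-DataFrame) pairs, so the loop has to unpack both. The following is only a sketch of that approach, assuming 'order_date' already holds plain years; the name `year_count` and the sample values are illustrative. The transform version above is usually shorter and faster.

```python
import pandas as pd

# Hypothetical data: 'order_date' already reduced to the year
year_count = pd.DataFrame({
    'antibiotic': list('accbbb'),
    'order_date': [2012, 2012, 2012, 2013, 2013, 2013],
})
year_count['antiYearCount'] = 0

# Each iteration gives the group key and the rows belonging to it
for year, group in year_count.groupby('order_date'):
    # Count how often each antibiotic appears within this year
    counts = group['antibiotic'].value_counts()
    # Write the counts back to the original rows via the group's index
    year_count.loc[group.index, 'antiYearCount'] = group['antibiotic'].map(counts)

print(year_count)
```

Writing through .loc with group.index is what ties each group's result back to the matching rows of the original DataFrame.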


 