Python pandas DataFrame: compute a value for each row from the cells of another row

I have the following pandas DataFrame:

name    day h1  h2  h3  h4  h5

pepe    1   10  4   0   4   7
pepe    2   54  65  4   42  6
pepe    3   1   3   28  6   12
pepe    4   5   6   1   8   5
juan    1   78  9   2   65  4
juan    2   2   42  14  54  95

I want to obtain:

name    day h1  h2  h3  h4  h5  sum

pepe    1   10  4   0   4   7   
pepe    2   54  65  4   42  6   18
pepe    3   1   3   28  6   12  165
pepe    4   5   6   1   8   5   38
juan    1   78  9   2   65  4   
juan    2   2   42  14  54  95  154

I've been searching the web, but without success.

The number 38 in the sum column, in the pepe row for day 4, is the sum of h1 to h4 from the pepe row for day 4 - 1 = 3. Days 3 and 2 work the same way. For day 1, the corresponding sum cell must stay empty.

The same must be done for juan, and for every other value of name.

How can I do this? Maybe it would be better to try a loop using iterrows first, or something like that.

I would sum the rows based on the values. This is my favorite resource for complex loc calls, with lots of options: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

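df.reset_index(drop=True, inplace=True)  # drop=True avoids adding an 'index' column
mask = df['name'] == 'pepe'
# Sum h1..h4 per row, then shift down one row so each day gets the previous day's total
df.loc[mask, 'sum'] = df.loc[mask, ['h1', 'h2', 'h3', 'h4']].sum(axis=1).shift()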

or

df.reset_index(drop=True, inplace=True)
# Row-wise sum of h1..h4, shifted one row within each name
df['sum'] = df[['h1', 'h2', 'h3', 'h4']].sum(axis=1).groupby(df['name']).shift()

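To use a loop, you would need df.itertuples():

df['sum'] = float('nan')  # Must initialize column first; NaN keeps day 1 empty
prev_name, prev_sum = None, None
for row in df.itertuples():
    # Only copy the previous row's sum when it belongs to the same name
    if row.name == prev_name:
        df.at[row.Index, 'sum'] = prev_sum  # df.at needs the index label, not the whole tuple
    prev_name, prev_sum = row.name, row.h1 + row.h2 + row.h3 + row.h4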
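
For reference, here is a minimal self-contained sketch that rebuilds the sample data from the question and fills the sum column without an explicit loop. It assumes that "previous day" means the preceding row of the same name once rows are ordered by day. The names cols and row_sum are only illustrative, and the cast to the nullable Int64 dtype is an optional choice to keep whole numbers next to the empty day 1 cells.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["pepe", "pepe", "pepe", "pepe", "juan", "juan"],
    "day":  [1, 2, 3, 4, 1, 2],
    "h1":   [10, 54, 1, 5, 78, 2],
    "h2":   [4, 65, 3, 6, 9, 42],
    "h3":   [0, 4, 28, 1, 2, 14],
    "h4":   [4, 42, 6, 8, 65, 54],
    "h5":   [7, 6, 12, 5, 4, 95],
})

cols = ["h1", "h2", "h3", "h4"]  # h5 is not part of the sum

# Row-wise sum of h1..h4, computed in day order (assumption: the
# "previous day" is the preceding row of the same name by day).
row_sum = df.sort_values("day")[cols].sum(axis=1)

# Shift by one row within each name; the first day of every name has no
# predecessor and becomes missing. Assignment aligns on the index, so
# the original row order of df is preserved.
df["sum"] = row_sum.groupby(df["name"]).shift().astype("Int64")

print(df)
```

With the sample data, the sum column should come out as 18, 165 and 38 for pepe on days 2 to 4 and 154 for juan on day 2, with day 1 left missing for both names.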
