
Pandas df get previous row value

I have a pandas dataframe with null values:

index fecha code Place dato1 porcentaje_dato1 dato2 dato3 porcentaje_dato3
0 2021-01-04 1 Place1 25809 0.3 NaN NaN 0.0
1 2021-01-04 2 Place2 2004 0.15 NaN NaN 0.0
2 2021-01-04 3 Place3 9380 0.92 NaN NaN 0.0
3 2021-01-04 4 Place4 153 0.01 NaN NaN 0.0
20 2021-01-05 1 Place1 40263 0.47 NaN NaN 0.0
21 2021-01-05 2 Place2 2985 0.22 NaN NaN 0.0
22 2021-01-05 3 Place3 12929 1.27 NaN NaN 0.0
23 2021-01-05 4 Place4 2656 0.22 NaN NaN 0.0
40 2021-01-07 1 Place1 53934 0.64 NaN NaN 0.0
41 2021-01-07 2 Place2 6186 0.46 NaN NaN 0.0
42 2021-01-07 3 Place3 14406 1.42 NaN NaN 0.0
43 2021-01-07 4 Place4 3190 0.26 NaN NaN 0.0
1415 2021-04-14 1 Place1 1970183 23.23 1419209.0 550974.0 6.5
1416 2021-04-14 2 Place2 331419 24.89 228547.0 102872.0 7.73
1417 2021-04-14 3 Place3 317019 31.22 216006.0 101013.0 9.95
1418 2021-04-14 4 Place4 233042 19.18 175460.0 57582.0 4.74
1436 2021-04-15 1 Place1 2041844 24.07 1481837.0 560007.0 6.6
1437 2021-04-15 2 Place2 347963 26.14 243497.0 104466.0 7.85
1438 2021-04-15 3 Place3 330038 32.5 225213.0 104825.0 10.32
1439 2021-04-15 4 Place4 240488 19.79 180775.0 59713.0 4.91

If dato2 is null, I need to fill it with the dato1 value plus the previous day's dato2 value for the same place. The steps to implement are:

  • first, sort by place and date
  • iterate over the dataframe. For each row:
    • check whether it is the first row of the entire df. If so, dato2 = dato1
    • check whether the place has changed (the place of the current row differs from the place of the previous row). If so, dato2 = dato1
    • else: dato2 = dato2 of the previous row + dato1 of the current row

The code I have is:

df = df.sort_values(by=['place', 'fecha']) 
for i, row in df.iterrows():
  if pd.isnull(row['dato2']):
    if i == 0:
      df['dato2'][i] = df['dato1'][i]
    elif df['place'][i] != df['place'][i-1]:
      df['dato2'][i] = df['dato1'][i]
    else:
      df['dato2'][i] = df['dato2'][i-1] + df['dato1'][i]
  else:
    df['dato2'][i]

But with this code the indexes are not valid: the index labels are not consecutive, so i-1 does not always point to the previous row.

Here's my approach.

import pandas as pd

# Sort dataframe (data is the path to the CSV file)
df = (pd.read_csv(data)
        .sort_values(['Place','fecha'])
        .reset_index())

# Fill missing values for dato2 with dato1
df['dato2'] = df.dato2.fillna(df.dato1)

# Calculate the aggregate, store in separate df
df_agg = (df[['Place','fecha','dato2']].groupby(['Place','fecha']).sum()
                                       .groupby('Place').cumsum()
                                       .reset_index())

# Update original data
df.update(df_agg)
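
One point worth noting: df.update aligns on index labels, so the update works here because df_agg ends up in the same row order as the sorted df. When there is exactly one row per Place and fecha, the same running total can probably be written without the intermediate frame. The following is only a sketch under that assumption; the small sample frame is made up from the first rows of the question and is not the full data.

```python
import pandas as pd

# Made-up sample: two places, two days, dato2 missing everywhere.
df = pd.DataFrame({
    "fecha": ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"],
    "Place": ["Place1", "Place2", "Place1", "Place2"],
    "dato1": [25809, 2004, 40263, 2985],
    "dato2": [float("nan")] * 4,
})

# Sort so that each place's rows are in date order.
df = df.sort_values(["Place", "fecha"]).reset_index(drop=True)

# Fill missing dato2 with dato1, then accumulate within each place.
filled = df["dato2"].fillna(df["dato1"])
df["dato2"] = filled.groupby(df["Place"]).cumsum()

print(df)
```

Because the result of the grouped cumsum keeps the index of df, it can be assigned straight back to the column without a separate update step.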

Result:

index fecha code Place dato1 porcentaje_dato1 dato2 dato3 porcentaje_dato3
0 2021-01-04 1 Place1 25809 0.30 25809.0 NaN 0.00
4 2021-01-05 1 Place1 40263 0.47 66072.0 NaN 0.00
8 2021-01-07 1 Place1 53934 0.64 120006.0 NaN 0.00
12 2021-04-14 1 Place1 1970183 23.23 1539215.0 550974.0 6.50
16 2021-04-15 1 Place1 2041844 24.07 3021052.0 560007.0 6.60
1 2021-01-04 2 Place2 2004 0.15 2004.0 NaN 0.00
5 2021-01-05 2 Place2 2985 0.22 4989.0 NaN 0.00
9 2021-01-07 2 Place2 6186 0.46 11175.0 NaN 0.00
13 2021-04-14 2 Place2 331419 24.89 239722.0 102872.0 7.73
17 2021-04-15 2 Place2 347963 26.14 483219.0 104466.0 7.85
2 2021-01-04 3 Place3 9380 0.92 9380.0 NaN 0.00
6 2021-01-05 3 Place3 12929 1.27 22309.0 NaN 0.00
10 2021-01-07 3 Place3 14406 1.42 36715.0 NaN 0.00
14 2021-04-14 3 Place3 317019 31.22 252721.0 101013.0 9.95
18 2021-04-15 3 Place3 330038 32.50 477934.0 104825.0 10.32
3 2021-01-04 4 Place4 153 0.01 153.0 NaN 0.00
7 2021-01-05 4 Place4 2656 0.22 2809.0 NaN 0.00
11 2021-01-07 4 Place4 3190 0.26 5999.0 NaN 0.00
15 2021-04-14 4 Place4 233042 19.18 181459.0 57582.0 4.74
19 2021-04-15 4 Place4 240488 19.79 362234.0 59713.0 4.91
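
Note that this result also accumulates the rows that already had a dato2 value: for Place1 on 2021-04-14, 120006.0 + 1419209.0 = 1539215.0. If the existing values should stay as they are, as the steps in the question describe, something along these lines may be closer. It is a minimal sketch and not the method used above; the function name fill_running_total and the sample frame are made up for illustration, and it assumes one row per Place and fecha.

```python
import numpy as np
import pandas as pd


def fill_running_total(df):
    """Fill missing dato2 with a per-Place running total, keeping known values."""
    out = df.sort_values(["Place", "fecha"]).reset_index(drop=True)
    missing = out["dato2"].isna()

    # Every row that already has dato2 starts a new block within its Place.
    block = (~missing).astype(int).groupby(out["Place"]).cumsum()
    keys = [out["Place"], block]

    # Starting value of each block: its known dato2, or 0 if there is none.
    base = out["dato2"].groupby(keys).transform("first").fillna(0)

    # Running sum of dato1 over the missing rows of each block.
    running = out["dato1"].where(missing, 0).groupby(keys).cumsum()

    # Keep known values; fill the missing ones with base + running sum.
    out["dato2"] = out["dato2"].where(~missing, base + running)
    return out


# Made-up sample: Place1 has a known value in the middle of its series.
sample = pd.DataFrame({
    "fecha": ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07",
              "2021-01-04", "2021-01-05"],
    "Place": ["Place1"] * 4 + ["Place2"] * 2,
    "dato1": [10, 20, 30, 40, 5, 7],
    "dato2": [np.nan, np.nan, 100.0, np.nan, np.nan, np.nan],
})
result = fill_running_total(sample)
print(result)
```

The known value (100.0 in the sample) is left untouched and becomes the starting point for the missing rows that follow it, which avoids the row-by-row loop and its index lookups.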
