Pandas df get previous row value

Question

I have a pandas dataframe with null values:

index	fecha	code	Place	dato1	porcentaje_dato1	dato2	dato3	porcentaje_dato3
0	2021-01-04	1	Place1	25809	0.3	NaN	NaN	0.0
1	2021-01-04	2	Place2	2004	0.15	NaN	NaN	0.0
2	2021-01-04	3	Place3	9380	0.92	NaN	NaN	0.0
3	2021-01-04	4	Place4	153	0.01	NaN	NaN	0.0
20	2021-01-05	1	Place1	40263	0.47	NaN	NaN	0.0
21	2021-01-05	2	Place2	2985	0.22	NaN	NaN	0.0
22	2021-01-05	3	Place3	12929	1.27	NaN	NaN	0.0
23	2021-01-05	4	Place4	2656	0.22	NaN	NaN	0.0
40	2021-01-07	1	Place1	53934	0.64	NaN	NaN	0.0
41	2021-01-07	2	Place2	6186	0.46	NaN	NaN	0.0
42	2021-01-07	3	Place3	14406	1.42	NaN	NaN	0.0
43	2021-01-07	4	Place4	3190	0.26	NaN	NaN	0.0
1415	2021-04-14	1	Place1	1970183	23.23	1419209.0	550974.0	6.5
1416	2021-04-14	2	Place2	331419	24.89	228547.0	102872.0	7.73
1417	2021-04-14	3	Place3	317019	31.22	216006.0	101013.0	9.95
1418	2021-04-14	4	Place4	233042	19.18	175460.0	57582.0	4.74
1436	2021-04-15	1	Place1	2041844	24.07	1481837.0	560007.0	6.6
1437	2021-04-15	2	Place2	347963	26.14	243497.0	104466.0	7.85
1438	2021-04-15	3	Place3	330038	32.5	225213.0	104825.0	10.32
1439	2021-04-15	4	Place4	240488	19.79	180775.0	59713.0	4.91

If dato2 is null, I need to fill it with the dato1 value plus the previous day's dato2 value for the same place. The steps are:

first, sort by place and date
then iterate over the dataframe. For each row:
- check whether it is the first row of the entire df. If so, dato2 = dato1
- check whether the place has changed (the place of the current row differs from the place of the previous row). If so, dato2 = dato1
- else: dato2 = dato2 of the previous row + dato1 of the current row

The code I have is:

df = df.sort_values(by=['Place', 'fecha'])
for i, row in df.iterrows():
  if pd.isnull(row['dato2']):
    if i == 0:
      df['dato2'][i] = df['dato1'][i]
    elif df['Place'][i] != df['Place'][i-1]:
      df['dato2'][i] = df['dato1'][i]
    else:
      df['dato2'][i] = df['dato2'][i-1] + df['dato1'][i]
  else:
    df['dato2'][i]

But with this code the indexes are not valid.
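
A likely cause, offered as a hedged note rather than a confirmed diagnosis: sort_values keeps the original index labels, so after sorting, the label i-1 in df['dato2'][i-1] no longer refers to the row directly above (and may not exist at all). The sketch below uses a small made-up frame with the same column names to show the effect, and one way to read the previous row's value per place with groupby and shift. The column name dato1_prev is hypothetical.

```python
import pandas as pd

# Small illustrative frame; the index labels mimic the gaps in the question.
df = pd.DataFrame(
    {
        "Place": ["Place1", "Place2", "Place1", "Place2"],
        "fecha": ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"],
        "dato1": [25809, 2004, 40263, 2985],
    },
    index=[0, 1, 20, 21],
)

df = df.sort_values(["Place", "fecha"])

# The labels travel with their rows, so they are no longer consecutive.
index_labels = df.index.tolist()
print(index_labels)  # [0, 20, 1, 21]

# shift() works by position within each Place, independent of the labels.
# The first row of each Place has no previous row and gets NaN.
df["dato1_prev"] = df.groupby("Place")["dato1"].shift()
print(df)
```

Calling reset_index(drop=True) after sorting would also make the labels consecutive again, but shift avoids the explicit loop.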

Answer 1

Here's my approach.

import pandas as pd

# Sort dataframe by place and date, then renumber the rows
df = (pd.read_csv(data)
        .sort_values(['Place','fecha'])
        .reset_index())

# Fill missing values for dato2 with dato1
df['dato2'] = df.dato2.fillna(df.dato1)

# Calculate the aggregate, store in separate df
df_agg = (df[['Place','fecha','dato2']].groupby(['Place','fecha']).sum()
                                       .groupby('Place').cumsum()
                                       .reset_index())

# Update original data
df.update(df_agg)

Result:

index	fecha	code	Place	dato1	porcentaje_dato1	dato2	dato3	porcentaje_dato3
0	2021-01-04	1	Place1	25809	0.30	25809.0	NaN	0.00
4	2021-01-05	1	Place1	40263	0.47	66072.0	NaN	0.00
8	2021-01-07	1	Place1	53934	0.64	120006.0	NaN	0.00
12	2021-04-14	1	Place1	1970183	23.23	1539215.0	550974.0	6.50
16	2021-04-15	1	Place1	2041844	24.07	3021052.0	560007.0	6.60
1	2021-01-04	2	Place2	2004	0.15	2004.0	NaN	0.00
5	2021-01-05	2	Place2	2985	0.22	4989.0	NaN	0.00
9	2021-01-07	2	Place2	6186	0.46	11175.0	NaN	0.00
13	2021-04-14	2	Place2	331419	24.89	239722.0	102872.0	7.73
17	2021-04-15	2	Place2	347963	26.14	483219.0	104466.0	7.85
2	2021-01-04	3	Place3	9380	0.92	9380.0	NaN	0.00
6	2021-01-05	3	Place3	12929	1.27	22309.0	NaN	0.00
10	2021-01-07	3	Place3	14406	1.42	36715.0	NaN	0.00
14	2021-04-14	3	Place3	317019	31.22	252721.0	101013.0	9.95
18	2021-04-15	3	Place3	330038	32.50	477934.0	104825.0	10.32
3	2021-01-04	4	Place4	153	0.01	153.0	NaN	0.00
7	2021-01-05	4	Place4	2656	0.22	2809.0	NaN	0.00
11	2021-01-07	4	Place4	3190	0.26	5999.0	NaN	0.00
15	2021-04-14	4	Place4	233042	19.18	181459.0	57582.0	4.74
19	2021-04-15	4	Place4	240488	19.79	362234.0	59713.0	4.91
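
Note that in the result above, rows that already had a dato2 value are accumulated as well (for Place1 on 2021-04-14, 1539215.0 = 120006.0 + 1419209.0). If only the null rows should be filled, as the steps in the question suggest, a variant could look like the sketch below. It is not part of the original answer: the function name fill_missing_dato2 and the sample frame are assumptions for illustration, and it assumes the column names shown in the question.

```python
import pandas as pd

def fill_missing_dato2(df):
    """Fill null dato2 as previous dato2 + current dato1, per Place."""
    out = df.sort_values(["Place", "fecha"]).reset_index(drop=True)

    # Rows whose dato2 is already known must stay untouched.
    known = out["dato2"].notna()

    # Within each Place, every known dato2 starts a new block, so the
    # running total restarts from that known value.
    block = known.astype(int).groupby(out["Place"]).cumsum()

    # Known rows contribute their dato2, null rows contribute dato1.
    step = out["dato2"].where(known, out["dato1"])

    # Cumulative sum inside each (Place, block) pair.
    out["dato2"] = step.groupby([out["Place"], block]).cumsum()
    return out

# A few rows from the question, deliberately out of order.
nan = float("nan")
sample = pd.DataFrame({
    "fecha": ["2021-01-05", "2021-01-04", "2021-04-14", "2021-01-07",
              "2021-01-07", "2021-04-14", "2021-01-04", "2021-01-05"],
    "Place": ["Place1", "Place1", "Place1", "Place1",
              "Place2", "Place2", "Place2", "Place2"],
    "dato1": [40263, 25809, 1970183, 53934, 6186, 331419, 2004, 2985],
    "dato2": [nan, nan, 1419209.0, nan, nan, 228547.0, nan, nan],
})

result = fill_missing_dato2(sample)
print(result)
```

With this sample, the 2021-04-14 rows keep their original dato2 (1419209.0 and 228547.0), while the earlier rows get the running total of dato1 for their place.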
