I have a pandas dataframe with null values:
index | fecha | code | Place | dato1 | porcentaje_dato1 | dato2 | dato3 | porcentaje_dato3 |
---|---|---|---|---|---|---|---|---|
0 | 2021-01-04 | 1 | Place1 | 25809 | 0.3 | NaN | NaN | 0.0 |
1 | 2021-01-04 | 2 | Place2 | 2004 | 0.15 | NaN | NaN | 0.0 |
2 | 2021-01-04 | 3 | Place3 | 9380 | 0.92 | NaN | NaN | 0.0 |
3 | 2021-01-04 | 4 | Place4 | 153 | 0.01 | NaN | NaN | 0.0 |
20 | 2021-01-05 | 1 | Place1 | 40263 | 0.47 | NaN | NaN | 0.0 |
21 | 2021-01-05 | 2 | Place2 | 2985 | 0.22 | NaN | NaN | 0.0 |
22 | 2021-01-05 | 3 | Place3 | 12929 | 1.27 | NaN | NaN | 0.0 |
23 | 2021-01-05 | 4 | Place4 | 2656 | 0.22 | NaN | NaN | 0.0 |
40 | 2021-01-07 | 1 | Place1 | 53934 | 0.64 | NaN | NaN | 0.0 |
41 | 2021-01-07 | 2 | Place2 | 6186 | 0.46 | NaN | NaN | 0.0 |
42 | 2021-01-07 | 3 | Place3 | 14406 | 1.42 | NaN | NaN | 0.0 |
43 | 2021-01-07 | 4 | Place4 | 3190 | 0.26 | NaN | NaN | 0.0 |
1415 | 2021-04-14 | 1 | Place1 | 1970183 | 23.23 | 1419209.0 | 550974.0 | 6.5 |
1416 | 2021-04-14 | 2 | Place2 | 331419 | 24.89 | 228547.0 | 102872.0 | 7.73 |
1417 | 2021-04-14 | 3 | Place3 | 317019 | 31.22 | 216006.0 | 101013.0 | 9.95 |
1418 | 2021-04-14 | 4 | Place4 | 233042 | 19.18 | 175460.0 | 57582.0 | 4.74 |
1436 | 2021-04-15 | 1 | Place1 | 2041844 | 24.07 | 1481837.0 | 560007.0 | 6.6 |
1437 | 2021-04-15 | 2 | Place2 | 347963 | 26.14 | 243497.0 | 104466.0 | 7.85 |
1438 | 2021-04-15 | 3 | Place3 | 330038 | 32.5 | 225213.0 | 104825.0 | 10.32 |
1439 | 2021-04-15 | 4 | Place4 | 240488 | 19.79 | 180775.0 | 59713.0 | 4.91 |
If the value of dato2 is null, I need to fill it with the dato1 value plus the previous day's dato2 value for the same place.
The code I have is:
df = df.sort_values(by=['Place', 'fecha'])
for i, row in df.iterrows():
    if pd.isnull(row['dato2']):
        if i == 0:
            df['dato2'][i] = df['dato1'][i]
        elif df['Place'][i] != df['Place'][i-1]:
            df['dato2'][i] = df['dato1'][i]
        else:
            df['dato2'][i] = df['dato2'][i-1] + df['dato1'][i]
    else:
        df['dato2'][i]
But with this code the indexes are not valid.
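A likely cause, sketched here on four rows copied from the table above: `sort_values` keeps the original index labels, and `iterrows()` yields those labels as `i`, so `i-1` means "the label one lower", not "the row just above in sorted order". The variable names `labels` and `missing` are only for illustration.

```python
import pandas as pd

# Four rows from the question's table, keeping the labels 0, 1, 20, 21 shown there.
df = pd.DataFrame(
    {
        "fecha": ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"],
        "Place": ["Place1", "Place2", "Place1", "Place2"],
        "dato1": [25809, 2004, 40263, 2985],
    },
    index=[0, 1, 20, 21],
)

df = df.sort_values(by=["Place", "fecha"])

# sort_values reorders the rows but keeps their labels.
labels = list(df.index)

# Labels (other than the first row) for which "i - 1" is not a label at all,
# so df['dato2'][i - 1] would raise a KeyError.
missing = [i for i in labels[1:] if (i - 1) not in df.index]

# Even when "i - 1" exists, it can point at a different place:
# label 1 is Place2, but label 0 is Place1.
print(labels, missing)
print(df.loc[1, "Place"], df.loc[0, "Place"])
```

Resetting the index after sorting, or avoiding the row loop with a grouped operation, sidesteps the label arithmetic.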
Here's my approach.
import pandas as pd

# Sort dataframe (`data` is the path or buffer holding the CSV)
df = (pd.read_csv(data)
        .sort_values(['Place', 'fecha'])
        .reset_index())
# Fill missing values for dato2 with dato1
df['dato2'] = df.dato2.fillna(df.dato1)
# Calculate the aggregate, store in separate df
df_agg = (df[['Place', 'fecha', 'dato2']].groupby(['Place', 'fecha']).sum()
          .groupby('Place').cumsum()
          .reset_index())
# Update original data
df.update(df_agg)
Result:
index | fecha | code | Place | dato1 | porcentaje_dato1 | dato2 | dato3 | porcentaje_dato3 |
---|---|---|---|---|---|---|---|---|
0 | 2021-01-04 | 1 | Place1 | 25809 | 0.30 | 25809.0 | NaN | 0.00 |
4 | 2021-01-05 | 1 | Place1 | 40263 | 0.47 | 66072.0 | NaN | 0.00 |
8 | 2021-01-07 | 1 | Place1 | 53934 | 0.64 | 120006.0 | NaN | 0.00 |
12 | 2021-04-14 | 1 | Place1 | 1970183 | 23.23 | 1539215.0 | 550974.0 | 6.50 |
16 | 2021-04-15 | 1 | Place1 | 2041844 | 24.07 | 3021052.0 | 560007.0 | 6.60 |
1 | 2021-01-04 | 2 | Place2 | 2004 | 0.15 | 2004.0 | NaN | 0.00 |
5 | 2021-01-05 | 2 | Place2 | 2985 | 0.22 | 4989.0 | NaN | 0.00 |
9 | 2021-01-07 | 2 | Place2 | 6186 | 0.46 | 11175.0 | NaN | 0.00 |
13 | 2021-04-14 | 2 | Place2 | 331419 | 24.89 | 239722.0 | 102872.0 | 7.73 |
17 | 2021-04-15 | 2 | Place2 | 347963 | 26.14 | 483219.0 | 104466.0 | 7.85 |
2 | 2021-01-04 | 3 | Place3 | 9380 | 0.92 | 9380.0 | NaN | 0.00 |
6 | 2021-01-05 | 3 | Place3 | 12929 | 1.27 | 22309.0 | NaN | 0.00 |
10 | 2021-01-07 | 3 | Place3 | 14406 | 1.42 | 36715.0 | NaN | 0.00 |
14 | 2021-04-14 | 3 | Place3 | 317019 | 31.22 | 252721.0 | 101013.0 | 9.95 |
18 | 2021-04-15 | 3 | Place3 | 330038 | 32.50 | 477934.0 | 104825.0 | 10.32 |
3 | 2021-01-04 | 4 | Place4 | 153 | 0.01 | 153.0 | NaN | 0.00 |
7 | 2021-01-05 | 4 | Place4 | 2656 | 0.22 | 2809.0 | NaN | 0.00 |
11 | 2021-01-07 | 4 | Place4 | 3190 | 0.26 | 5999.0 | NaN | 0.00 |
15 | 2021-04-14 | 4 | Place4 | 233042 | 19.18 | 181459.0 | 57582.0 | 4.74 |
19 | 2021-04-15 | 4 | Place4 | 240488 | 19.79 | 362234.0 | 59713.0 | 4.91 |
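Note that in the result above the rows that already had a dato2 value are accumulated as well (for example 120006 + 1419209 = 1539215 for Place1 on 2021-04-14). If the reported values should be kept as they are, a variant along the following lines may be closer to the loop in the question. This is only a sketch under one assumption: within each place, the null rows form a leading run, as they do in the sample data. The rows are the Place1 and Place2 rows from the table; `NAN`, `missing` and `running` are illustrative names.

```python
import pandas as pd

NAN = float("nan")

# Place1 and Place2 rows from the question's table.
rows = [
    ("2021-01-04", "Place1", 25809, NAN),
    ("2021-01-04", "Place2", 2004, NAN),
    ("2021-01-05", "Place1", 40263, NAN),
    ("2021-01-05", "Place2", 2985, NAN),
    ("2021-01-07", "Place1", 53934, NAN),
    ("2021-01-07", "Place2", 6186, NAN),
    ("2021-04-14", "Place1", 1970183, 1419209.0),
    ("2021-04-14", "Place2", 331419, 228547.0),
    ("2021-04-15", "Place1", 2041844, 1481837.0),
    ("2021-04-15", "Place2", 347963, 243497.0),
]
df = pd.DataFrame(rows, columns=["fecha", "Place", "dato1", "dato2"])

# ISO-formatted date strings sort chronologically, so no conversion is needed here.
df = df.sort_values(["Place", "fecha"]).reset_index(drop=True)

# Rows where dato2 is null.
missing = df["dato2"].isna()

# Running total of dato1 per place, computed over the null rows only.
running = df.loc[missing].groupby("Place")["dato1"].cumsum()

# Fill only the null rows; reported dato2 values are left untouched.
df.loc[missing, "dato2"] = running

print(df)
```

With this variant, Place1 ends up with 25809, 66072 and 120006 for the three January dates, while the April values stay at 1419209 and 1481837.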