I have a pandas DataFrame with a column named 'A_col', and I would like to create a new column called 'A_col_fill' that replaces each NaN in 'A_col' with the minimum of the values immediately preceding it, if there are any. The sample output looks like this:
A_col A_col_fill
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 -0.3400 -0.3400
8 NaN -0.3400
9 NaN -0.3400
10 -0.1900 -0.1900
11 NaN -0.1900
12 -0.3700 -0.3700
13 -0.4100 -0.4100
14 -0.3300 -0.3300
15 NaN -0.4100
16 NaN -0.4100
17 NaN -0.4100
18 NaN -0.4100
19 NaN -0.4100
20 -1.6500 -1.6500
21 -1.8000 -1.8000
22 -1.5300 -1.5300
23 -1.3500 -1.3500
24 NaN -1.8000
25 -0.1900 -0.1900
26 -0.1400 -0.1400
28 -0.2100 -0.2100
It looks like the DataFrame 'fillna' function doesn't work for this case. How can I implement this? Any code snippets are highly appreciated!
p['A_col'].fillna(np.inf).replace(np.inf,p['A_col'].ffill().cummin())
output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 -0.34
8 -0.34
9 -0.34
10 -0.19
11 -0.34
12 -0.37
13 -0.41
14 -0.33
15 -0.41
16 -0.41
17 -0.41
18 -0.41
19 -0.41
20 -1.65
21 -1.80
22 -1.53
23 -1.35
24 -1.80
25 -0.19
26 -0.14
28 -0.21
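Note that the one-liner above fills each NaN with the running minimum since the start of the column, which is why row 11 shows -0.34 where the question's expected output has -0.19. A minimal sketch of what seems to be the same idea without the np.inf sentinel, on a short made-up series (the names `s`, `running_min` and `filled` are only for illustration):

```python
import numpy as np
import pandas as pd

# Short illustrative series, not the data from the question.
s = pd.Series([np.nan, -0.34, np.nan, -0.19, np.nan, -0.37, -0.41, -0.33, np.nan])

# Forward-fill so every row after the first value holds a number,
# then take the cumulative minimum seen so far.
running_min = s.ffill().cummin()

# Only the NaN positions are replaced; existing values are kept.
filled = s.fillna(running_min)
print(filled)
```

Here the NaN after -0.19 is filled with -0.34, the smallest value seen so far, rather than with -0.19.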
This solution fills each NaN with the minimum value of the last "island" of contiguous rows that contain values. It should be more accurate and performant than the other suggested solutions (at the cost of some added complexity):
code:
df["group_col"] = np.cumsum(df["A_col"].isna() != df["A_col"].isna().shift())
df["group_min"] = df.groupby("group_col").A_col.transform("min").ffill()
df["output"] = df["A_col"].fillna(df.group_min)
result:
A_col A_col_fill group_col group_min output
0 NaN NaN 1 NaN NaN
1 NaN NaN 1 NaN NaN
2 NaN NaN 1 NaN NaN
3 NaN NaN 1 NaN NaN
4 NaN NaN 1 NaN NaN
5 NaN NaN 1 NaN NaN
6 NaN NaN 1 NaN NaN
7 -0.34 -0.34 2 -0.34 -0.34
8 NaN -0.34 3 -0.34 -0.34
9 NaN -0.34 3 -0.34 -0.34
10 -0.19 -0.19 4 -0.19 -0.19
11 NaN -0.19 5 -0.19 -0.19
12 -0.37 -0.37 6 -0.41 -0.37
13 -0.41 -0.41 6 -0.41 -0.41
14 -0.33 -0.33 6 -0.41 -0.33
15 NaN -0.41 7 -0.41 -0.41
16 NaN -0.41 7 -0.41 -0.41
17 NaN -0.41 7 -0.41 -0.41
18 NaN -0.41 7 -0.41 -0.41
19 NaN -0.41 7 -0.41 -0.41
20 -1.65 -1.65 8 -1.80 -1.65
21 -1.80 -1.80 8 -1.80 -1.80
22 -1.53 -1.53 8 -1.80 -1.53
23 -1.35 -1.35 8 -1.80 -1.35
24 NaN -1.80 9 -1.80 -1.80
25 -0.19 -0.19 10 -0.21 -0.19
26 -0.14 -0.14 10 -0.21 -0.14
28 -0.21 -0.21 10 -0.21 -0.21
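For anyone who wants to run this end to end, here is a self-contained sketch of the same island approach wrapped in a function. The function name `fill_with_previous_island_min` is made up for illustration, and the sample series retypes the question's values with a default 0..27 index instead of the original index labels:

```python
import numpy as np
import pandas as pd

def fill_with_previous_island_min(s: pd.Series) -> pd.Series:
    """Fill NaNs with the minimum of the closest preceding run of non-NaN values."""
    is_na = s.isna()
    # A new group starts whenever the NaN / non-NaN state flips.
    group_id = (is_na != is_na.shift()).cumsum()
    # Per-group minimum; all-NaN groups give NaN, which ffill then replaces
    # with the minimum of the preceding island of values.
    group_min = s.groupby(group_id).transform("min").ffill()
    return s.fillna(group_min)

nan = np.nan
sample = pd.Series(
    [nan] * 7
    + [-0.34, nan, nan, -0.19, nan, -0.37, -0.41, -0.33]
    + [nan] * 5
    + [-1.65, -1.80, -1.53, -1.35, nan, -0.19, -0.14, -0.21]
)
filled = fill_with_previous_island_min(sample)
print(pd.DataFrame({"A_col": sample, "A_col_fill": filled}))
```

Unlike a running minimum, this fills row 11 with -0.19, because only the island immediately before the gap is considered. It also avoids adding helper columns to the DataFrame.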
The solution takes milliseconds for a 100,000-row df on my machine:
df = pd.DataFrame(np.random.random(size=100000), columns=["A_col"])
df.loc[df.sample(frac=0.6).index, "A_col"] = np.nan
# code from above
df["group_col"] = np.cumsum(df["A_col"].isna() != df["A_col"].isna().shift())
df["group_min"] = df.groupby("group_col").A_col.transform("min").ffill()
df["output"] = df["A_col"].fillna(df.group_min)
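To check the timing on your own machine, something like the following should work. It is only a rough sketch: the seed, the `n_rows` variable and the use of `time.perf_counter` are my own choices, and the numbers will vary by hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

n_rows = 100_000  # assumed size, matching the snippet above
rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

values = rng.random(n_rows)
values[rng.random(n_rows) < 0.6] = np.nan  # blank out roughly 60% of rows
df = pd.DataFrame({"A_col": values})

start = time.perf_counter()
is_na = df["A_col"].isna()
group_col = (is_na != is_na.shift()).cumsum()
group_min = df["A_col"].groupby(group_col).transform("min").ffill()
df["output"] = df["A_col"].fillna(group_min)
elapsed = time.perf_counter() - start

print(f"{n_rows} rows filled in {elapsed * 1000:.1f} ms")
```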
A simple solution: iterate over the column, keep track of the running minimum, and use it to fill each NaN value.
def fill_min(df):
    minx = np.inf  # smallest value seen so far
    ans = []
    for val in df['A_col']:
        if np.isnan(val):
            # leave NaN in place until a first value has been seen
            ans.append(val if np.isinf(minx) else minx)
        else:
            minx = min(minx, val)
            ans.append(val)
    return ans
USE:
df['A_col_fill'] = fill_min(df)