I have a data frame like this,
df1
col1 col2
1 A
2 A
3 A
4 B
5 A
6 A
7 B
8 A
9 A
10 A
11 C
12 C
13 A
14 A
15 C
16 A
17 C
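In the data frame above, the total number of B values and the total number of C values are always even. I want to fill every value that lies between a pair of B's with B, and every value that lies between a pair of C's with C.
The final data frame should look like this: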
df1
col1 col2
1 A
2 A
3 A
4 B
5 B
6 B
7 B
8 A
9 A
10 A
11 C
12 C
13 A
14 A
15 C
16 C
17 C
I could do this with a for loop, but the execution time would be huge. I am looking for a pandas shortcut or a more Pythonic way to do it.
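For anyone who wants to reproduce the example, here is a minimal sketch that builds the frame and a plain-loop baseline of the kind the question alludes to. The question does not show its loop, so `fill_between_loop` and `expected` are hypothetical names, and the loop is only one possible reading; it assumes the B and C spans do not overlap.

```python
import pandas as pd

# Input frame from the question (called df1 there)
df = pd.DataFrame({
    "col1": range(1, 18),
    "col2": list("AAABAABAAACCAACAC"),
})

# Desired col2, copied by hand from the question's expected output
expected = list("AAABBBBAAACCAACCC")


def fill_between_loop(values, markers=("B", "C")):
    """Plain-Python baseline: toggle an 'inside' flag at every marker."""
    values = list(values)
    out = list(values)
    for m in markers:
        inside = False
        for i, v in enumerate(values):
            if v == m:
                inside = not inside   # an opening or closing marker
            elif inside:
                out[i] = m            # row sits between a pair of markers
    return out


print(fill_between_loop(df["col2"]) == expected)
```

This runs in pure Python, one row at a time, which is the cost the vectorized answers try to avoid.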
The idea is to keep only the B (or C) values that are not part of a run of consecutive B's (or C's) and to replace everything else with missing values. Then forward fill the missing values, keep only the positions where the forward-filled value equals the back-filled one, and finally restore all remaining values from the original column with Series.fillna:
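for v in ['B', 'C']:
    # rows equal to the current letter
    m1 = df['col2'].eq(v)
    # rows that belong to a run of equal m1 values longer than one row
    m2 = m1.ne(m1.shift()).cumsum().duplicated(keep=False)
    # keep isolated letters only, everything else becomes NaN
    s = df['col2'].where(m1 & ~m2)
    ff = s.ffill()
    # keep positions where forward and backward fill agree, restore the rest
    df['col2'] = ff.where(ff == s.bfill()).fillna(df['col2'])
print(df)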
    col1 col2
0      1    A
1      2    A
2      3    A
3      4    B
4      5    B
5      6    B
6      7    B
7      8    A
8      9    A
9     10    A
10    11    C
11    12    C
12    13    A
13    14    A
14    15    C
15    16    C
16    17    C
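To see why this works, it can help to look at the intermediate Series for a single letter. The sketch below replays the same steps for B only, on a standalone Series rebuilt from the question's data; the names `col`, `bf`, `filled` and `steps` are introduced here for illustration.

```python
import pandas as pd

# col2 of the question's frame, rebuilt as a standalone Series
col = pd.Series(list("AAABAABAAACCAACAC"), name="col2")

v = "B"
m1 = col.eq(v)                                           # True where the value is B
m2 = m1.ne(m1.shift()).cumsum().duplicated(keep=False)   # True inside runs longer than one row
s = col.where(m1 & ~m2)                                  # isolated B's kept, the rest NaN
ff = s.ffill()                                           # B from the first isolated B onwards
bf = s.bfill()                                           # B up to the last isolated B
filled = ff.where(ff == bf).fillna(col)                  # B where both agree, original elsewhere

steps = pd.DataFrame({"col2": col, "s": s, "ffill": ff, "bfill": bf, "filled": filled})
print(steps)
```

One thing to keep in mind: the forward and backward fills agree between any two isolated letters, so if a letter had more than one isolated pair, the gap between the pairs would be filled as well. The sample data has only one isolated pair per letter, so this does not show up here.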
You only need to select the rows where the cumulative count of the letter (Series.cumsum) is odd, and then overwrite them with Series.mask:
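for l in ['B', 'C']:
    # the running count of l is odd from an opening l up to, but not including, the closing one
    mask = (df.col2.eq(l).cumsum() % 2) == 1
    df['col2'] = df['col2'].mask(mask, l)
print(df)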
    col1 col2
0      1    A
1      2    A
2      3    A
3      4    B
4      5    B
5      6    B
6      7    B
7      8    A
8      9    A
9     10    A
10    11    C
11    12    C
12    13    A
13    14    A
14    15    C
15    16    C
16    17    C
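The parity trick is easier to follow with the running count printed next to the data. Below is a small sketch of the same approach wrapped in a function; `fill_between`, `col` and `parity_c` are names made up for this example, and it relies on the question's guarantee that each letter occurs an even number of times.

```python
import pandas as pd


def fill_between(col, markers=("B", "C")):
    """Overwrite rows whose running count of a marker is odd with that marker."""
    out = col.copy()
    for m in markers:
        count = out.eq(m).cumsum()          # how many m's seen so far, row by row
        out = out.mask(count % 2 == 1, m)   # odd count means we are inside a pair
    return out


# col2 of the question's frame
col = pd.Series(list("AAABAABAAACCAACAC"), name="col2")

parity_c = col.eq("C").cumsum() % 2         # 1 while inside a C pair, else 0
result = fill_between(col)

print(pd.DataFrame({"col2": col, "C_count_odd": parity_c.eq(1), "filled": result}))
```

The adjacent C's in rows 11 and 12 open and close a pair immediately, so nothing between them changes. If a letter occurred an odd number of times, every row after its last occurrence would be overwritten, which is why the even-count assumption matters.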