This is a particular case of the question in the title.
I have the following dataframe:
import numpy as np
import pandas as pd

values = [[100,54,25,26,32,33,15,2],[1,2,3,4,5,6,7,8]]
columns = ["numbers", "order"]
zipped = dict(zip(columns,values))
df = pd.DataFrame(zipped)
print(df)
   numbers  order
0      100      1
1       54      2
2       25      3
3       26      4
4       32      5
5       33      6
6       15      7
7        2      8
Assume the dataframe is sorted in ascending order by the column order. In the column numbers, I want to replace a value with NaN if a bigger value appears further down the rows, to get the following result:
   numbers  order
0      100      1
1       54      2
2      NaN      3
3      NaN      4
4      NaN      5
5       33      6
6       15      7
7        2      8
What is the best way to achieve this without looping?
Update: probably a better example of the initial DF and the expected result (it adds discontiguous blocks of values to be replaced):
values = [[100,54,25,26,34,32,31,33,15,2],[1,2,3,4,5,6,7,8,9,10]]
   numbers  order
0      100      1
1       54      2
2       25      3
3       26      4
4       34      5
5       32      6
6       31      7
7       33      8
8       15      9
9        2     10
Results:
   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4     34.0      5
5      NaN      6
6      NaN      7
7     33.0      8
8     15.0      9
9      2.0     10
I read this slightly differently: if a bigger number appears further down, then the reversed cummax at that row is higher than the row's own value:
In [11]: df.at[3, 'numbers'] = 24 # more illustrative example
In [12]: df.numbers[::-1].cummax()[::-1]
Out[12]:
0    100
1     54
2     33
3     33
4     33
5     33
6     15
7      2
Name: numbers, dtype: int64
In [13]: df.loc[df.numbers < df.numbers[::-1].cummax()[::-1], 'numbers'] = np.nan
In [14]: df
Out[14]:
   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4      NaN      5
5     33.0      6
6     15.0      7
7      2.0      8
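For reference, the same idea can be wrapped in a small function and checked against the updated example from the question. This is only a sketch: the name mask_smaller_than_later is made up here, it assumes the rows should first be sorted by order, and it uses Series.where on a copy instead of assigning through df.loc, so the original frame is left untouched.

```python
import numpy as np
import pandas as pd


def mask_smaller_than_later(df, value_col="numbers", order_col="order"):
    """Return a copy where value_col is NaN wherever a bigger value appears in a later row.

    Hypothetical helper for illustration, not part of pandas.
    """
    out = df.sort_values(order_col).copy()
    s = out[value_col]
    # Reversed cumulative max: for each row, the largest value from that row to the end.
    max_from_here_down = s[::-1].cummax()[::-1]
    # Keep a value only if nothing below it is bigger; everything else becomes NaN.
    out[value_col] = s.where(s >= max_from_here_down)
    return out


# The updated example from the question (discontiguous blocks to replace).
example = pd.DataFrame({
    "numbers": [100, 54, 25, 26, 34, 32, 31, 33, 15, 2],
    "order": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
result = mask_smaller_than_later(example)
print(result)
```

With >= a value that ties with a later one is kept, which matches the < mask used above.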
You can loop through the values of the column and keep each one only if it is greater than all the elements that come after it:
arr = df['numbers'].values
df['numbers'] = [x if all(x > arr[n+1:]) else np.nan for n, x in enumerate(arr)]
df
Output:
   numbers  order
0    100.0      1
1     54.0      2
2      NaN      3
3      NaN      4
4      NaN      5
5     33.0      6
6     15.0      7
7      2.0      8
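The list comprehension compares each element with the whole tail of the array, so the work grows quadratically with the number of rows. As a sketch of the same strict comparison without a Python-level loop, the maximum of the later elements can be precomputed with np.maximum.accumulate on the reversed array. The function name keep_if_greater_than_all_later is hypothetical, and the sketch assumes the column holds no NaN to begin with.

```python
import numpy as np
import pandas as pd


def keep_if_greater_than_all_later(values):
    """Return a float array with NaN wherever some later element is >= the current one."""
    arr = np.asarray(values, dtype=float)
    # suffix_max[i] is the maximum of arr[i:].
    suffix_max = np.maximum.accumulate(arr[::-1])[::-1]
    # later_max[i] is the maximum of arr[i+1:]; the last element has nothing after it.
    later_max = np.append(suffix_max[1:], -np.inf)
    # Strict comparison, like all(x > arr[n+1:]) in the list comprehension.
    return np.where(arr > later_max, arr, np.nan)


df = pd.DataFrame({
    "numbers": [100, 54, 25, 26, 32, 33, 15, 2],
    "order": [1, 2, 3, 4, 5, 6, 7, 8],
})
df["numbers"] = keep_if_greater_than_all_later(df["numbers"].to_numpy())
print(df)
```

Because the comparison is strict, a value that ties with a later one is replaced by NaN, just as in the list comprehension. The reversed cummax mask with < keeps such ties, so the two answers differ when the column contains duplicates.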