I have a huge dataframe that contains only numbers (the one shown below is just for demonstration purposes). My goal is to replace, in each row of the dataframe, the first n numbers that are larger than a certain value val with 0.
To give an example:
My dataframe could look like this:
   c1  c2  c3  c4
0  38  10   1   8
1  44  12  17  46
2  13   6   2   7
3   9  16  13  26
If I now choose n = 2 (number of replacements) and val = 10, my desired output would look like this:
   c1  c2  c3  c4
0   0  10   1   8
1   0   0  17  46
2   0   6   2   7
3   9   0   0  26
In the first row, only one value is larger than val, so only one gets replaced. In the second row, all values are larger than val, but only the first two are replaced. The same applies to rows 3 and 4. (Please note that it is not simply the first two columns that are affected, but the first two qualifying values in a row, which can be in any column.)
A straightforward and very ugly implementation could look like this:
import numpy as np
import pandas as pd

np.random.seed(1)
# range replaces the Python 2-only xrange
col1 = [np.random.randint(1, 50) for ti in range(4)]
col2 = [np.random.randint(1, 50) for ti in range(4)]
col3 = [np.random.randint(1, 50) for ti in range(4)]
col4 = [np.random.randint(1, 50) for ti in range(4)]
df = pd.DataFrame({'c1': col1, 'c2': col2, 'c3': col3, 'c4': col4})

val = 10
n = 2

for ind, row in df.iterrows():
    # number of replacements made in this row so far
    re = 0
    for indi, vali in enumerate(row):
        if vali > val:
            df.iloc[ind, indi] = 0
            re += 1
            if re == n:
                break
That works but I am sure that there are much more efficient ways of doing this. Any ideas?
You could write your own, admittedly slightly odd, function and use apply with axis=1:
def f(x, n, m):
    y = x.copy()
    y[y[y > m].iloc[:n].index] = 0
    return y
In [380]: df
Out[380]:
   c1  c2  c3  c4
0  38  10   1   8
1  44  12  17  46
2  13   6   2   7
3   9  16  13  26
In [381]: df.apply(f, axis=1, n=2, m=10)
Out[381]:
   c1  c2  c3  c4
0   0  10   1   8
1   0   0  17  46
2   0   6   2   7
3   9   0   0  26
Note: y = x.copy() makes a copy of the row Series. If you need to change your values in place, you could omit that line. The extra variable y is needed because slicing gives you a copy, not the original object.
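Since the question mentions a huge dataframe, a row-wise apply may still be slow. One possible alternative, not part of the answer above, is to avoid the Python-level loop entirely: build a boolean mask of values above the threshold, take its cumulative sum along each row, and zero only those positions where the running count is still at most n. The function name zero_first_n_above below is made up for illustration; this is a minimal sketch rather than a benchmarked solution.

```python
import pandas as pd


def zero_first_n_above(df, n, val):
    """Return a copy of df with the first n values > val in each row set to 0.

    Hypothetical helper for illustration; the name is not from the original post.
    """
    # True wherever a value exceeds the threshold
    above = df > val
    # Running count of qualifying values along each row (left to right)
    rank = above.cumsum(axis=1)
    # Replace only qualifying values whose running count is still <= n
    return df.mask(above & (rank <= n), 0)


# Same demonstration data as in the question
df = pd.DataFrame({'c1': [38, 44, 13, 9],
                   'c2': [10, 12, 6, 16],
                   'c3': [1, 17, 2, 13],
                   'c4': [8, 46, 7, 26]})

result = zero_first_n_above(df, n=2, val=10)
print(result)
```

This returns a new dataframe and leaves df untouched. Whether it is actually faster on your data is something you would need to time yourself.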