How to replace the first n elements in each row of a dataframe that are larger than a certain threshold

Question

I have a huge dataframe that contains only numbers (the one I show below is just for demonstration purposes). In each row of the dataframe, I want to replace the first n numbers that are larger than a certain value val with 0.

To give an example:

My dataframe could look like this:

   c1  c2  c3  c4
0  38  10   1   8
1  44  12  17  46
2  13   6   2   7
3   9  16  13  26

If I now choose n = 2 (number of replacements) and val = 10, my desired output would look like this:

   c1  c2  c3  c4
0   0  10   1   8
1   0   0  17  46
2   0   6   2   7
3   9   0   0  26

In the first row, only one value is larger than val, so only one gets replaced. In the second row, all values are larger than val, but only the first two are replaced. The same applies to the third and fourth rows (please note that the replacement does not target the first two columns but the first two qualifying values in a row, which can be in any column).

A straightforward and very ugly implementation could look like this:

import numpy as np
import pandas as pd

np.random.seed(1)

# range instead of the Python 2-only xrange
col1 = [np.random.randint(1, 50) for ti in range(4)]
col2 = [np.random.randint(1, 50) for ti in range(4)]
col3 = [np.random.randint(1, 50) for ti in range(4)]
col4 = [np.random.randint(1, 50) for ti in range(4)]

df = pd.DataFrame({'c1': col1, 'c2': col2, 'c3': col3, 'c4': col4})

val = 10
n = 2

for ind, row in df.iterrows():
    # number of replacements
    re = 0

    for indi, vali in enumerate(row):
        if vali > val:
            df.iloc[ind, indi] = 0
            re += 1
            if re == n:
                break

That works, but I am sure there are much more efficient ways of doing this. Any ideas?

Answer 1

You could write your own (admittedly slightly odd) function and use apply with axis=1:

def f(x, n, m):
    y = x.copy()
    y[y[y > m].iloc[:n].index] = 0
    return y

In [380]: df
Out[380]:
   c1  c2  c3  c4
0  38  10   1   8
1  44  12  17  46
2  13   6   2   7
3   9  16  13  26

In [381]: df.apply(f, axis=1, n=2, m=10)
Out[381]:
   c1  c2  c3  c4
0   0  10   1   8
1   0   0  17  46
2   0   6   2   7
3   9   0   0  26

Note: y = x.copy() makes a copy of the row Series so that the original is not modified. If you need to change your values in place, you could omit that line. The extra variable y is needed because slicing gives you a copy, not the original object.
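
Since the question mentions a huge dataframe, a vectorized variant may be worth trying instead of a row-wise apply. The following is only a minimal sketch and not part of the answer above: it assumes that a boolean mask combined with a running count along each row (cumsum with axis=1) fits the use case, and the helper name replace_first_n is made up for illustration.

```python
import pandas as pd

# Demonstration data from the question
df = pd.DataFrame({'c1': [38, 44, 13, 9],
                   'c2': [10, 12, 6, 16],
                   'c3': [1, 17, 2, 13],
                   'c4': [8, 46, 7, 26]})


def replace_first_n(frame, n, val):
    """Hypothetical helper: zero the first n values per row that exceed val."""
    # True where a value is larger than the threshold
    above = frame > val
    # Running count of qualifying values, left to right within each row
    rank = above.astype(int).cumsum(axis=1)
    # Replace only qualifying cells whose running count is still <= n
    return frame.mask(above & (rank <= n), 0)


result = replace_first_n(df, n=2, val=10)
print(result)
```

This returns a new dataframe and leaves df untouched. On the demonstration data it gives the same values as the desired output shown in the question.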
