Find first value in dataframe's columns greater than another

Question

I have been looking for the most efficient way to find, for each row of a pandas DataFrame, the first column (scanning left to right: 0, 1, 2, 3) whose value is greater than the value in another column (t), and to put that column's label in a new column (val). If no column value is greater, I want 0 instead.

I could not find anything both simple and efficient, which matters because the real table is very large.

Example:

Initial table:

     t    0    1    2    3
JAN  3  1.9  2.1  2.6  2.9
FEB  6  2.0  4.0  5.0  9.0
MAR  2  1.0  3.0  4.0  4.0
APR  4  1.5  5.0  6.0  2.0

Final table:

     t    0    1    2    3   val
JAN  3  1.9  2.1  2.6  2.9   0
FEB  6  2.0  4.0  5.0  9.0   3
MAR  2  1.0  3.0  4.0  4.0   1
APR  4  1.5  3.0  6.0  2.0   2

Thanks!

Answer 1

You actually have a small difference in the base matrix between your initial and final tables, which took me a while to notice: the value in row APR, column 1 changes from 5.0 to 3.0. I'm going to use the latter matrix here.

from io import StringIO
import numpy as np
import pandas as pd

s = """
     t    0    1    2    3
JAN  3  1.9  2.1  2.6  2.9
FEB  6  2.0  4.0  5.0  9.0
MAR  2  1.0  3.0  4.0  4.0
APR  4  1.5  3.0  6.0  2.0
"""

# Read in the string
df = pd.read_csv(StringIO(s), sep=r'\s+')

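# Keep the column label wherever the value exceeds that row's threshold, NaN elsewhere
# (the 't' column never exceeds itself, so its slot is always NaN)
labels = np.where(df.gt(df['t'], axis=0), [np.nan, 0, 1, 2, 3], np.nan)

# Convert to dataframe, take the smallest (leftmost) label per row,
# fill rows with no match with 0 and make a new column
df['vals'] = pd.DataFrame(labels).min(axis=1).fillna(0).astype(int).values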

# Which yields your expected result
print(df)
#     t    0    1    2    3  vals
#JAN  3  1.9  2.1  2.6  2.9     0
#FEB  6  2.0  4.0  5.0  9.0     3
#MAR  2  1.0  3.0  4.0  4.0     1
#APR  4  1.5  3.0  6.0  2.0     2

I based this on the answer to a similar problem, which suggested that this technique is faster than a few other options.
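Taking the minimum label only works because the labels 0 to 3 increase from left to right. As a rough sketch of a variant that looks up the position of the first match instead, something like the following might do, assuming the threshold column is named t and the remaining column labels can be cast to integers. The names raw, cols, mask, first and labels are just illustrative.

```python
from io import StringIO

import numpy as np
import pandas as pd

raw = """
     t    0    1    2    3
JAN  3  1.9  2.1  2.6  2.9
FEB  6  2.0  4.0  5.0  9.0
MAR  2  1.0  3.0  4.0  4.0
APR  4  1.5  3.0  6.0  2.0
"""
df = pd.read_csv(StringIO(raw), sep=r"\s+")

# Candidate columns in left-to-right order (everything except the threshold)
cols = df.columns.drop("t")

# Boolean matrix: True where a cell exceeds that row's threshold
mask = df[cols].to_numpy() > df["t"].to_numpy()[:, None]

# argmax gives the position of the first True in each row; it also returns 0
# for rows with no True at all, so those rows are caught with mask.any()
first = mask.argmax(axis=1)

# Assumption: the labels are integer-like strings such as '0', '1', ...
labels = cols.astype(int).to_numpy()
df["val"] = np.where(mask.any(axis=1), labels[first], 0)

print(df)
```

Because the lookup goes through the column position, this should still pick the leftmost match if the columns are not in increasing label order.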

solution1
2 ACCEPTED 2018-06-22 03:00:29
