How to efficiently select rows from a pandas DataFrame?

Question

The following table contains some keys and values:

import numpy as np
import pandas as pd

N = 100
tbl = pd.DataFrame({'key': np.random.randint(0, 10, N),
    'y': np.random.rand(N), 'z': np.random.rand(N)})

For each key, I would like to obtain the full row (all fields) in which a specified field takes its minimal value, with the results collected into a single DataFrame.

Since the original table is very large, I'm interested in the most efficient way.

Note: getting the minimal value of a field is simple:

tbl.groupby('key').agg(pd.Series.min)

But this takes the minimum of every field independently. I would like to know the minimum value of y and the z value that corresponds to it.
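To make the problem concrete, here is a minimal sketch on a tiny hand-made table (the four rows are illustrative and not part of the original data). It shows that the per-column minimum can combine a y and a z that never occur together in any row:

```python
import pandas as pd

# Tiny illustrative table (assumed data, not from the question).
tbl = pd.DataFrame({
    "key": [0, 0, 1, 1],
    "y": [0.5, 0.1, 0.9, 0.3],
    "z": [0.2, 0.8, 0.1, 0.7],
})

# Column-wise minimum per key: y and z are minimised independently.
independent = tbl.groupby("key").min()
print(independent)

# For key 0 the result pairs y=0.1 with z=0.2, but no single row has both.
pair = independent.loc[0]
exists = bool(((tbl["y"] == pair["y"]) & (tbl["z"] == pair["z"])).any())
print("pair occurs in a real row:", exists)
```

The row that actually holds the smallest y for key 0 has z = 0.8, which is the value the question is after.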

Below I post an answer to my question with my naive approach, but I suspect there are better ways.

Answer 1

Here is a straightforward approach:

gr = tbl.groupby('key')
def take_min_y(t):
    # idxmin returns the index label of the smallest y, which is what .loc expects
    ix = t.y.idxmin()
    return t.loc[[ix]]

tbl_mins = gr.apply(take_min_y)

Is there a better way?

Answer 2

Based on your edit, I believe the following is what you want:

In [107]:

tbl.loc[gr['y'].agg(pd.Series.idxmin)]
Out[107]:
    key         y         z
47    0  0.094841  0.221435
26    1  0.062200  0.748082
45    2  0.032497  0.160199
28    3  0.002242  0.064829
73    4  0.122438  0.723844
75    5  0.128193  0.638933
79    6  0.071833  0.952624
86    7  0.058974  0.113317
36    8  0.068757  0.611111
12    9  0.082604  0.271268

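idxmin returns the index label of the minimum value in each group; we can then use these labels to select the corresponding rows from the original DataFrame. Because they are labels, .loc is the matching indexer; .iloc returns the same rows only when the index is the default 0..N-1 range, as it is here.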
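As a small sketch of why the labels matter, the example below uses an illustrative table whose index deliberately does not start at 0 (the data and index values are assumptions made for the demonstration):

```python
import pandas as pd

# Illustrative table with a non-default index (assumed data).
tbl = pd.DataFrame(
    {"key": [0, 0, 1, 1], "y": [0.5, 0.1, 0.9, 0.3], "z": [0.2, 0.8, 0.1, 0.7]},
    index=[10, 11, 12, 13],
)

# One index label per key: the row where y is smallest within that group.
labels = tbl.groupby("key")["y"].idxmin()

# Labels pair with .loc; .iloc would read 11 and 13 as positions,
# which are out of bounds for this four-row table.
rows = tbl.loc[labels]
print(rows)
```

With the default integer index, labels and positions coincide, which is why both indexers give the same result on the table from the question.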

Timings on this 100-row table show this method is roughly 6.5 times faster than the apply-based approach:

In [108]:

%timeit tbl.loc[gr['y'].agg(pd.Series.idxmin)]
def take_min_y(t):
    ix = t.y.idxmin()
    return t.loc[[ix]]

%timeit tbl_mins = gr.apply(take_min_y)
1000 loops, best of 3: 1.08 ms per loop
100 loops, best of 3: 7.06 ms per loop
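To repeat the comparison outside IPython, something like the following sketch could be used. The table size, seed, and repetition count are arbitrary assumptions, and the apply variant selects the y and z columns explicitly (a small departure from the code above) so that it behaves the same across pandas versions. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 1000                        # arbitrary size for the sketch
tbl = pd.DataFrame({"key": rng.integers(0, 10, N),
                    "y": rng.random(N),
                    "z": rng.random(N)})
gr = tbl.groupby("key")


def via_idxmin():
    # Look up the label of the smallest y per key, then select those rows.
    return tbl.loc[gr["y"].idxmin()]


def via_apply():
    # Build a one-row frame per group inside a Python-level function.
    return gr[["y", "z"]].apply(lambda t: t.loc[[t["y"].idxmin()]])


# Both approaches should pick the same original rows.
picked_idxmin = sorted(via_idxmin().index)
picked_apply = sorted(via_apply().index.get_level_values(-1))

t_idxmin = timeit.timeit(via_idxmin, number=20)
t_apply = timeit.timeit(via_apply, number=20)
print(f"idxmin: {t_idxmin:.4f} s, apply: {t_apply:.4f} s for 20 runs each")
```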

solution1
1 2014-07-22 08:57:48

solution2
1 2014-07-22 10:15:10
