
How to subset pandas dataframe columns with idxmax output?

I have a pandas dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,40,size=(10,4)), columns=range(4), index = range(10))
df.head()

    0   1   2   3
0  27  10  13  21
1  25  12  23   8
2   2  24  24  34
3  10  11  11  10
4   0  15   0  27

I'm using the idxmax function to get, for each row, the label of the column that contains the maximum value.

df_max = df.idxmax(1)
df_max.head()

0    0
1    0
2    3
3    1
4    3

How can I use df_max together with df to create a series of the values corresponding to the maximum in each row of df? This is the output I want:

0    27
1    25
2    34
3    11
4    27
5    37
6    35
7    32
8    20
9    38

I know I can achieve this using df.max(1), but I want to know how to arrive at the same output by using df_max, because I want to be able to apply df_max to other matrices (not df) that share the same columns and index as df but hold different values.

You may try df.lookup (available only in pandas versions before 2.0; it was deprecated in 1.2.0):

df.lookup(df_max.index, df_max)

Out[628]: array([27, 25, 34, 11, 27], dtype=int64)
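On pandas 2.0 and later, DataFrame.lookup no longer exists, so the call above raises an AttributeError. One possible substitute, shown here only as a sketch, is to translate the row and column labels into integer positions with Index.get_indexer and then index the underlying NumPy array. The example rebuilds the five rows printed in the question rather than the random data, and the names row_pos, col_pos and values are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# The five rows shown in the question (the original data was random).
df = pd.DataFrame(
    [[27, 10, 13, 21],
     [25, 12, 23, 8],
     [2, 24, 24, 34],
     [10, 11, 11, 10],
     [0, 15, 0, 27]],
    columns=range(4),
)
df_max = df.idxmax(axis=1)  # column label of the row-wise maximum

# Convert labels to integer positions so NumPy can index the raw array.
row_pos = df.index.get_indexer(df_max.index)
col_pos = df.columns.get_indexer(df_max.to_numpy())

# Pairwise (row, column) selection: one value per row.
values = df.to_numpy()[row_pos, col_pos]
print(values)  # [27 25 34 11 27]
```

Going through get_indexer rather than using df_max directly as positions keeps the sketch valid when the column labels are not 0..n-1.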

If you want a Series or DataFrame, pass the output to the corresponding constructor:

pd.Series(df.lookup(df_max.index, df_max), index=df_max.index)

Out[630]:
0    27
1    25
2    34
3    11
4    27
dtype: int64
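Since the goal in the question is to reuse df_max on other frames with the same index and columns, the lookup can be wrapped in a small helper. The following is a minimal sketch that does not rely on DataFrame.lookup; pick_by_labels and other are hypothetical names introduced for illustration, and other is filled with made-up values purely to show that the selection follows df_max rather than other's own maxima.

```python
import numpy as np
import pandas as pd

def pick_by_labels(frame, labels):
    """For each row in `labels`, return the value of `frame` found in
    the column named by that row's label (hypothetical helper)."""
    rows = frame.index.get_indexer(labels.index)
    cols = frame.columns.get_indexer(labels.to_numpy())
    # get_indexer returns -1 for labels that are not present.
    if (rows < 0).any() or (cols < 0).any():
        raise KeyError("labels refer to rows or columns missing from frame")
    return pd.Series(frame.to_numpy()[rows, cols], index=labels.index)

# First three rows from the question.
df = pd.DataFrame(
    [[27, 10, 13, 21],
     [25, 12, 23, 8],
     [2, 24, 24, 34]],
    columns=range(4),
)
df_max = df.idxmax(axis=1)

# A second frame with the same index and columns but different values
# (illustrative data, not from the question).
other = pd.DataFrame(
    np.arange(12).reshape(3, 4), index=df.index, columns=df.columns
)

result = pick_by_labels(other, df_max)
print(result.tolist())  # [0, 4, 11]
```

Applied to df itself, pick_by_labels(df, df_max) gives the same values as df.max(1).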

