How do I get the maximum within a subset of my dataframe in Pandas?

For example, when I do something like

statedata[statedata['state.region'] == 'Northeast'].ix[statedata['Murder'].idxmax()]

I get a KeyError that indicates that idxmax is returning the key for the global maximum, Alabama, rather than the maximum within the queried subset (from which that key is of course missing).

Is there a concise way to do this in pandas?


For reference, the data used here is from R, using

data(state)
statedata = cbind(data.frame(state.x77), state.abb, state.area, state.center, state.division, state.name, state.region)

then exported from R and imported by Pandas.

You could use df.loc to select the Murder column for just the Northeast rows, then call idxmax on that subset:

import pandas as pd
import pandas.rpy.common as com  # note: pandas.rpy was removed in pandas 0.20, so this needs an older pandas
import rpy2.robjects as ro

r = ro.r
statedata = r('''cbind(data.frame(state.x77), state.abb, state.area, state.center,
                 state.division, state.name, state.region)''')
df = com.convert_robj(statedata)
df.columns = df.columns.to_series().str.replace('state.', '')
subdf = df.loc[df['region']=='Northeast', 'Murder']
print(subdf)
# Connecticut       3.1
# Maine             2.7
# Massachusetts     3.3
# New Hampshire     3.3
# New Jersey        5.2
# New York         10.9
# Pennsylvania      6.1
# Rhode Island      2.4
# Vermont           5.5
# Name: Murder, dtype: float64
print(subdf.idxmax())

prints

New York
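
If R and rpy2 are not available, the same idea can be sketched with a small hand-built DataFrame. This is only an illustration: the Northeast values are copied from the output above, while the Alabama and Nevada rows are illustrative stand-ins. It shows why the lookup in the question fails (the global idxmax label is absent from the subset) and how taking idxmax on the subset avoids that.

```python
import pandas as pd

# Small stand-in for the R `state` data. The Northeast values match the
# output shown above; the Alabama and Nevada rows are illustrative only.
df = pd.DataFrame(
    {
        "region": ["Northeast", "Northeast", "Northeast", "South", "West"],
        "Murder": [10.9, 6.1, 2.4, 15.1, 11.5],
    },
    index=["New York", "Pennsylvania", "Rhode Island", "Alabama", "Nevada"],
)

# idxmax on the full column returns the label of the global maximum.
global_label = df["Murder"].idxmax()

# That label does not exist in the Northeast subset, so looking it up there
# raises KeyError, as described in the question.
northeast = df[df["region"] == "Northeast"]
try:
    northeast.loc[global_label]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Computing idxmax on the subset gives a label that is guaranteed to be in it.
subset_label = northeast["Murder"].idxmax()
top_row = northeast.loc[subset_label]
print(global_label, lookup_failed, subset_label, top_row["Murder"])
```

The key point is that the filter and the idxmax call should operate on the same object, so the returned label always belongs to the index being looked up.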

To select the state with the highest murder rate (as of 1976) for each region:

In [24]: df.groupby('region')['Murder'].idxmax()
Out[24]: 
region
North Central    Michigan
Northeast        New York
South             Alabama
West               Nevada
Name: Murder, dtype: object
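
The groupby result above holds only the index labels. If the full rows are wanted, one option is to pass those labels back to df.loc. The following is a minimal sketch on a hand-built frame; the column names follow the answer, but the rows are a small illustrative subset rather than the full R dataset.

```python
import pandas as pd

# Illustrative subset only; not the full `state` dataset.
df = pd.DataFrame(
    {
        "region": ["Northeast", "Northeast", "South", "South", "West"],
        "Murder": [10.9, 6.1, 15.1, 13.9, 11.5],
    },
    index=["New York", "Pennsylvania", "Alabama", "Georgia", "Nevada"],
)

# One index label per region: the row with the largest Murder value.
top_labels = df.groupby("region")["Murder"].idxmax()

# Use the labels to pull the complete rows from the original frame.
top_rows = df.loc[top_labels]
print(top_rows)
```

This relies on the index labels being unique; with duplicate labels, df.loc would return every matching row.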
