How do I get the maximum within a subset of my dataframe in Pandas?
For example, when I do something like
statedata[statedata['state.region'] == 'Northeast'].ix[statedata['Murder'].idxmax()]
I get a KeyError, which indicates that idxmax is returning the key for the global maximum, Alabama, rather than the maximum within the queried subset (from which that key is, of course, missing).
Is there a way to do this concisely in pandas?
For reference, the data used here is from R, using
data(state)
statedata = cbind(data.frame(state.x77), state.abb, state.area, state.center, state.division, state.name, state.region)
then exported from R and imported by Pandas.
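The KeyError occurs because idxmax is evaluated on the full DataFrame, not on the filtered one. Call idxmax on the subset instead. You could use df.loc to select the sub-DataFrame: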
import pandas as pd
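# Note: pandas.rpy was removed in pandas 0.20; on newer versions, use rpy2's own pandas conversion instead.
import pandas.rpy.common as com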
import rpy2.robjects as ro
r = ro.r
statedata = r('''cbind(data.frame(state.x77), state.abb, state.area, state.center,
state.division, state.name, state.region)''')
df = com.convert_robj(statedata)
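# Strip the 'state.' prefix from the column names
df.columns = df.columns.to_series().str.replace('state.', '')
# Select the Murder column for the Northeast rows only
subdf = df.loc[df['region']=='Northeast', 'Murder']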
print(subdf)
# Connecticut       3.1
# Maine             2.7
# Massachusetts     3.3
# New Hampshire     3.3
# New Jersey        5.2
# New York         10.9
# Pennsylvania      6.1
# Rhode Island      2.4
# Vermont           5.5
# Name: Murder, dtype: float64
print(subdf.idxmax())
prints
New York
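If you don't have R and rpy2 set up, here is a minimal self-contained sketch of the same idea. It assumes a small hand-built frame: the Northeast values are copied from the output above, while the Alabama value is only an illustrative placeholder chosen to be the global maximum. It shows why the original attempt fails and how to get the full row for the subset's maximum.

```python
import pandas as pd

# Toy data: Northeast values are taken from the output above;
# the Alabama value is an illustrative placeholder, not real data.
df = pd.DataFrame(
    {
        "region": ["Northeast", "Northeast", "Northeast", "South"],
        "Murder": [10.9, 6.1, 5.2, 15.0],
    },
    index=["New York", "Pennsylvania", "New Jersey", "Alabama"],
)

northeast = df[df["region"] == "Northeast"]

# idxmax on the full frame returns a label that the subset does not contain,
# which is what triggers the KeyError in the question.
global_label = df["Murder"].idxmax()
print(global_label in northeast.index)

# idxmax on the subset returns a label that is guaranteed to be in it.
best = northeast["Murder"].idxmax()
row = northeast.loc[best]  # full row for the subset's maximum
print(best)
print(row)
```

The key point is that the filter and the idxmax call operate on the same object, so the returned label can always be looked up with .loc.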
To select the state with the highest murder rate (as of 1976) for each region:
In [24]: df.groupby('region')['Murder'].idxmax()
Out[24]:
region
North Central    Michigan
Northeast        New York
South             Alabama
West               Nevada
Name: Murder, dtype: object
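If you want the full rows rather than just the index labels, one option is to pass the result of the grouped idxmax back to df.loc. The sketch below again assumes a small hand-built frame; the New York and Pennsylvania values come from the output above, and the South values are illustrative placeholders only.

```python
import pandas as pd

# Toy data: the South values are illustrative placeholders, not real data.
df = pd.DataFrame(
    {
        "region": ["Northeast", "Northeast", "South", "South"],
        "Murder": [10.9, 6.1, 15.0, 13.0],
    },
    index=["New York", "Pennsylvania", "Alabama", "Georgia"],
)

# One index label per region: the row holding that region's maximum
labels = df.groupby("region")["Murder"].idxmax()

# Look up the complete rows for those labels
top_rows = df.loc[labels]
print(top_rows)
```

This relies on the index labels being unique; with duplicate labels, df.loc would return every matching row.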