Return rows in pandas based on values in multiple columns

I need some help with pandas. I'm working with this data and trying to calculate changes over time per region. Basically, I'm trying to find the oldest quantity and the newest quantity for each area. I have code that gives me the year of the most recent and oldest data records, but I need the whole row so I can work with the 'quantity' column. Any input? Here is what I have:

df.groupby(['Country or Area'])['Year'].max()

Thanks in advance!
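One way to keep whole rows, rather than only the year, is to broadcast each group's newest and oldest year back onto every row with `transform` and filter with the resulting mask. This is a sketch on made-up data: the values are invented and the quantity column is assumed to be named 'Quantity'. If an area has several rows for its newest (or oldest) year, all of them are kept.

```python
import pandas as pd

# Made-up sample data; only the column names follow the question.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
)

# transform() returns a Series aligned with df, holding each row's group max/min year.
newest_year = df.groupby("Country or Area")["Year"].transform("max")
oldest_year = df.groupby("Country or Area")["Year"].transform("min")

# Comparing row by row keeps every column of the matching rows.
newest_rows = df[df["Year"] == newest_year]
oldest_rows = df[df["Year"] == oldest_year]

print(newest_rows)
print(oldest_rows)
```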

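# Sort so that, within each area, rows run from the oldest to the newest year
df = df.sort_values(by=['Country or Area', 'Year'])
# 'first' = oldest row, 'last' = newest row; stack() moves first/last into the row index
df.groupby('Country or Area').agg(['first', 'last']).stack()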
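A self-contained sketch of this sort-then-first/last approach, extended to compute the quantity change per area. The data is made up and the 'Quantity' column name is an assumption.

```python
import pandas as pd

# Made-up sample data with the column names used in this thread.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
)

df = df.sort_values(by=["Country or Area", "Year"])
# Columns become a MultiIndex such as ('Quantity', 'first') and ('Quantity', 'last').
summary = df.groupby("Country or Area").agg(["first", "last"])

# Change from the oldest record to the newest record for each area.
change = summary[("Quantity", "last")] - summary[("Quantity", "first")]
print(change)
```

Note that 'first' and 'last' return the first and last non-null value of each column separately, so if 'Quantity' has missing values, the aggregated quantity may come from a different row than the aggregated year.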

Use idxmin() and idxmax(). Something like:

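grp = df.groupby('Country or Area')

for name, group in grp:
    print(name)

    # idxmin/idxmax return the index labels of the oldest and newest rows
    minidx = group['Year'].idxmin()
    maxidx = group['Year'].idxmax()

    # .loc[label, column] reads a single value from that row
    print(f"min: {group.loc[minidx, 'Year']} {group.loc[minidx, 'Quantity']}")
    print(f"max: {group.loc[maxidx, 'Year']} {group.loc[maxidx, 'Quantity']}")
    print()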
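Instead of printing, the same loop can collect one record per area into a new DataFrame, which makes the change easy to work with afterwards. A sketch on made-up data; the output column names ('oldest_year', 'newest_year', 'change') are hypothetical choices.

```python
import pandas as pd

# Made-up sample data with the column names used in this thread.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
)

records = []
for name, group in df.groupby("Country or Area"):
    # idxmin/idxmax give index labels, so .loc fetches the whole row.
    oldest = group.loc[group["Year"].idxmin()]
    newest = group.loc[group["Year"].idxmax()]
    records.append(
        {
            "Country or Area": name,
            "oldest_year": oldest["Year"],
            "newest_year": newest["Year"],
            "change": newest["Quantity"] - oldest["Quantity"],
        }
    )

result = pd.DataFrame(records)
print(result)
```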

You can use idxmin and idxmax to get the oldest and newest rows:

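oldest = df.loc[df.groupby('Country or Area')['Year'].idxmin()]  # oldest row per area
newest = df.loc[df.groupby('Country or Area')['Year'].idxmax()]  # newest row per area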
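The oldest and newest rows can then be aligned on the area and compared. A sketch on made-up data; the suffixes '_oldest' and '_newest' are arbitrary choices. It assumes df has a unique index, since idxmin/idxmax return index labels (call reset_index() first if it does not).

```python
import pandas as pd

# Made-up sample data with the column names used in this thread.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
)

grouped = df.groupby("Country or Area")["Year"]
oldest = df.loc[grouped.idxmin()].set_index("Country or Area")
newest = df.loc[grouped.idxmax()].set_index("Country or Area")

# Join on the area; suffixes disambiguate the overlapping Year/Quantity columns.
comparison = oldest.join(newest, lsuffix="_oldest", rsuffix="_newest")
comparison["change"] = comparison["Quantity_newest"] - comparison["Quantity_oldest"]
print(comparison)
```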

You need to use the aggregation functions of groupby().

You can pass a function, a list of functions, or a dict mapping each column to the functions you want to aggregate it with.
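A sketch of the dict form, plus pandas' named aggregation (keyword=(column, function)), which yields flat column names. The data is made up and the output names such as 'oldest_qty' are hypothetical.

```python
import pandas as pd

# Made-up sample data with the column names used in this thread.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
)
# groupby keeps row order within each group, so first/last follow Year order.
ordered = df.sort_values("Year")

# Dict form: each column maps to a list of aggregations (MultiIndex columns).
dict_summary = ordered.groupby("Country or Area").agg(
    {"Year": ["first", "last"], "Quantity": ["first", "last"]}
)

# Named aggregation: one flat output column per keyword.
named_summary = ordered.groupby("Country or Area").agg(
    oldest_year=("Year", "first"),
    newest_year=("Year", "last"),
    oldest_qty=("Quantity", "first"),
    newest_qty=("Quantity", "last"),
)
named_summary["change"] = named_summary["newest_qty"] - named_summary["oldest_qty"]
print(dict_summary)
print(named_summary)
```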

In your case, the code should look like Crish's solution, which is the better way to do it.

Sort the DataFrame by the column to check (here, Year), then group and use .agg() to get the result you need.
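If the goal is the complete oldest and newest rows rather than aggregated columns, a related option (swapped in here, not part of the answer above) is to sort the same way and take groupby(...).head(1) and tail(1), which return entire original rows. A sketch on made-up data:

```python
import pandas as pd

# Made-up sample data with the column names used in this thread.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
)

ordered = df.sort_values("Year")
# head(1)/tail(1) keep all columns of the first/last row in each area.
oldest_rows = ordered.groupby("Country or Area").head(1)
newest_rows = ordered.groupby("Country or Area").tail(1)
print(oldest_rows)
print(newest_rows)
```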

The stack() method then moves the 'first'/'last' column level into the row index, flattening the columns.
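To see what stack() does here, compare the column labels before and after on made-up data. This is a sketch; recent pandas versions may emit a FutureWarning about the stack implementation, but the call still works.

```python
import pandas as pd

# Made-up sample data with the column names used in this thread.
df = pd.DataFrame(
    {
        "Country or Area": ["France", "France", "France", "Japan", "Japan"],
        "Year": [2012, 2010, 2015, 2011, 2014],
        "Quantity": [120.0, 100.0, 150.0, 80.0, 60.0],
    }
).sort_values(["Country or Area", "Year"])

# One row per area; columns are pairs like ('Year', 'first').
wide = df.groupby("Country or Area").agg(["first", "last"])
# stack() moves the inner column level ('first'/'last') into the row index,
# giving two rows per area with plain 'Year' and 'Quantity' columns.
long = wide.stack()

print(wide.columns.tolist())
print(long)
```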
