I need some help with pandas. I'm working on a dataset and trying to calculate changes over time per region. Basically, I want to find the oldest quantity and the newest quantity for each area. I have code that gives me the year of the most recent and the oldest records, but I need the whole row so I can work with the 'quantity' column. Any suggestions? Here is what I have:
df.groupby(['Country or Area'])['Year'].max()
Thanks in advance!
df = df.sort_values(by=['Country or Area','Year'])
df.groupby('Country or Area').agg(['first','last']).stack()
Use idxmin() and idxmax(). Something like:
grp = df.groupby(['Country or Area'])
for name, group in grp:
    print(name)
    # index labels of the rows with the earliest and latest year in this group
    minidx = group['Year'].idxmin()
    maxidx = group['Year'].idxmax()
    print(f"min: {group.loc[minidx, 'Year']} {group.loc[minidx, 'Quantity']}")
    print(f"max: {group.loc[maxidx, 'Year']} {group.loc[maxidx, 'Quantity']}")
    print()
You can use idxmin and idxmax to get the oldest and newest rows. For example, for the oldest:
df.loc[df.groupby(['Country or Area'])['Year'].idxmin()]
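To get from there to the change per area, something along these lines might work. It is only a sketch: the sample data is made up, and the column names 'Year' and 'Quantity' are assumed to match yours.

```python
import pandas as pd

# Made-up sample data; the column names are assumptions based on the question.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1999, 2005, 2010, 2003],
    "Quantity": [10.0, 4.0, 25.0, 7.0, 9.0],
})

years_by_area = df.groupby("Country or Area")["Year"]

# idxmin/idxmax return the index labels of the oldest/newest row per area,
# and df.loc pulls the whole rows for those labels.
oldest = df.loc[years_by_area.idxmin()].set_index("Country or Area")
newest = df.loc[years_by_area.idxmax()].set_index("Country or Area")

# Both frames are indexed by area, so the subtraction aligns area by area.
change = newest["Quantity"] - oldest["Quantity"]
print(change)
```

Since the whole rows are kept, any other column is available in `oldest` and `newest` too. Note that this relies on the index labels of `df` being unique.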
You need to use the aggregation functions of groupby().
You can pass a function, a list of functions, or a dict mapping columns to functions for the columns you need to aggregate.
In your case, Crish's solution is the better way to do it.
Sort the DataFrame by the value you want to compare, then group and use .agg() to get the result you need.
The stack() method then moves the 'first'/'last' column level into the row index.
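As a rough illustration of the sort-then-aggregate approach, here is a sketch using named aggregation. The sample data and the output column names (first_year, last_qty, and so on) are made up for the example.

```python
import pandas as pd

# Made-up sample data; the column names are assumptions based on the question.
df = pd.DataFrame({
    "Country or Area": ["A", "A", "A", "B", "B"],
    "Year": [2001, 1999, 2005, 2010, 2003],
    "Quantity": [10.0, 4.0, 25.0, 7.0, 9.0],
})

summary = (
    df.sort_values(["Country or Area", "Year"])   # oldest row first within each area
      .groupby("Country or Area")
      .agg(
          first_year=("Year", "first"),
          last_year=("Year", "last"),
          first_qty=("Quantity", "first"),
          last_qty=("Quantity", "last"),
      )
)

# One row per area, so the change is a plain column subtraction.
summary["change"] = summary["last_qty"] - summary["first_qty"]
print(summary)
```

One caveat: 'first' and 'last' take the first and last non-null value of each column separately, so if 'Quantity' has missing values the year and quantity in a result row may come from different source rows. In that case the idxmin/idxmax approach is probably safer.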