Using a pandas DataFrame, how to aggregate with groupby and bring in non-aggregated/non-groupby columns
Given:
import pandas as pd
d = {'month': pd.Series(['jan', 'jan', 'feb', 'feb']),
'week' : pd.Series(['wk1', 'wk2', 'wk1', 'wk2']),
'high_temp' : pd.Series([10, 20, 30, 20]),
'low_temp' : pd.Series([4, 5, 23, 40])}
df = pd.DataFrame(d)
df
high_temp low_temp month week
0 10 4 jan wk1
1 20 5 jan wk2
2 30 23 feb wk1
3 20 40 feb wk2
I would like to get a new DataFrame that looks like this:
month high_temp high_temp_week low_temp low_temp_week
0 Jan 20 wk2 4 wk1
1 Feb 30 wk1 23 wk1
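I can easily get the maximum temperature values grouped by month, but I can't figure out how to pull in the week column from the row that holds the maximum value.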
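The missing piece is getting back the row, not just the value. One common way to do that (a different technique from the sort-based answers below) is to have groupby return the row label of each extreme with idxmax/idxmin, then pull whole rows back with .loc so the week column comes along. A minimal sketch, assuming the index labels are unique (true for the default RangeIndex); the output column names follow the layout requested above:

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['jan', 'jan', 'feb', 'feb'],
    'week': ['wk1', 'wk2', 'wk1', 'wk2'],
    'high_temp': [10, 20, 30, 20],
    'low_temp': [4, 5, 23, 40],
})

# sort=False keeps the months in order of first appearance (jan, feb)
g = df.groupby('month', sort=False)

# idxmax/idxmin return the row label of the extreme value in each group;
# .loc then fetches those full rows, including the non-aggregated week column
hi_rows = df.loc[g['high_temp'].idxmax(), ['month', 'high_temp', 'week']]
lo_rows = df.loc[g['low_temp'].idxmin(), ['month', 'low_temp', 'week']]

# Rename week on each side before merging, so no suffixes are needed
result = (
    hi_rows.rename(columns={'week': 'high_temp_week'})
    .merge(lo_rows.rename(columns={'week': 'low_temp_week'}), on='month')
)
print(result)
```

If a month has tied extremes, idxmax/idxmin return the first occurrence.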
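You can do it with sort_values, then drop_duplicates keeping the last or the first row (depending on whether you want the maximum or the minimum), then merge. You merge on month only and specify suffixes to rename the week column coming from each of the two data frames.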
new_df = df[['month', 'high_temp', 'week']].sort_values('high_temp').drop_duplicates('month', keep='last')\
.merge(df[['month', 'low_temp', 'week']].sort_values('low_temp').drop_duplicates('month', keep='first'),
on='month', suffixes=('_high_temp', '_low_temp'))
print(new_df)
month high_temp week_high_temp low_temp week_low_temp
0 jan 20 wk2 4 wk1
1 feb 30 wk1 23 wk1
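One detail worth knowing about the sort_values + drop_duplicates pattern: sort_values defaults to kind='quicksort', which is not a stable sort, so when two weeks in the same month share the extreme temperature, which one survives keep='last' or keep='first' is not guaranteed. Passing kind='mergesort' (a stable sort) keeps tied rows in their original order. A small sketch with hypothetical tied data (not from the question):

```python
import pandas as pd

# Hypothetical data: both March weeks hit the same high of 25
df = pd.DataFrame({
    'month': ['mar', 'mar'],
    'week': ['wk1', 'wk2'],
    'high_temp': [25, 25],
})

# A stable sort preserves the original order of tied rows,
# so keep='last' deterministically picks the later week (wk2)
top = (
    df.sort_values('high_temp', kind='mergesort')
    .drop_duplicates('month', keep='last')
)
print(top)
```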
I think we can do it like this:
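# row with the highest high_temp in each month
s1 = df.sort_values('high_temp').drop_duplicates('month', keep='last')
# row with the lowest low_temp in each month (keep='first' is the default)
s2 = df.sort_values('low_temp').drop_duplicates('month')
result = s1.drop(columns='low_temp').merge(s2.drop(columns='high_temp'), on='month', suffixes=('_high', '_low'))
print(result)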
month week_high high_temp week_low low_temp
0 jan wk2 20 wk1 4
1 feb wk1 30 wk1 23
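Neither answer produces the exact column names from the question (high_temp_week, low_temp_week). A sketch of one way to get them while keeping the sort + drop_duplicates approach: select and rename the week column on each side before merging, so no suffixes are needed and the merge yields the requested column order. The stable sort (kind='mergesort') is an added assumption so that ties resolve deterministically:

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['jan', 'jan', 'feb', 'feb'],
    'week': ['wk1', 'wk2', 'wk1', 'wk2'],
    'high_temp': [10, 20, 30, 20],
    'low_temp': [4, 5, 23, 40],
})

# Highest high_temp per month: sort ascending, keep the last row per month
s1 = (df.sort_values('high_temp', kind='mergesort')
        .drop_duplicates('month', keep='last')
        [['month', 'high_temp', 'week']]
        .rename(columns={'week': 'high_temp_week'}))

# Lowest low_temp per month: sort ascending, keep the first row per month
s2 = (df.sort_values('low_temp', kind='mergesort')
        .drop_duplicates('month', keep='first')
        [['month', 'low_temp', 'week']]
        .rename(columns={'week': 'low_temp_week'}))

# The week columns already have distinct names, so no suffixes are needed
result = s1.merge(s2, on='month')
print(result)
```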