How do you return only a group by in pandas?
I have the following script, in which I want a simple group by:
# import the pandas module
import pandas as pd
from openpyxl import load_workbook
writer = pd.ExcelWriter(r'D:\temp\test.xlsx', engine='openpyxl')
# Create an example dataframe
raw_data = {'Date': ['2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13','2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13'],
'Portfolio': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
'Duration': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
'Yield': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],}
df = pd.DataFrame(raw_data, columns=['Date', 'Portfolio', 'Duration', 'Yield'])
dft = df.groupby(['Date', 'Portfolio', 'Duration', 'Yield'], as_index=False)
This creates a pandas groupby object.
I then want to write it out to Excel:
dft.to_excel(writer, 'test', index=False)
writer.save()
But it raises an error:
AttributeError: Cannot access callable attribute 'to_excel' of 'DataFrameGroupBy' objects, try using the 'apply' method
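Why do I need apply? I only want the group-by result so that duplicates are removed.
You can indeed use groupby to remove duplicates, by taking the first row or the mean of each group, for example: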
df.groupby(['Date', 'Portfolio', 'Duration', 'Yield'], as_index=False).mean()
df.groupby(['Date', 'Portfolio', 'Duration', 'Yield'], as_index=False).first()
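Note that you have to apply a function (here the mean or first method) to get a DataFrame back from the groupby object; the groupby object itself only describes the grouping and has no to_excel method. The resulting DataFrame can then be written to Excel.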
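To make the distinction concrete, here is a minimal sketch on a smaller, made-up frame (the data and the single grouping key are assumptions chosen for brevity, not the asker's exact setup). It shows that groupby alone yields a DataFrameGroupBy object without to_excel, and that calling first() turns it into an ordinary DataFrame:

```python
import pandas as pd

# Small illustrative frame (assumed data, shortened from the question's example)
df = pd.DataFrame({
    "Portfolio": ["A", "A", "B", "B", "C"],
    "Duration": [1, 1, 2, 2, 3],
    "Yield": [0.3, 0.3, 2.0, 2.0, 1.0],
})

# groupby on its own only describes the grouping; nothing is computed yet
grouped = df.groupby("Portfolio", as_index=False)
grouped_type = type(grouped).__name__          # name of the groupby class
has_to_excel = hasattr(grouped, "to_excel")    # the groupby object cannot be exported

# Applying an aggregation such as first() produces a regular DataFrame,
# with one row per group
result = grouped.first()
print(grouped_type, has_to_excel, type(result).__name__)
print(result)
```

Only the aggregated result is a DataFrame, so only it can be passed on to to_excel.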
But as @EdChum pointed out, in this case the DataFrame's drop_duplicates method is the simpler approach:
df.drop_duplicates(subset=['Date', 'Portfolio', 'Duration', 'Yield'])
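Putting the pieces together, the following is a rough sketch of the deduplicate-then-export flow, again on shortened, made-up data. The helper name write_to_excel is hypothetical, and it assumes openpyxl is installed; it uses ExcelWriter as a context manager because, as far as I know, writer.save() is no longer available in recent pandas releases:

```python
import pandas as pd

# Shortened version of the question's data (assumed values for illustration)
df = pd.DataFrame({
    "Date": ["2016-05-13"] * 6,
    "Portfolio": ["A", "A", "B", "B", "C", "C"],
    "Duration": [1, 1, 2, 2, 3, 3],
    "Yield": [0.3, 0.3, 2.0, 2.0, 1.0, 1.0],
})

# drop_duplicates keeps the first occurrence of each distinct row and
# returns a DataFrame directly; reset_index gives a clean 0..n-1 index
deduped = (
    df.drop_duplicates(subset=["Date", "Portfolio", "Duration", "Yield"])
      .reset_index(drop=True)
)
print(deduped)


def write_to_excel(frame, path):
    """Hypothetical helper: write a DataFrame to one sheet of an .xlsx file.

    Requires the openpyxl package. The context manager saves and closes
    the file on exit, so no explicit save call is needed.
    """
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        frame.to_excel(writer, sheet_name="test", index=False)
```

Calling write_to_excel(deduped, r'D:\temp\test.xlsx') would then write the three distinct rows to the sheet named test.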