Calculate individual means for each section/slice of a DataFrame
For my current project, I want to calculate the mean of rating_recommend. The data is sliced by stock_symbol as the first criterion and by quarter as the second.
Currently, the output file shows the same overall mean for every category, as shown below:
stock_symbol quarter rating_recommend
A 2008Q2 1.270
A 2008Q3 1.270
A 2008Q4 1.270
A 2009Q1 1.270
A 2009Q2 1.270
A 2009Q3 1.270
The goal is to get a separate mean for each category:
stock_symbol quarter rating_recommend
A 2008Q2 1.123
A 2008Q3 1.321
A 2008Q4 1.674
A 2009Q1 1.003
A 2009Q2 1.245
A 2009Q3 1.177
Is there a clever tweak to make this work? The relevant part of the code is shown below:
# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
df['quarter'] = df['date'].dt.to_period('Q')
# Definition of the data objects
def get_top_n_bigram(row):
    # Convert quantitative data and remove null values
    df['rating_recommend'] = pd.to_numeric(df['rating_recommend'], errors='coerce')
    return df['rating_recommend'].mean()
# Grouping data and assigning this as a new dataframe
newdf = df.groupby(['stock_symbol', 'quarter']).apply(get_top_n_bigram).to_frame(name = 'rating_recommend')
# Exporting the dataframe to Excel
newdf.to_excel('total_bigrams_pro.xlsx')
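The identical values come from get_top_n_bigram itself: apply passes each (stock_symbol, quarter) group to it as the argument row, but the function never uses that argument and reads the global df instead, so it returns the mean of the whole column for every group. A minimal sketch reproducing this with the three sample records from this question (only the columns needed); overall_mean and group_mean are hypothetical names for the broken and fixed variants:

```python
import pandas as pd

# Three records from the sample data in the question (only the needed columns)
df = pd.DataFrame({
    "stock_symbol": ["AMG", "AMG", "MMM"],
    "date": ["2013-01-01", "2011-09-15", "2017-05-14"],
    "rating_recommend": [0, 2, 2],
})
df["date"] = pd.to_datetime(df["date"])
df["quarter"] = df["date"].dt.to_period("Q")

# Select the column so each group arrives as a Series of its ratings
grouped = df.groupby(["stock_symbol", "quarter"])["rating_recommend"]

def overall_mean(values):
    # Hypothetical name. Ignores `values` and reads the global df,
    # like get_top_n_bigram, so every group gets the same number
    return pd.to_numeric(df["rating_recommend"], errors="coerce").mean()

def group_mean(values):
    # Hypothetical name. Uses only the ratings of the group passed in
    return pd.to_numeric(values, errors="coerce").mean()

broken = grouped.apply(overall_mean)  # 4/3 for every group
fixed = grouped.apply(group_mean)     # one mean per (stock_symbol, quarter)
print(broken)
print(fixed)
```

With the fix, AMG 2011Q3 and MMM 2017Q2 come out as 2.0 and AMG 2013Q1 as 0.0, matching the single rating in each of those groups.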
Last but not least, here is some sample data:
[
{"gld_index": "1-0", "stock_symbol": "AMG", "gld_id": "7172", "date": "2013-01-01", "rating_recommend": 0, "rating_outlook": 1, "rating_ceo": 1, "scr_avg": 1.0, "scr_balance": 1.0, "scr_values": 1.0, "scr_opportunities": 1.0, "scr_benefits": 1.0, "scr_management": 1.0},
{"gld_index": "1-2", "stock_symbol": "AMG", "gld_id": "7172", "date": "2011-09-15", "rating_recommend": 2, "rating_outlook": null, "rating_ceo": 2, "scr_avg": 4.0, "scr_balance": 5.0, "scr_values": null, "scr_opportunities": 4.0, "scr_benefits": 5.0, "scr_management": 4.5},
{"gld_index": "1-0", "stock_symbol": "MMM", "gld_id": "446", "date": "2017-05-14", "rating_recommend": 2, "rating_outlook": 1, "rating_ceo": 2, "scr_avg": 4.0, "scr_balance": 4.0, "scr_values": 5.0, "scr_opportunities": 3.0, "scr_benefits": 3.0, "scr_management": 4.0}
]
I think this should work:
newdf = df.groupby(['stock_symbol', 'quarter']).mean(numeric_only=True)
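In pandas 2.0 and later, GroupBy.mean() defaults to numeric_only=False, so calling it on the whole frame can raise a TypeError when string columns such as gld_index or gld_id are present. A minimal sketch using a subset of the sample columns, passing numeric_only=True so that only numeric columns are averaged:

```python
import pandas as pd

# A subset of the columns from the three sample records in the question
records = [
    {"gld_index": "1-0", "stock_symbol": "AMG", "gld_id": "7172",
     "date": "2013-01-01", "rating_recommend": 0, "rating_outlook": 1, "scr_avg": 1.0},
    {"gld_index": "1-2", "stock_symbol": "AMG", "gld_id": "7172",
     "date": "2011-09-15", "rating_recommend": 2, "rating_outlook": None, "scr_avg": 4.0},
    {"gld_index": "1-0", "stock_symbol": "MMM", "gld_id": "446",
     "date": "2017-05-14", "rating_recommend": 2, "rating_outlook": 1, "scr_avg": 4.0},
]
df = pd.DataFrame(records)
df["date"] = pd.to_datetime(df["date"])
df["quarter"] = df["date"].dt.to_period("Q")

# numeric_only=True averages only the numeric columns; the string columns
# gld_index and gld_id are left out instead of raising a TypeError
means = df.groupby(["stock_symbol", "quarter"]).mean(numeric_only=True)
print(means)
```

Note that this averages every numeric column (rating_outlook, scr_avg, ...), not just rating_recommend; selecting ['rating_recommend'] before .mean() restricts the result to that one column.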
You can convert rating_recommend once, up front, instead of repeatedly inside the function that apply calls for every group:
import pandas as pd

# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
df['quarter'] = df['date'].dt.to_period('Q')
# Convert to numeric once; entries that cannot be parsed become NaN, which mean() skips
df['rating_recommend'] = pd.to_numeric(df['rating_recommend'], errors='coerce')
# Group by stock and quarter and average rating_recommend within each group
newdf = df.groupby(['stock_symbol', 'quarter'])['rating_recommend'].mean().reset_index()
# Exporting the dataframe to Excel
newdf.to_excel('total_bigrams_pro.xlsx')
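To see the per-group averaging at work, here is a sketch that wraps the same steps in a hypothetical helper, quarterly_means, and runs it on hypothetical reviews (several per quarter, ratings stored as strings, one unparseable value). The Excel export is left out so the result can be inspected directly:

```python
import pandas as pd

def quarterly_means(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean rating_recommend per (stock_symbol, quarter)."""
    df = df.copy()  # work on a copy so the caller's frame is not modified
    df["date"] = pd.to_datetime(df["date"])
    df["quarter"] = df["date"].dt.to_period("Q")
    # Unparseable entries become NaN; mean() skips NaN by default
    df["rating_recommend"] = pd.to_numeric(df["rating_recommend"], errors="coerce")
    return (
        df.groupby(["stock_symbol", "quarter"])["rating_recommend"]
        .mean()
        .reset_index()
    )

# Hypothetical input, not taken from the question's data
reviews = pd.DataFrame({
    "stock_symbol": ["AMG", "AMG", "AMG", "MMM", "MMM"],
    "date": ["2013-01-01", "2013-02-15", "2013-03-30", "2017-05-14", "2017-06-01"],
    "rating_recommend": ["0", "2", "n/a", "2", "1"],
})
result = quarterly_means(reviews)
print(result)
```

Each quarter gets its own mean: AMG 2013Q1 averages 0 and 2 to 1.0 (the "n/a" entry is ignored), and MMM 2017Q2 averages 2 and 1 to 1.5. The copy() is a design choice so that the conversions do not overwrite the caller's original columns.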