Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' when using groupby on a dataframe
I have a dataframe news_count. Here are its column names, from the output of news_count.columns.values:
[('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '')
('day', '') ('month', '') ('year', '')]
I need to group by year and month and sum the values of 'NEWS_SENTIMENT_DAILY_AVG'. Below is the code I tried, but neither attempt works:
news_count.groupby(['year','month']).NEWS_SENTIMENT_DAILY_AVG.values.sum()
'AttributeError: 'DataFrameGroupBy' object has no attribute'
news_count.groupby(['year','month']).iloc[:,1].values.sum()
AttributeError: Cannot access callable attribute 'iloc' of 'DataFrameGroupBy' objects, try using the 'apply' method
Input data:
ticker        date            EBIX UW Equity  month  year
field                NEWS_SENTIMENT_DAILY_AVG
0       2007-05-25                    0.3992      5  2007
1       2007-11-06                    0.3936     11  2007
2       2007-11-07                    0.2039     11  2007
3       2009-01-14                    0.2881      1  2014
Extract the required columns from the dataframe into the news_count_res variable, and then apply the aggregation function:
news_count_res = news_count[['year','month','NEWS_SENTIMENT_DAILY_AVG']]
news_count_res.groupby(['year','month']).sum()
Or, without the intermediate variable:
news_count.groupby(['year','month'])['NEWS_SENTIMENT_DAILY_AVG'].sum()
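The column names in the question are tuples, which suggests the columns are a two-level MultiIndex (ticker, field). If so, 'NEWS_SENTIMENT_DAILY_AVG' lives on the second level, and that is one plausible reason attribute access on the groupby object fails. The following is a minimal sketch under that assumption, not a confirmed fix: it rebuilds the sample input by hand, flattens the columns to single names, and then groups. The names flat and result are made up for illustration.

```python
import pandas as pd

# Rebuild the sample input with two-level (ticker, field) columns.
# Assumption: this mirrors the structure shown in the question.
columns = pd.MultiIndex.from_tuples(
    [
        ("date", ""),
        ("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG"),
        ("month", ""),
        ("year", ""),
    ],
    names=["ticker", "field"],
)
news_count = pd.DataFrame(
    [
        ["2007-05-25", 0.3992, 5, 2007],
        ["2007-11-06", 0.3936, 11, 2007],
        ["2007-11-07", 0.2039, 11, 2007],
        ["2009-01-14", 0.2881, 1, 2014],
    ],
    columns=columns,
)

# Flatten the columns: keep the field name when there is one,
# otherwise fall back to the top-level name.
flat = news_count.copy()
flat.columns = [field if field else ticker for ticker, field in flat.columns]

# With plain string column names, the usual groupby/select/sum pattern applies.
result = flat.groupby(["year", "month"])["NEWS_SENTIMENT_DAILY_AVG"].sum()
print(result)
```

Flattening only works this simply when the field names are unique across tickers; with several tickers sharing the same field name, the flattened names would need to combine both levels.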
Thanks for the answers so far (I've left comments there, as I haven't got those solutions to work; maybe I'm not understanding something). In the meantime, I've also come up with another approach, which I still suspect isn't very Pythonic. It does get the job done and doesn't take too long for my purposes, but it would be great if I could figure out how to tweak the approaches suggested above to get them to work. Any thoughts are very welcome!
Here's what I've got:
import math
import numpy as np
import pandas as pd

y = ['Alex'] * 2321 + ['Doug'] * 34123 + ['Chuck'] * 2012 + ['Bob'] * 9281
z = ['xyz'] * len(y)
df = pd.DataFrame({'persons': y, 'data': z})
percent = 10  # CHANGE AS NEEDED

# add a 'helper' column with random numbers
df['rand'] = np.random.random(df.shape[0])
df = df.sample(frac=1)  # optional: this shuffles data, just to show order doesn't matter

# CREATE A HELPER LIST of [person, row count] pairs
helper = pd.DataFrame(df.groupby('persons')['rand'].count()).reset_index().values.tolist()
for row in helper:
    df_temp = df[df['persons'] == row[0]][['persons', 'rand']]
    lim = math.ceil(len(df_temp) * percent * 0.01)
    # append the smallest 'rand' value still inside the top `percent` for this person
    row.append(df_temp.nlargest(lim, 'rand').iloc[-1]['rand'])

def flag(name, num):
    # look up the person's threshold and compare the row's value against it
    for row in helper:
        if row[0] == name:
            if num >= row[2]:
                return 'yes'
            else:
                return 'no'

df['flag'] = df.apply(lambda x: flag(x['persons'], x['rand']), axis=1)
And to check the results:
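piv = df.pivot_table(index="persons", columns="flag", values="data", aggfunc='count', fill_value=0)
# add a 'Total' row (DataFrame.append was removed in pandas 2.0, so assign via .loc instead)
piv.loc['Total'] = piv.sum()
# add a 'Total' column holding each row's sum
piv = piv.assign(Total=lambda x: x.sum(axis=1))
piv['% selected'] = 100 * piv.yes / piv.Total
print(piv)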
OUTPUT:
flag        no   yes  Total  % selected
persons
Alex      2088   233   2321   10.038776
Bob       8352   929   9281   10.009697
Chuck     1810   202   2012   10.039761
Doug     30710  3413  34123   10.002051
Total    42960  4777  47737   10.006913
It seems to work with different percentages and different numbers of persons, but I think it would be nice to make it simpler.
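One possibly simpler route is to rank the values within each person and compare that rank with the per-person limit, which avoids both the helper list and the row-wise apply. This is only a sketch on the same synthetic data: the seeded generator and the names rank and limit are assumptions added for illustration, and ties are broken by order of appearance (method="first") rather than by the >= comparison used above.

```python
import numpy as np
import pandas as pd

# Same synthetic data as above; the fixed seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
persons = ["Alex"] * 2321 + ["Doug"] * 34123 + ["Chuck"] * 2012 + ["Bob"] * 9281
df = pd.DataFrame({"persons": persons, "data": "xyz"})
df["rand"] = rng.random(len(df))
percent = 10

grouped = df.groupby("persons")["rand"]

# Rank within each person: 1 is the largest 'rand' value in that group.
rank = grouped.rank(method="first", ascending=False)

# Number of rows to flag per person, broadcast back to every row of the group.
limit = np.ceil(grouped.transform("count") * percent * 0.01)

df["flag"] = np.where(rank <= limit, "yes", "no")

print(df.groupby(["persons", "flag"]).size().unstack(fill_value=0))
```

Because the limit is computed with the same ceiling rule, the number of 'yes' rows per person should match the counts in the output above, although which rows are flagged depends on the random values.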