Error "AttributeError: 'DataFrameGroupBy' object has no attribute" when using groupby on a dataframe

I have a dataframe news_count. Here are its column names, from the output of news_count.columns.values:

 [('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '')
  ('day', '') ('month', '') ('year', '')]

I need to group by year and month and sum the values of 'NEWS_SENTIMENT_DAILY_AVG'. Below is the code I tried, but neither attempt works:

Attempt 1

news_count.groupby(['year','month']).NEWS_SENTIMENT_DAILY_AVG.values.sum()

'AttributeError: 'DataFrameGroupBy' object has no attribute' 

Attempt 2

news_count.groupby(['year','month']).iloc[:,1].values.sum()

AttributeError: Cannot access callable attribute 'iloc' of 'DataFrameGroupBy' objects, try using the 'apply' method

Input data:

      ticker       date           EBIX UW Equity    month    year
      field             NEWS_SENTIMENT_DAILY_AVG
         0      2007-05-25                   0.3992      5       2007
         1      2007-11-06                   0.3936      11      2007 
         2      2007-11-07                   0.2039      11      2007
         3      2009-01-14                   0.2881       1      2014

Extract the required columns from the dataframe into a news_count_res variable, then apply the aggregation function:

news_count_res = news_count[['year','month','NEWS_SENTIMENT_DAILY_AVG']]
news_count_res.groupby(['year','month']).sum()

The aggregation can also be applied to the required column directly, without extracting the columns first:

news_count.groupby(['year','month'])['NEWS_SENTIMENT_DAILY_AVG'].sum()
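One possible reason the snippets above may still fail on the question's data: news_count.columns.values shows two-level (MultiIndex) column labels, so 'NEWS_SENTIMENT_DAILY_AVG' is only the second level of the full label ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG'). Here is a minimal sketch that selects every column by its full tuple label. It assumes the frame looks like the first three rows of the input data, rebuilt by hand here; the names value_col, year, month and monthly_sum are made up for illustration.

```python
import pandas as pd

# Rebuild a small frame with the same two-level column labels as the question
# (assumption: only the first three rows of the posted input data are used).
columns = pd.MultiIndex.from_tuples([
    ("date", ""),
    ("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG"),
    ("month", ""),
    ("year", ""),
])
news_count = pd.DataFrame(
    [
        ["2007-05-25", 0.3992, 5, 2007],
        ["2007-11-06", 0.3936, 11, 2007],
        ["2007-11-07", 0.2039, 11, 2007],
    ],
    columns=columns,
)

# Select each column by its full tuple label, then give the grouping keys
# plain names so the result index is easy to read.
value_col = ("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG")
year = news_count[("year", "")].rename("year")
month = news_count[("month", "")].rename("month")

# Group the sentiment column by year and month and sum within each group.
monthly_sum = news_count[value_col].groupby([year, month]).sum()
print(monthly_sum)
```

The result is a Series indexed by (year, month), so a single month can be read with something like monthly_sum.loc[(2007, 11)].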

Thanks for the answers so far (I've left comments there, as I haven't got those solutions to work; maybe I'm not understanding something). In the meantime, I've come up with another approach, which I still suspect isn't very Pythonic. It gets the job done and doesn't take too long for my purposes, but it would be great if I could figure out how to tweak the approaches suggested above to get them to work. Any thoughts are very welcome!

Here's what I've got:

    import math
    import numpy as np   # needed for np.random below
    import pandas as pd

    y = ['Alex'] * 2321 + ['Doug'] * 34123  + ['Chuck'] * 2012 + ['Bob'] * 9281 
    z = ['xyz'] * len(y)
    df = pd.DataFrame({'persons': y, 'data' : z})
    percent = 10  #CHANGE AS NEEDED

    #add a 'helper' column with random numbers
    df['rand'] = np.random.random(df.shape[0])
    df = df.sample(frac=1)  #optional:  this shuffles data, just to show order doesn't matter

    #CREATE A HELPER LIST: one [person, row count] entry per person
    helper = pd.DataFrame(df.groupby('persons')['rand'].count()).reset_index().values.tolist()
    for row in helper:
        df_temp = df[df['persons'] == row[0]][['persons','rand']]
        lim = math.ceil(len(df_temp) * percent * 0.01)
        #append the smallest 'rand' value that is still inside the top `percent` of this person's rows
        row.append(df_temp.nlargest(lim,'rand').iloc[-1]['rand'])

    def flag(name,num):
        #'yes' if num reaches this person's threshold, otherwise 'no'
        for row in helper:
            if row[0] == name:
                if num >= row[2]:
                    return 'yes'
                else:
                    return 'no'
    
    df['flag'] = df.apply(lambda x: flag(x['persons'], x['rand']), axis=1)

And to check the results:

piv = df.pivot_table(index="persons", columns="flag", values="data", aggfunc='count', fill_value=0)
# DataFrame.append was removed in pandas 2.0, so the totals row is added with .loc instead
piv.loc['Total'] = piv.sum()
piv['Total'] = piv.sum(axis=1)
piv['% selected'] = 100 * piv.yes/piv.Total
print(piv)

OUTPUT:
flag        no   yes  Total  % selected
persons                                
Alex      2088   233   2321   10.038776
Bob       8352   929   9281   10.009697
Chuck     1810   202   2012   10.039761
Doug     30710  3413  34123   10.002051
Total    42960  4777  47737   10.006913

It seems to work with different percentages and different numbers of persons, but I think it would be nice to make it simpler.
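One way this might be simplified is to rank the random numbers within each person and compare the rank against a per-person limit, which avoids the helper list and the row-wise apply. This is only a sketch under the same setup as above: the seed and the names rng, grouped, rank and limit are made up for illustration. Unlike the threshold comparison, it flags exactly `limit` rows per person even if two random values happened to tie.

```python
import numpy as np
import pandas as pd

# Same toy data as in the post; the seed is an arbitrary choice for repeatability.
rng = np.random.default_rng(0)
persons = ["Alex"] * 2321 + ["Doug"] * 34123 + ["Chuck"] * 2012 + ["Bob"] * 9281
df = pd.DataFrame({"persons": persons, "data": "xyz"})
df["rand"] = rng.random(len(df))
percent = 10  # change as needed

grouped = df.groupby("persons")["rand"]

# Rank rows within each person, largest random number first (rank 1 = largest).
rank = grouped.rank(method="first", ascending=False)

# Per-row limit: ceil(percent % of that person's row count).
limit = np.ceil(grouped.transform("count") * percent * 0.01)

# Rows whose rank falls within the limit are flagged 'yes'.
df["flag"] = np.where(rank <= limit, "yes", "no")

print(df.groupby(["persons", "flag"]).size().unstack(fill_value=0))
```

Because the limit uses the same ceil rule as the loop version, the number of 'yes' rows per person should match the counts in the output table above.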
