Pandas: Adding summary information to new columns in a groupby frame

I'm working on a class assignment.

Our current dataset has information that looks like:

    Item ID      Item Name                                  Price
0   108          Extraction, Quickblade Of Trembling Hands  3.53
1   143          Frenzied Scimitar                          1.56
2   92           Final Critic                               4.88
3   100          Blindscythe                                3.27
4   131          Fury                                       1.44

We were asked to group by two values, which I've done.

item_df = popcolumns_df.groupby(["Item ID","Item Name"])  

I'm having trouble, though, adding the results of the groupby functions to this dataframe. For instance, when I run count, the counts replace the prices. Attempt one just replaced all the data in the Price column with the counts:

item_counts = item_df.count().reset_index() 

Output:

    Item ID     Item Name           Price
0   0           Splinter             4
1   1           Crucifer             3
2   2           Verdict              6
3   3           Phantomlight         6
4   4           Bloodlord's Fetish   5

Attempt 2 did the same:

item_counts = item_df.size().reset_index(name="Counts")

My desired output is:

     Item ID    Item Name                Price    Count   Revenue
0    108        Extraction, Quickblade   3.53     12      42.36
1    143        Frenzied Scimitar        1.56     3        4.68
2    92         Final Critic             4.88     2        9.76
3    100        Blindscythe              3.27     1        3.27
4    131        Fury                     1.44     5        7.20

I would likely just use a sum on the groups to get the revenue. I've been stumped on this for a couple of hours, so any help would be greatly appreciated!

If the price is the same for every occurrence of a given item, then you could include "Price" in your grouping and then compute the group sizes:

summary = popcolumns_df \
    .groupby(["Item ID", "Item Name", "Price"]) \
    .size() \
    .rename("Count") \
    .reset_index()

summary['Revenue'] = summary['Count'] * summary['Price']

The call to pd.Series.rename names the resulting Series "Count", so that is the column name in the final dataframe after reset_index.
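To see the grouping approach above end to end, here is a minimal sketch on a small purchase log. Only the column names and the name `popcolumns_df` come from the question; the rows are invented for illustration, and the sketch assumes every purchase of a given item has the same price.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase (values invented for illustration).
popcolumns_df = pd.DataFrame({
    "Item ID": [108, 108, 143, 92, 92, 92],
    "Item Name": ["Extraction", "Extraction", "Frenzied Scimitar",
                  "Final Critic", "Final Critic", "Final Critic"],
    "Price": [3.53, 3.53, 1.56, 4.88, 4.88, 4.88],
})

summary = (
    popcolumns_df
    .groupby(["Item ID", "Item Name", "Price"])  # Price is a key, so it survives as a column
    .size()                                      # number of rows in each group, as a Series
    .rename("Count")                             # name the Series so reset_index yields a "Count" column
    .reset_index()
)

# Revenue per item is the number of purchases times the (constant) price.
summary["Revenue"] = summary["Count"] * summary["Price"]

print(summary)
```

Because "Price" is one of the grouping keys, an item sold at two different prices would show up as two rows, so this only gives one row per item when the price really is constant.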

I think you're looking for the transform method of the groupby. It returns aggregate metrics at the original row level of your data, so the result lines up with the original dataframe.

For example, to create a new column in your original data holding the count for some grouping:

# 'baz' is a placeholder for any single column; 'count' skips NaN values, while 'size' counts every row
df['group_level_count'] = df.groupby(['foo', 'bar'])['baz'].transform('count')
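Applied to the question's columns, the transform idea could look like the sketch below. The sample rows are invented, and `df` and `per_item` are just illustrative names; the point is that transform returns one value per original row, so Price is left untouched and Count and Revenue are added alongside it.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase (values invented for illustration).
df = pd.DataFrame({
    "Item ID": [108, 108, 143, 92],
    "Item Name": ["Extraction", "Extraction", "Frenzied Scimitar", "Final Critic"],
    "Price": [3.53, 3.53, 1.56, 4.88],
})

# Select a single column so each transform returns a Series aligned with df's rows.
grouped = df.groupby(["Item ID", "Item Name"])["Price"]
counts = grouped.transform("count")   # non-NaN prices per group, repeated on every row
revenue = grouped.transform("sum")    # total price per group, repeated on every row

df["Count"] = counts
df["Revenue"] = revenue

# Optional: collapse to one row per item, keeping the first occurrence.
per_item = df.drop_duplicates(["Item ID", "Item Name"]).reset_index(drop=True)

print(per_item)
```

Unlike the size-based approach, this keeps every original row until the final drop_duplicates step, and it sums the actual prices rather than assuming the price is constant within an item.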

Related:
* How to count number of rows per group (and other statistics) in pandas group by?
* https://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation
