Summing observations of a column in pandas

Question

Suppose I have a big DataFrame DS_df with columns year, dealamount and CCS, among others. For every year from 1985 to 2020, I need a separate pandas Series, e.g. sum_2019. Within each year, I need to sum dealamount over the rows that share the same CCS (if a CCS occurs only once in that year, its single amount should simply be added to the Series):

    year    dealamount  CCS
0   2013    37,522,700  Albania_Azerbaijan
1   2013    37,522,700  Albania_Azerbaijan
2   2016    436,341,300 Albania_Greece
3   2019    763,189,200 Albania_Russia
4   2019    763,189,200 Albania_Russia
5   2019    763,189,200 Albania_Russia
6   2019    763,189,200 Albania_Russia
7   2017    150,931,000 Albania_Turkey
8   2016    275,293,750 Albania_Turkey
9   2009    258,328,000 Albania_Turkey
10  2019    153,452,000 Albania_Venezuela
11  2019    153,452,000 Albania_Venezuela
12  2017    153,452,000 Albania_Venezuela
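To try the answers below, here is a minimal sketch that rebuilds the sample table as a DataFrame. It assumes, as the comma-formatted numbers suggest, that dealamount was read in as text; the `rows` variable is only illustrative. Converting the strings to integers is what makes summing possible, since "summing" raw strings would just concatenate them.

```python
import pandas as pd

# Sample rows copied from the table in the question (dealamount as text with commas)
rows = [
    (2013, "37,522,700", "Albania_Azerbaijan"),
    (2013, "37,522,700", "Albania_Azerbaijan"),
    (2016, "436,341,300", "Albania_Greece"),
    (2019, "763,189,200", "Albania_Russia"),
    (2019, "763,189,200", "Albania_Russia"),
    (2019, "763,189,200", "Albania_Russia"),
    (2019, "763,189,200", "Albania_Russia"),
    (2017, "150,931,000", "Albania_Turkey"),
    (2016, "275,293,750", "Albania_Turkey"),
    (2009, "258,328,000", "Albania_Turkey"),
    (2019, "153,452,000", "Albania_Venezuela"),
    (2019, "153,452,000", "Albania_Venezuela"),
    (2017, "153,452,000", "Albania_Venezuela"),
]
DS_df = pd.DataFrame(rows, columns=["year", "dealamount", "CCS"])

# Strip the thousands separators and store the amounts as 64-bit integers
# (64-bit avoids overflow once several large deals are added together)
DS_df["dealamount"] = DS_df["dealamount"].str.replace(",", "", regex=False).astype("int64")

print(DS_df.dtypes)
```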

So in this case, sum_2019 should be a pandas Series indexed by CCS, with the summed dealamount as its values:

Albania_Russia 3,052,756,800
Albania_Venezuela 306,904,000

Likewise, for sum_2013:

Albania_Azerbaijan 75,045,400

Any help is greatly appreciated, as I need to do this for quite a lot of data points and feel quite lost (I'm really new to Python). How would I go about properly automating this?

Thank you!
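One way to automate this, sketched under the assumption that dealamount is already numeric: group once by year and CCS, then split the result into one Series per year and keep them in a dictionary keyed by year instead of dozens of separate variables such as sum_2019. A CCS that appears only once in a year needs no special handling, because its "sum" is just that one amount. The name `sums_by_year` is hypothetical, and years without any deals get an empty Series.

```python
import pandas as pd

# Small numeric sample in the shape described in the question
DS_df = pd.DataFrame({
    "year": [2013, 2013, 2019, 2019, 2019, 2019, 2019, 2019, 2017],
    "dealamount": [37522700, 37522700, 763189200, 763189200, 763189200,
                   763189200, 153452000, 153452000, 153452000],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan", "Albania_Russia",
            "Albania_Russia", "Albania_Russia", "Albania_Russia",
            "Albania_Venezuela", "Albania_Venezuela", "Albania_Venezuela"],
})

# Sum dealamount for every (year, CCS) pair in one pass; a CCS that
# occurs only once in a year simply keeps its single amount
totals = DS_df.groupby(["year", "CCS"])["dealamount"].sum()

# Split the result into one CCS-indexed Series per year (hypothetical name)
sums_by_year = {
    int(year): group.droplevel("year")
    for year, group in totals.groupby(level="year")
}

# Make sure every year from 1985 to 2020 has an entry (empty if no deals)
for year in range(1985, 2021):
    sums_by_year.setdefault(year, pd.Series(dtype="int64", name="dealamount"))

print(sums_by_year[2019])
```

Here `sums_by_year[2019]` plays the role of sum_2019; grouping once is also much cheaper than filtering the full DataFrame 36 times.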

Answer 1

Is this what you want?

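# strip the thousands separators, then convert to 64-bit integers
df['dealamount'] = df['dealamount'].str.replace(',', '', regex=False).astype('int64')
# sum dealamount for every (year, CCS) pair
new_df = df.groupby(['year', 'CCS']).agg({'dealamount': 'sum'})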

Output:

                         dealamount
year CCS                           
2009 Albania_Turkey       258328000
2013 Albania_Azerbaijan    75045400
2016 Albania_Greece       436341300
     Albania_Turkey       275293750
2017 Albania_Turkey       150931000
     Albania_Venezuela    153452000
2019 Albania_Russia      3052756800
     Albania_Venezuela    306904000
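The result above is a single DataFrame with a (year, CCS) MultiIndex. A short sketch, assuming `new_df` is built the same way as in this answer: `xs` with `level="year"` selects one year and drops that level, and taking the dealamount column then gives exactly the CCS-indexed Series the question calls sum_2019. The sample data here is a trimmed, illustrative subset of the question's table.

```python
import pandas as pd

# Trimmed sample with dealamount stored as text, as in the question
df = pd.DataFrame({
    "year": [2013, 2013, 2019, 2019, 2019],
    "dealamount": ["37,522,700", "37,522,700", "763,189,200",
                   "153,452,000", "153,452,000"],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan", "Albania_Russia",
            "Albania_Venezuela", "Albania_Venezuela"],
})
df["dealamount"] = df["dealamount"].str.replace(",", "", regex=False).astype("int64")
new_df = df.groupby(["year", "CCS"]).agg({"dealamount": "sum"})

# Pick one year from the outer index level; the result is indexed by CCS,
# and selecting the column turns it into a Series
sum_2019 = new_df.xs(2019, level="year")["dealamount"]
sum_2013 = new_df.xs(2013, level="year")["dealamount"]

print(sum_2019)
```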

Answer 2

import pandas as pd

# avoid scientific notation (e notation) when printing floats
pd.set_option('display.float_format', lambda x: '%.d' % x)

# first filter by 'year', then group by 'CCS' and finally sum 'dealamount'
# (assumes 'dealamount' has already been converted to numbers)
sum_2019 = df[df['year'] == 2019].groupby('CCS')['dealamount'].sum()

print(sum_2019)
CCS
Albania_Russia      3052756800
Albania_Venezuela    306904000
Name: dealamount, dtype: float64
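If every year is needed at once, an alternative to filtering year by year (not taken from either answer) is a single `pivot_table`: rows are CCS, columns are years, and each column is one of the sum_&lt;year&gt; Series. A sketch assuming dealamount is already numeric; `fill_value=0` replaces the NaN that would otherwise appear for (CCS, year) pairs with no deals, and the last step drops those zeros so only pairs that actually occur remain.

```python
import pandas as pd

# Small numeric sample in the shape described in the question
df = pd.DataFrame({
    "year": [2013, 2013, 2019, 2019, 2019, 2019, 2019, 2019, 2017],
    "dealamount": [37522700, 37522700, 763189200, 763189200, 763189200,
                   763189200, 153452000, 153452000, 153452000],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan", "Albania_Russia",
            "Albania_Russia", "Albania_Russia", "Albania_Russia",
            "Albania_Venezuela", "Albania_Venezuela", "Albania_Venezuela"],
})

# One row per CCS, one column per year, cells = summed dealamount
wide = df.pivot_table(index="CCS", columns="year", values="dealamount",
                      aggfunc="sum", fill_value=0)

# Each column is a CCS-indexed Series; keep only CCS values with deals that year
sum_2019 = wide[2019][wide[2019] != 0]

print(wide)
print(sum_2019)
```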


2 answers

Answer 1: score 1, posted 2021-05-01 07:06:38

Answer 2: score 0, accepted, posted 2021-05-01 07:10:58
