简体   繁体   English

Pandas:基于列值聚合数据框

[英]Pandas: Aggregate data frame based on column values

I have a data set that looks like this:我有一个如下所示的数据集:

        Name Volume Period
1        oil  29000 Jun 21
2       gold    800 Mar 22
3        oil  21000 Jul 21
4       gold   1100 Sep 21
5       gold   3000 Feb 21
6    depower      3  Q1 21
7        oil  23000 Apr 22
8    czpower     26  Q1 23
9        oil  17000  Q1 21
10      gold   2400 May 21
11       oil  12000  Q2 21
12      gold   1800 Jan 22
13   czpower     21 Oct 21
14  api2coal   6000  Q1 22
15  api2coal  11000  Q1 21
16   depower     11 Jan 22
17  api2coal  16000 Jul 21
18      gold   1300 Mar 21
19   depower      3  Q1 22
20       oil  17000 Cal 21

I would like to reshape the data set so as to obtain a data frame with the following characteristics:我想重塑数据集,以获得具有以下特征的数据框:

  • The values in Name will become the new variables (columns); Name中的值将成为新的变量(列);
  • The values in Period will become the indexes (should be unique); Period中的值将成为索引(应该是唯一的);
  • The values in Volume are the sums of the values of each combination of Name and Period . Volume中的值是NamePeriod的每个组合的值的总和。

The resulting data frame should look similar to this:生成的数据框应如下所示:

       Period api2coal czpower depower  gold     oil
    1: Apr 21   241000     273     213 42500  299000
    2: Apr 22   236000     360     234 55700  404000
    3: Aug 21   210000     270     305 70000  283000
    4: Aug 22   191000     307     236 63200  307000
    5: Cal 21   605000     648     852     0  765000
    6: Cal 22   710000     718     714     0  542000
    7: Cal 23   813000     875     872     0  844000
    8: Cal 24   642000     723     734     0  763000
    9: Cal 25   109000     164     274     0  195000
   10: Dec 21   229000     322     241 47000  254000
   11: Dec 22   135000     232     243 52900  240000
   12: Feb 21   474000     238     140 42500  292000
   13: Feb 22   509000     272     141 45700  270000
   14: Jan 21   210000     301     332 70300  122000
   15: Jan 22   206000     236     229 54200  180000
   16: Jan 23   350000     246     359 56700  208000
   17: Jul 21   281000     430     371 43500  287000
   18: Jul 22   355000     241     255 71200  160000
   19: Jun 21   341000     182     156 39900  278000
   20: Jun 22   180000     165     175 52100  348000

Can someone please give me a hint as to how to achieve this?有人可以给我一个关于如何实现这一目标的提示吗? Thank you in advance.先感谢您。

这是你要找的吗?

newDF = df.pivot_table("Volume", ["Period"], "Name")

在尝试了几件事后我找到了解决方案。

pd.pivot_table(df, values="Volume", index=["Period"], columns=["Name"], aggfunc=np.sum)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM