简体   繁体   中英

Pandas: Aggregate data frame based on column values

I have a data set that looks like this:

        Name Volume Period
1        oil  29000 Jun 21
2       gold    800 Mar 22
3        oil  21000 Jul 21
4       gold   1100 Sep 21
5       gold   3000 Feb 21
6    depower      3  Q1 21
7        oil  23000 Apr 22
8    czpower     26  Q1 23
9        oil  17000  Q1 21
10      gold   2400 May 21
11       oil  12000  Q2 21
12      gold   1800 Jan 22
13   czpower     21 Oct 21
14  api2coal   6000  Q1 22
15  api2coal  11000  Q1 21
16   depower     11 Jan 22
17  api2coal  16000 Jul 21
18      gold   1300 Mar 21
19   depower      3  Q1 22
20       oil  17000 Cal 21

I would like to reshape the data set so as to obtain a data frame with the following characteristics:

  • The values in Name will become the new variables (columns);
  • The values in Period will become the indexes (should be unique);
  • The values in Volume are the sums of the values of each combination of Name and Period .

The resulting data frame should look similar to this:

       Period api2coal czpower depower  gold     oil
    1: Apr 21   241000     273     213 42500  299000
    2: Apr 22   236000     360     234 55700  404000
    3: Aug 21   210000     270     305 70000  283000
    4: Aug 22   191000     307     236 63200  307000
    5: Cal 21   605000     648     852     0  765000
    6: Cal 22   710000     718     714     0  542000
    7: Cal 23   813000     875     872     0  844000
    8: Cal 24   642000     723     734     0  763000
    9: Cal 25   109000     164     274     0  195000
   10: Dec 21   229000     322     241 47000  254000
   11: Dec 22   135000     232     243 52900  240000
   12: Feb 21   474000     238     140 42500  292000
   13: Feb 22   509000     272     141 45700  270000
   14: Jan 21   210000     301     332 70300  122000
   15: Jan 22   206000     236     229 54200  180000
   16: Jan 23   350000     246     359 56700  208000
   17: Jul 21   281000     430     371 43500  287000
   18: Jul 22   355000     241     255 71200  160000
   19: Jun 21   341000     182     156 39900  278000
   20: Jun 22   180000     165     175 52100  348000

Can someone please give me a hint as to how to achieve this? Thank you in advance.

这是你要找的吗?

newDF = df.pivot_table("Volume", ["Period"], "Name")

在尝试了几件事后我找到了解决方案。

pd.pivot_table(df, values="Volume", index=["Period"], columns=["Name"], aggfunc=np.sum)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM