
pandas pivot table issue - assuming it is how i am structuring it?

i have a dataset that contains video game platforms, and the year that games were released for it.

what i'm trying to do is end up with a dataframe that has the count of titles for each year released by platform.

my initial dataframe looks like this:

    platform    year
0   Wii     2006.0
1   NES     1985.0
2   Wii     2008.0
3   Wii     2009.0
4   GB      1996.0
5   GB      1989.0
6   DS      2006.0
7   Wii     2006.0
8   Wii     2009.0
9   NES     1984.0
10  DS      2005.0
11  DS      2005.0
12  GB      1999.0
13  Wii     2007.0
14  X360    2010.0
15  Wii     2009.0
16  PS3     2013.0
17  PS2     2004.0
18  SNES    1990.0
19  DS      2005.0

i'm using a groupby to get them together:

df = df.sort_values(['year']).groupby(['year'])['platform'].value_counts()

which gets me close:

year           platform
1980.0           2600         9
1981.0           2600        46
1982.0           2600        36
1983.0           2600        11
                 NES          6
1984.0           NES         13
                 2600         1
1985.0           NES         11
                 2600         1
                 DS           1

but this is a series, and with the year being the index i can't stick this into something like a heatmap.

here is an example of the desired output:

   year platform  #_titles
1980    2600        9
1981    2600        46
1982    2600        36
1983    2600        11
1983    NES         6
1984    NES         13
1984    2600        1
1985    NES         11
1985    2600        1
1985    DS          1
1985    PC          1
1986    NES         19
1986    2600        2
1987    NES         10
1987    2600        6
1988    NES         11
1988    2600        2
1988    GB          1
1988    PC          1
1989    GB          10

I was thinking i might need to use a pivot_table() but this is something i am still quite new to and am struggling to implement.

i tried something like:

df = df.pivot_table(df,index='year',columns = 'platform',aggfunc = 'count') 

but my output then is just the year.

clearly i am doing something wrong, and figure it is time to stop beating my virtual head on a Jupyter notebook and ask for some advice.

I am fine with getting the original group method to work, or using a pivot table either way - I just would appreciate some pointers on what i'm doing wrong so i can correct it.

Thanks for your time in advance,

Jared

edit: here is the result from the first answer (which would be perfect, if it had the aggfunc in it? not sure why that isn't there?):

|year|platform|
|----|--------|
|1980.0|2600|
|1981.0|2600|
|1982.0|2600|
|1983.0|2600|
||NES|
|1984.0|2600|
||NES|

Here is the solution with pivot table:

res = pd.pivot_table(df, index=['year', 'platform'], aggfunc='size')

>>> print(res)

year    platform
1984.0  NES         1
1985.0  NES         1
1989.0  GB          1
1990.0  SNES        1
1996.0  GB          1
1999.0  GB          1
2004.0  PS2         1
2005.0  DS          3
2006.0  DS          1
        Wii         2
2007.0  Wii         1
2008.0  Wii         1
2009.0  Wii         3
2010.0  X360        1
2013.0  PS3         1
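Since the question mentions a heatmap, a wide year-by-platform grid may be easier to work with than the long Series above. Here is a minimal sketch using pd.crosstab instead of pivot_table; the sample rows are retyped by hand from the question, the name `wide` is mine, and the cast to int assumes there are no missing years (on the full dataset you may need to drop NaN years first):

```python
import pandas as pd

# Sample rows retyped from the question's initial dataframe.
df = pd.DataFrame({
    "platform": ["Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "Wii", "NES",
                 "DS", "DS", "GB", "Wii", "X360", "Wii", "PS3", "PS2", "SNES", "DS"],
    "year": [2006.0, 1985.0, 2008.0, 2009.0, 1996.0, 1989.0, 2006.0, 2006.0, 2009.0, 1984.0,
             2005.0, 2005.0, 1999.0, 2007.0, 2010.0, 2009.0, 2013.0, 2004.0, 1990.0, 2005.0],
})

# One row per year, one column per platform; each cell is a title count
# (0 where a platform had no releases that year).
# Assumption: no NaN years, so the float -> int cast is safe.
wide = pd.crosstab(df["year"].astype(int), df["platform"])

print(wide.loc[2006, "Wii"])  # 2
print(wide.loc[2005, "DS"])   # 3
```

A grid like this is numeric throughout, so it can usually be handed straight to a heatmap function.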

Maybe this is what you want? Hard to tell since your output doesn't match the input.

df.sort_values(['year']).groupby(['year','platform']).size().reset_index(name='#_titles')
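To make that one-liner easy to try, here is a self-contained sketch on the 20 sample rows from the question (retyped by hand; the name `counts` is mine). It leaves out sort_values because groupby already sorts the result by the group keys by default:

```python
import pandas as pd

# Sample rows retyped from the question's initial dataframe.
df = pd.DataFrame({
    "platform": ["Wii", "NES", "Wii", "Wii", "GB", "GB", "DS", "Wii", "Wii", "NES",
                 "DS", "DS", "GB", "Wii", "X360", "Wii", "PS3", "PS2", "SNES", "DS"],
    "year": [2006.0, 1985.0, 2008.0, 2009.0, 1996.0, 1989.0, 2006.0, 2006.0, 2009.0, 1984.0,
             2005.0, 2005.0, 1999.0, 2007.0, 2010.0, 2009.0, 2013.0, 2004.0, 1990.0, 2005.0],
})

# size() counts rows per (year, platform) pair and returns a Series;
# reset_index(name=...) turns the index levels back into ordinary columns
# and names the count column.
counts = (
    df.groupby(["year", "platform"])
      .size()
      .reset_index(name="#_titles")
)

print(counts.columns.tolist())  # ['year', 'platform', '#_titles']
print(len(counts))              # 15
```

The result is a regular DataFrame with year, platform and #_titles as columns, which is the long layout shown in the question's desired output.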
