I have a pandas table:
Data Years Y
A 2001 3
A 2007 5
A 2002 8
A 2009 1
B 2001 8
B 2004 5
C 2004 4
C 2006 6
C 2005 9
How can I analyze the data for A, B and C separately? For example, how can I draw a histogram of each Data value per year in one plot? Does this call for a pivot table or not?
You can try pivot:
print(df)
Data Years Y
0 A 2001 3
1 A 2007 5
2 A 2002 8
3 A 2009 1
4 B 2001 8
5 B 2004 5
6 C 2004 4
7 C 2006 6
8 C 2005 9
df1 = df.pivot(index='Data', columns='Years', values='Y')
print(df1)
Years 2001 2002 2004 2005 2006 2007 2009
Data
A 3.0 8.0 NaN NaN NaN 5.0 1.0
B 8.0 NaN 5.0 NaN NaN NaN NaN
C NaN NaN 4.0 9.0 6.0 NaN NaN
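For anyone who wants to reproduce the step above, here is a minimal self-contained sketch. It assumes the question's table is built by hand as a DataFrame; the construction of `df` is my assumption, since the question does not show how the data was loaded.

```python
import pandas as pd

# Sample data copied from the question (constructed by hand here)
df = pd.DataFrame({
    "Data": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "Years": [2001, 2007, 2002, 2009, 2001, 2004, 2004, 2006, 2005],
    "Y": [3, 5, 8, 1, 8, 5, 4, 6, 9],
})

# One row per Data value, one column per year; missing pairs become NaN
df1 = df.pivot(index="Data", columns="Years", values="Y")
print(df1)
```

If matplotlib is installed, something like `df1.T.plot(kind="bar")` should then draw one group of bars per year with one bar per Data value, which is probably close to the "one plot" the question asks about.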
If you need to count the non-NaN values, use notnull and then convert the boolean DataFrame to int with astype:
print(df1.notnull().astype(int))
Years 2001 2002 2004 2005 2006 2007 2009
Data
A 1 1 0 0 0 1 1
B 1 0 1 0 0 0 0
C 0 0 1 1 1 0 0
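As a rough sketch of how the 0/1 table can be turned into actual counts, the snippet below sums each row to get the number of years with data per Data value. The names `present` and `counts` are mine, and the DataFrame is rebuilt by hand from the question's table.

```python
import pandas as pd

# Sample data copied from the question (constructed by hand here)
df = pd.DataFrame({
    "Data": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "Years": [2001, 2007, 2002, 2009, 2001, 2004, 2004, 2006, 2005],
    "Y": [3, 5, 8, 1, 8, 5, 4, 6, 9],
})
df1 = df.pivot(index="Data", columns="Years", values="Y")

# 1 where a value exists, 0 where it is NaN
present = df1.notnull().astype(int)

# Number of years that have a value, for each Data value
counts = present.sum(axis=1)
print(counts)
```

`df1.count(axis=1)` should give the same per-row numbers directly, since `count` skips NaN.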
If the same Data and Years combination appears more than once, pivot raises an error. In that case you can use pivot_table with an aggfunc, e.g. sum. Here rows 2 and 3 are duplicates:
print(df)
Data Years Y
0 A 2001 3
1 A 2007 5
2 A 2002 8
3 A 2002 10
4 A 2009 1
5 B 2001 8
6 B 2004 5
7 C 2004 4
8 C 2006 6
9 C 2005 9
print(df.pivot_table(index='Data', columns='Years', values='Y', aggfunc='sum'))
Years 2001 2002 2004 2005 2006 2007 2009
Data
A 3.0 18.0 NaN NaN NaN 5.0 1.0
B 8.0 NaN 5.0 NaN NaN NaN NaN
C NaN NaN 4.0 9.0 6.0 NaN NaN
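To see why pivot_table is needed here, the following sketch (again with the data rebuilt by hand, and with `pivot_failed` and `summed` as names of my own choosing) first tries plain pivot on the data with the duplicated A/2002 pair, then aggregates with pivot_table.

```python
import pandas as pd

# Data with a duplicated (A, 2002) pair, as in the second example
df = pd.DataFrame({
    "Data": ["A", "A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "Years": [2001, 2007, 2002, 2002, 2009, 2001, 2004, 2004, 2006, 2005],
    "Y": [3, 5, 8, 10, 1, 8, 5, 4, 6, 9],
})

# Plain pivot cannot reshape when an index/column pair is duplicated
try:
    df.pivot(index="Data", columns="Years", values="Y")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (8 + 10 = 18 for A/2002)
summed = df.pivot_table(index="Data", columns="Years", values="Y", aggfunc="sum")
print(summed)
```

Other aggregations such as `"mean"` or `"max"` can be passed to `aggfunc` in the same way, depending on how the duplicates should be combined.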