I am trying to generate a heatmap using seaborn, however I am having a small problem with the formatting of my data.
Currently, my data is in the form:
Name Diag Date
A 1 2006-12-01
A 1 1994-02-12
A 2 2001-07-23
B 2 1999-09-12
B 1 2016-10-12
C 3 2010-01-20
C 2 1998-08-20
I would like to create a heatmap (preferably in Python) showing Name on one axis against Diag, if it occurred. I have tried to pivot the table using pd.pivot, however I was given the error
ValueError: Index contains duplicate entries, cannot reshape
This came from:
piv = df.pivot_table(index='Name',columns='Diag')
Time is irrelevant, but I would like to show which Names have had which Diag, and which Diag combos cluster together. Do I need to create a new table for this, or is it possible with the one I have? In some cases the Name is not associated with all Diag values.
EDIT: I have since tried:
piv = df.pivot_table(index='Name',columns='Diag', values='Time', aggfunc='mean')
However, as Time is in datetime format, I end up with:
pandas.core.base.DataError: No numeric types to aggregate
You need pivot_table with an aggregate function, because the same index and column combination has multiple values, and pivot requires unique values:
print (df)
Name Diag Time
0 A 1 12 <-duplicates for same A, 1 different value
1 A 1 13 <-duplicates for same A, 1 different value
2 A 2 14
3 B 2 18
4 B 1 1
5 C 3 9
6 C 2 8
df = df.pivot_table(index='Name',columns='Diag', values='Time', aggfunc='mean')
print (df)
Diag 1 2 3
Name
A 12.5 14.0 NaN
B 1.0 18.0 NaN
C NaN 8.0 9.0
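To make the difference between the two calls concrete, here is a minimal sketch that rebuilds the sample frame above (the numeric Time values are the illustrative ones from this answer, not the original dates) and shows pivot failing on the duplicated (A, 1) pair while pivot_table aggregates it. The variable names pivot_error and piv are only for illustration.

```python
import pandas as pd

# Sample frame from the answer above; (A, 1) appears twice
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "C", "C"],
    "Diag": [1, 1, 2, 2, 1, 3, 2],
    "Time": [12, 13, 14, 18, 1, 9, 8],
})

# pivot cannot decide which of the two (A, 1) values to keep, so it raises
try:
    df.pivot(index="Name", columns="Diag", values="Time")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

# pivot_table resolves the duplicates by aggregating them (here: the mean)
piv = df.pivot_table(index="Name", columns="Diag", values="Time", aggfunc="mean")
print(pivot_error)
print(piv)
```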
Alternative solution:
df = df.groupby(['Name','Diag'])['Time'].mean().unstack()
print (df)
Diag 1 2 3
Name
A 12.5 14.0 NaN
B 1.0 18.0 NaN
C NaN 8.0 9.0
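Since the question says the time itself is irrelevant, one possible approach is to skip the aggregation of Time entirely and build a presence/absence table instead. The sketch below assumes pd.crosstab is an acceptable substitute for the pivot; the names counts and presence are hypothetical.

```python
import pandas as pd

# Only Name and Diag matter when the time is irrelevant
df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "C", "C"],
    "Diag": [1, 1, 2, 2, 1, 3, 2],
})

# Count how often each (Name, Diag) pair occurs; missing pairs become 0
counts = pd.crosstab(df["Name"], df["Diag"])

# Collapse the counts to 1 (occurred) / 0 (never occurred)
presence = (counts > 0).astype(int)
print(presence)
```

A table like presence has no NaN values, so it could be passed directly to seaborn.heatmap, or to seaborn.clustermap if grouping similar rows and columns is of interest.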
EDIT:
You can also check all duplicates with duplicated:
df = df.loc[df.duplicated(['Name','Diag'], keep=False), ['Name','Diag']]
print (df)
Name Diag
0 A 1
1 A 1
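If the number of repeats per pair is also of interest, a small sketch along the same lines is to count the rows in each (Name, Diag) group. This is an optional complement to duplicated, not a replacement for it; pair_counts and repeated are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "C", "C"],
    "Diag": [1, 1, 2, 2, 1, 3, 2],
})

# Number of rows for every (Name, Diag) combination that occurs
pair_counts = df.groupby(["Name", "Diag"]).size()

# Keep only the combinations that occur more than once
repeated = pair_counts[pair_counts > 1]
print(repeated)
```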
EDIT:
Taking the mean of datetimes is not easy: you need to convert the dates to nanoseconds, take the mean, and finally convert back to datetimes. There is also another problem: NaN needs to be replaced with some scalar, e.g. 0, which is converted to the zero datetime, 1970-01-01.
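import numpy as np
import pandas as pd

# Parse the Date column, then store each date as integer nanoseconds
df.Date = pd.to_datetime(df.Date)
df['dates_in_ns'] = pd.Series(df.Date.values.astype(np.int64), index=df.index)
# Average the nanosecond values; missing combinations are filled with 0
df = df.pivot_table(index='Name',
                    columns='Diag',
                    values='dates_in_ns',
                    aggfunc='mean',
                    fill_value=0)
# Convert the averaged nanoseconds back to datetimes
df = df.apply(pd.to_datetime)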
print (df)
Diag 1 2 3
Name
A 2000-07-07 12:00:00 2001-07-23 1970-01-01
B 2016-10-12 00:00:00 1999-09-12 1970-01-01
C 1970-01-01 00:00:00 1998-08-20 2010-01-20