
Heatmap to visualize percentage of values

I am looking to visualize the results below, obtained by grouping my data by columns, using a heatmap.

Data

    Classroom   Subject    Student
0   A   Mathematics         A.B.
1   B   Computer Science    G.M.
2   A   Computer Science    J.K.
3   B   Literature          S.R.
4   B   Computer Science    A.M.
5   A   Literature          S.R.
6   B   Mathematics         S.E.
7   C   Literature          S.T.
8   C   Mathematics         R.B.
9   A   Mathematics         B.K.

After grouping with df.groupby(["Classroom", "Subject"]).size(), I have:

Classroom     Subject                    
A             Mathematics                 226
              Literature                  12
              Computer Science            122
B             Mathematics                 1
              Literature                  14
              Computer Science            19
              History                     22
              Geography                   238
C             Mathematics                 5
              Literature                  15
              

Based on what I have found online, Seaborn would probably be the nicest solution for creating a heatmap that shows the percentage of the values ((.sum()/len(df))*100, if I am right). The solution in "Python - Get percentage based on column values" is certainly helpful for my question, even though it does not use Seaborn for visualization. Doing this

df.groupby(["Classroom", "Subject"]).size()/len(df)*100

I get the percentage of the values. I also need to plot these results as a heatmap. I would appreciate any help with this.
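The groupby result above is a Series indexed by (Classroom, Subject) pairs, whereas a heatmap needs a 2-D table. A minimal sketch, assuming only the 10-row sample from the Data table, reproduces that percentage expression and then uses Series.unstack() to turn the Subject level into columns; fill_value=0 is an illustrative choice for combinations that never occur.

```python
import pandas as pd

# The 10-row sample from the question's "Data" table
df = pd.DataFrame({
    "Classroom": ["A", "B", "A", "B", "B", "A", "B", "C", "C", "A"],
    "Subject": ["Mathematics", "Computer Science", "Computer Science", "Literature",
                "Computer Science", "Literature", "Mathematics", "Literature",
                "Mathematics", "Mathematics"],
    "Student": ["A.B.", "G.M.", "J.K.", "S.R.", "A.M.", "S.R.", "S.E.", "S.T.", "R.B.", "B.K."],
})

# The question's expression: a Series indexed by (Classroom, Subject) pairs
pct = df.groupby(["Classroom", "Subject"]).size() / len(df) * 100

# unstack() moves the inner index level (Subject) into columns, giving the
# 2-D grid a heatmap needs; pairs that never occur become 0 instead of NaN
grid = pct.unstack(fill_value=0)
print(grid)
```

Here the pair C / Computer Science never occurs, so its cell is 0 rather than missing.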

Seaborn's heatmap uses the index and columns of a DataFrame as its axes: the index labels the rows (y-axis) and the columns label the x-axis. Pandas' pivot() and pivot_table() can create a suitable DataFrame:
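The two functions differ in one important way, sketched here on the question's 10-row sample (used only for illustration): pivot_table() aggregates duplicate (Subject, Classroom) rows itself, whereas pivot() only reshapes and raises an error if a pair appears more than once, so it needs pre-aggregated input. Both routes should yield the same percentage table; fill_value=0 and the column name "pct" are illustrative choices.

```python
import pandas as pd

# The 10-row sample from the question's "Data" table
df = pd.DataFrame({
    "Classroom": ["A", "B", "A", "B", "B", "A", "B", "C", "C", "A"],
    "Subject": ["Mathematics", "Computer Science", "Computer Science", "Literature",
                "Computer Science", "Literature", "Mathematics", "Literature",
                "Mathematics", "Mathematics"],
    "Student": ["A.B.", "G.M.", "J.K.", "S.R.", "A.M.", "S.R.", "S.E.", "S.T.", "R.B.", "B.K."],
})

# pivot_table() groups and aggregates in one call: count the Student entries in
# each (Subject, Classroom) cell; fill_value=0 covers combinations with no rows
counts = pd.pivot_table(df, values="Student", index="Subject",
                        columns="Classroom", aggfunc="count", fill_value=0)
via_pivot_table = counts / len(df) * 100

# pivot() only reshapes, so it needs exactly one row per (Subject, Classroom)
# pair, e.g. the groupby result flattened into columns with reset_index()
long = (df.groupby(["Subject", "Classroom"]).size() / len(df) * 100).reset_index(name="pct")
via_pivot = long.pivot(index="Subject", columns="Classroom", values="pct").fillna(0)

print(via_pivot_table)
```

Either table has Subject as rows and Classroom as columns, the layout passed to sns.heatmap in the answer.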

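import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Random example data: 1000 rows, each with a classroom, a subject and a 7-letter student name
df = pd.DataFrame(
    {'Classroom': np.random.choice(['A', 'B', 'C'], 1000),
     'Subject': np.random.choice(['Mathematics', 'Literature', 'Computer Science', 'History', 'Geography'], 1000),
     'Student': [''.join(np.random.choice([*'VWXYZ'], 7)) for _ in range(1000)]})

# Count the students in each (Subject, Classroom) cell and express each count as a
# percentage of all rows; Subject becomes the index and Classroom the columns
pivoted = pd.pivot_table(df, values='Student', index='Subject', columns='Classroom', aggfunc='count') / len(df) * 100

# annot=True writes each cell's value into the cell; fmt='.1f' shows it with one decimal
ax = sns.heatmap(data=pivoted, annot=True, fmt='.1f')
plt.tight_layout()
plt.show()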

[Figure: sns.heatmap from pivot_table]
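One possible refinement, not part of the original answer: fmt='.1f' prints plain numbers, so a percent sign has to come from pre-formatted strings. This sketch assumes seeded random data in the style of the answer (the Student values are placeholder names), fill_value=0, vmin=0 and a colorbar label chosen for illustration. It relies on sns.heatmap accepting an annot array of the same shape as data, combined with fmt="" so the strings are drawn unchanged.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for this sketch; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random data in the style of the answer, seeded so runs are repeatable;
# the Student values are placeholder names
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Classroom": rng.choice(["A", "B", "C"], 1000),
    "Subject": rng.choice(["Mathematics", "Literature", "Computer Science",
                           "History", "Geography"], 1000),
})
df["Student"] = [f"student_{i}" for i in range(len(df))]

# Percentage of all rows per (Subject, Classroom) cell, as in the answer
pivoted = pd.pivot_table(df, values="Student", index="Subject", columns="Classroom",
                         aggfunc="count", fill_value=0) / len(df) * 100

# Annotation strings of the form "6.6%"; fmt="" makes seaborn draw them
# as-is instead of applying a numeric format
labels = pivoted.round(1).astype(str) + "%"

ax = sns.heatmap(data=pivoted, annot=labels, fmt="", vmin=0,
                 cbar_kws={"label": "% of all rows"})
plt.tight_layout()
plt.show()
```

Building the labels separately keeps the colors driven by the numeric table while the text carries the formatting.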
