
Is it possible to remove repeated columns/rows when plotting correlation matrix as heatmap in Seaborn?

Given the following DataFrame:

my_df.head()



cruce1  cruce2  cruce3  cruce4  cruce5  cruce6  cruce7  cruce8  cruce9  cruce10 ... factor75    factor80    factor85    factor90    factor95    factor100   factor105   factor110   factor115   factor120
Date                                                                                    
1993-10-28  0.0049  NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 161.75  172.45  196.86  200.33  219.21  222.67  243.23  235.77  249.48  231.56
1993-10-29  0.0002  0.0051  NaN NaN NaN NaN NaN NaN NaN NaN ... 169.13  172.64  211.90  205.58  218.63  223.16  250.21  245.71  256.47  245.63
1993-11-01  0.0041  0.0043  0.0092  NaN NaN NaN NaN NaN NaN NaN ... 165.37  170.35  215.84  198.81  216.43  222.32  246.18  247.09  253.57  254.07
1993-11-02  -0.0019 0.0022  0.0024  0.0073  NaN NaN NaN NaN NaN NaN ... 175.01  180.37  219.77  210.89  210.06  236.31  249.19  260.01  252.05  259.16
1993-11-03  0.0023  0.0004  0.0045  0.0047  0.0096  NaN NaN NaN NaN NaN ... 183.84  177.68  210.58  207.35  207.67  228.06  235.10  254.71  251.55  258.43

With these columns:

my_df.columns

Index(['cruce1', 'cruce2', 'cruce3', 'cruce4', 'cruce5', 'cruce6', 'cruce7',
       'cruce8', 'cruce9', 'cruce10', 'cruce11', 'cruce12', 'cruce13',
       'cruce14', 'cruce15', 'cruce16', 'cruce17', 'cruce18', 'cruce19',
       'cruce20', 'factor1', 'factor5', 'factor10', 'factor15', 'factor20',
       'factor25', 'factor30', 'factor35', 'factor40', 'factor45', 'factor50',
       'factor55', 'factor60', 'factor65', 'factor70', 'factor75', 'factor80',
       'factor85', 'factor90', 'factor95', 'factor100', 'factor105',
       'factor110', 'factor115', 'factor120'],
      dtype='object')

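I plot the correlation matrix as a heatmap:

import numpy as np
import seaborn as sns

# Correlation matrix of the first differences
corr = my_df.diff().corr()

# Boolean mask that hides the upper triangle (including the diagonal)
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# colormap is assumed to be defined earlier (any Matplotlib colormap)
sns.heatmap(corr, mask=mask, linewidths=0.1, vmax=1.0,
            square=False, cmap=colormap, linecolor='white')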

with the following result:

Heatmap 1:

[Screenshot: heatmap 1]

But I want to keep only the non-repeated part of the heatmap, with the 'cruce' variables on one axis and the 'factor' variables on the other:

Heatmap 2:

[Screenshot: heatmap 2]

Is this possible? And if so, can the resulting plot be made to fill the blank space?

I solved it.

I had to change

corr = my_df.diff().corr()

To:

corr = my_df.diff().corr().filter(regex='cruce', axis=1).filter(regex='factor', axis=0)

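The call

filter(regex='cruce', axis=1)

keeps only the columns (axis 1) whose labels match 'cruce', while the call

filter(regex='factor', axis=0)

keeps only the rows (axis 0) whose labels match 'factor'. The result is a rectangular block of the correlation matrix, with the 'factor' variables as rows and the 'cruce' variables as columns, so no variable is repeated on both axes.

See the pandas documentation for DataFrame.filter for more details.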
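To make the behaviour of DataFrame.filter concrete, here is a minimal sketch on made-up data. The frame toy_df and its column names are hypothetical stand-ins for my_df; the point is only that filter keeps the labels matching the regex rather than dropping them, so the two chained calls select the 'factor' rows and the 'cruce' columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two 'cruce' columns and three 'factor' columns.
rng = np.random.default_rng(0)
toy_df = pd.DataFrame(
    rng.normal(size=(50, 5)),
    columns=["cruce1", "cruce2", "factor1", "factor5", "factor10"],
)

# Full symmetric 5 x 5 correlation matrix of the first differences.
full_corr = toy_df.diff().corr()

# filter() KEEPS the labels that match the regex on the given axis.
sub_corr = (
    full_corr
    .filter(regex="cruce", axis=1)   # keep the 'cruce' columns
    .filter(regex="factor", axis=0)  # keep the 'factor' rows
)

print(sub_corr.shape)  # (3, 2): 3 factor rows by 2 cruce columns
```

The same block could also be selected with full_corr.loc[...] and explicit label lists; filter is simply shorter when the names follow a pattern.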

Then drop the mask argument, since the rectangular block no longer has a redundant upper triangle:

sns.heatmap(corr, linewidths=0.1, vmax=1.0, square=False, cmap=colormap, linecolor='white')
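For reference, the whole approach might look like the sketch below when run end to end. The data is synthetic, and demo_df, the column list, the "viridis" colormap, and the output filename are all assumptions made for illustration, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for my_df: 3 'cruce' and 4 'factor' columns (assumed names).
rng = np.random.default_rng(1)
cols = [f"cruce{i}" for i in range(1, 4)] + [f"factor{i}" for i in (1, 5, 10, 15)]
demo_df = pd.DataFrame(
    rng.normal(size=(100, len(cols))).cumsum(axis=0), columns=cols
)

# Correlation of first differences, reduced to the factor-by-cruce block.
corr = demo_df.diff().corr()
cross = corr.filter(regex="cruce", axis=1).filter(regex="factor", axis=0)

# No mask is needed: every cell of the rectangular block is unique.
fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(cross, linewidths=0.1, vmax=1.0, square=False,
            cmap="viridis", linecolor="white", ax=ax)
fig.tight_layout()
fig.savefig("cross_corr_heatmap.png")  # hypothetical output path
```

Because square=False, the cells stretch to fill the axes, so the rectangular block uses the whole figure instead of leaving a blank triangle.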

This gives the following result:

[Image: Solution]
