Create a correlation matrix by rows: Pandas

Question

I want to create a correlation matrix by rows. Here is what my df looks like:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['a', 'b', 'c'], index=["doc1", "doc2", "doc3"])

#Output
      a  b  c
doc1  1  2  3
doc2  4  5  6
doc3  7  8  9

I want to find the correlation between documents. I used

corrMatrix = df.corr()

but it gives me the correlation between each cell (I think). The other approach I have considered is to simply subset each document and then use

np.corrcoef(doc1,doc2)

and manually create a 2D numpy array. Any ideas on how I can do this elegantly?

Answer 1

DataFrame.corr() finds the correlation between pairs of columns. If you want rows, transpose first. (I modified your data slightly so everything isn't perfectly correlated.)

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 8], [4, 5, 6], [5, 8, 9]]),
                  columns=['a', 'b', 'c'], index=["doc1", "doc2", "doc3"])

df.T.corr()

          doc1      doc2      doc3
doc1  1.000000  0.924473  0.782467
doc2  0.924473  1.000000  0.960769
doc3  0.782467  0.960769  1.000000
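
As a sanity check, each cell of the transposed result should match what the pairwise idea from the question would give. The following is only a minimal sketch, assuming the modified data above; the helper name pairwise_row_corr is hypothetical and not part of pandas or numpy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 8], [4, 5, 6], [5, 8, 9]]),
                  columns=['a', 'b', 'c'], index=["doc1", "doc2", "doc3"])

def pairwise_row_corr(frame, row_a, row_b):
    # Hypothetical helper: Pearson correlation between two rows,
    # computed the "manual" way with np.corrcoef on two 1-D arrays.
    x = frame.loc[row_a].to_numpy()
    y = frame.loc[row_b].to_numpy()
    return np.corrcoef(x, y)[0, 1]

# Row-wise correlation matrix via transpose, as in the answer.
by_rows = df.T.corr()

# Every cell should agree with the pairwise computation
# for the same two documents.
for a in df.index:
    for b in df.index:
        assert np.isclose(by_rows.loc[a, b], pairwise_row_corr(df, a, b))
```

So the transpose does the same work as the manual loop, just in one call and with the document labels kept on both axes.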

Or use np.corrcoef on the non-transposed DataFrame; by default it treats each row as a variable. This will be a lot faster than the above with a large DataFrame since you avoid the unnecessary transpose.

np.corrcoef(df)

array([[1.        , 0.92447345, 0.78246663],
       [0.92447345, 1.        , 0.96076892],
       [0.78246663, 0.96076892, 1.        ]])
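
Note that np.corrcoef returns a plain array, so the document labels are lost. If you want them back, one possible approach (a sketch, assuming the same modified data; the variable names corr_values and corr_df are illustrative) is to wrap the result in a new DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 8], [4, 5, 6], [5, 8, 9]]),
                  columns=['a', 'b', 'c'], index=["doc1", "doc2", "doc3"])

# np.corrcoef treats each row as a variable by default (rowvar=True),
# so no transpose is needed; the result is an unlabeled ndarray.
corr_values = np.corrcoef(df.to_numpy())

# Re-attach the document labels on both axes.
corr_df = pd.DataFrame(corr_values, index=df.index, columns=df.index)

# Should agree with the transpose-based result
# (this data has no missing values).
assert np.allclose(corr_df.to_numpy(), df.T.corr().to_numpy())
```

One difference to keep in mind: DataFrame.corr() excludes missing values pairwise, while np.corrcoef propagates NaN, so the two results only match when the data is complete.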

Answer 1 | 0 | Accepted | 2021-03-10 20:22:17
