Pandas: How to summarize a pandas crosstab/frequency matrix

Question

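I want to summarize the result of a crosstab/frequency matrix that counts how often users share the same session. The result is 4044 rows × 4044 columns: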

UserID  10  50  30  2488  9416 23197            ... 
UserID                                                                                  
10      4   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
50      0   48  2   9   4   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
30      0   2   2   2   2   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
2488    0   9   2   32  4   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
9416    0   4   2   4   4   0   0   0   0   0   ... 0   0   0   0   0   0   0   0

Is there a way to summarize it to get the number of user-to-user matches, for example:

UserID  UserID  Occurrence
10       50      2
30       2488    5
23197    10      3
30       50      1

Answer 1

Solution:

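Create a boolean mask that selects the upper-triangular, off-diagonal values of the starting matrix.
Flatten this mask (with .reshape()) and the original matrix (with .stack()) into column vectors of the same length.
Use boolean indexing to pick out the rows you need.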
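The three steps can be wrapped into a small helper. This is a minimal sketch, not part of the original answer: the function name `crosstab_to_pairs` and the output column names are hypothetical, and it assumes a square, symmetric matrix with no missing values, so that `df.stack()` returns every cell in row-by-row order, which matches the order in which `reshape` flattens the mask. It builds the mask with `np.triu(..., k=1)`, which selects the same upper-triangular, off-diagonal cells as negating `np.tril`, and it names the index levels explicitly instead of relying on `level_0`/`level_1`.

```python
import numpy as np
import pandas as pd


def crosstab_to_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Turn a square, symmetric frequency matrix into (row, col, count) rows.

    Hypothetical helper: assumes df has no NaN values, so df.stack()
    returns exactly df.size entries in row-major order.
    """
    # Step 1: boolean mask of the upper triangle, excluding the diagonal (k=1)
    mask = np.triu(np.ones(df.shape, dtype=bool), k=1)

    # Step 2: flatten the mask and stack the matrix into equally long vectors
    flat_mask = mask.reshape(df.size)
    stacked = df.stack()

    # Step 3: boolean indexing keeps only the upper-triangular pairs
    pairs = stacked[flat_mask]

    # Give the two index levels distinct names, then turn them into columns
    pairs.index = pairs.index.set_names(["UserID_row", "UserID_col"])
    return pairs.rename("Occurrence").reset_index()
```

Called on the symmetric `df` from the example, it returns the same ten rows as the example's final output. Because the level names are set explicitly, it also works when both axes of the input are already named `UserID`.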

Example:

import pandas as pd
import numpy as np

np.random.seed(1)

# Example data
# Note: pd.np was deprecated in pandas 1.0 and removed in 2.0; use numpy directly
df = pd.DataFrame(np.random.randint(0, 4, size=(5, 5)), 
                  index=[10, 50, 30, 2488, 9416], 
                  columns=[10, 50, 30, 2488, 9416])

# Quick and dirty method to make the example data symmetric
df = df + df.T

df
      10    50    30    2488  9416
10       2     4     0     0     5
50       4     6     2     5     1
30       0     2     0     4     3
2488     0     5     4     4     0
9416     5     1     3     0     6

# To select the upper-triangular, non-diagonal entries,
# take a *lower*-triangular mask, np.tril, 
# and negate it with ~.
mask = (~np.tril(np.ones(df.shape)).astype('bool'))
mask
array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False,  True],
       [False, False, False, False, False]])

# Prepare to select rows from the stacked df
mask = mask.reshape(df.size)

# Stack the columns of the starting matrix into a MultiIndex, 
# which results in a MultiIndexed Series;
# select the upper-triangular off-diagonal rows;
# reset the MultiIndex levels into columns
df.stack()[mask].reset_index().rename({'level_0': 'UserID_row', 
                                       'level_1': 'UserID_col', 
                                       0: 'Occurrence'}, axis=1)
   UserID_row  UserID_col  Occurrence
0          10          50           4
1          10          30           0
2          10        2488           0
3          10        9416           5
4          50          30           2
5          50        2488           5
6          50        9416           1
7          30        2488           4
8          30        9416           3
9        2488        9416           0
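The question's desired output lists only pairs of users that actually shared a session, and the crosstab shown in the question appears to have both axes named `UserID`. A minimal sketch of that last step, under these assumptions: zero counts should be dropped, the most frequent pairs should come first, and the axes are renamed first with `rename_axis`, because `reset_index()` cannot create two columns with the same name and would raise a `ValueError`. The variable names (`crosstab`, `pairs`, `summary`) and the renamed column labels are illustrative; the small input reuses the first four rows and columns of the matrix shown in the question.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real 4044 x 4044 crosstab: the first four rows and
# columns from the question, with both axes named "UserID"
users = [10, 50, 30, 2488]
counts = np.array([
    [4, 0, 0, 0],
    [0, 48, 2, 9],
    [0, 2, 2, 2],
    [0, 9, 2, 32],
])
crosstab = pd.DataFrame(
    counts,
    index=pd.Index(users, name="UserID"),
    columns=pd.Index(users, name="UserID"),
)

# Give the two axes distinct names so reset_index() does not try to
# create two columns called "UserID"
renamed = crosstab.rename_axis(index="UserID_row", columns="UserID_col")

# Upper-triangular, off-diagonal mask, flattened to match stack() order
mask = np.triu(np.ones(renamed.shape, dtype=bool), k=1).reshape(renamed.size)
pairs = renamed.stack()[mask].rename("Occurrence").reset_index()

# Keep only pairs that actually shared a session, most frequent first
summary = (
    pairs[pairs["Occurrence"] > 0]
    .sort_values("Occurrence", ascending=False, kind="stable")
    .reset_index(drop=True)
)
print(summary)
```

Dropping the zeros matters at the question's scale: the upper triangle of a 4044 × 4044 matrix has roughly 8 million cells, and most of them are likely to be 0.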

This answer was accepted on 2018-11-03 21:25:42.
