How can I reshape a high-dimensional data frame from long to wide for subsequent dimension reduction and visualization?

Question

I have a data frame that looks like the following:

index	attribute	score
user_1	a	0.144228
user_1	b	0.980685
user_1	c	0.165716
user_2	a	0.795340
user_2	b	0.903498
user_3	d	0.193492
user_3	e	0.900509

Here's the reproducible code:

import pandas as pd
from numpy import random

# Long format: one row per (user, attribute) pair, with a random score
df = pd.DataFrame({'index': ['user_1', 'user_1', 'user_1', 'user_2', 'user_2', 'user_3', 'user_3'],
                   'attribute': ['a', 'b', 'c', 'a', 'b', 'd', 'e'],
                   'score': random.rand(7)})

df.set_index('index', inplace=True)

I'd like to unstack/pivot this table so that the attribute values become the column headers, with one row per user.

Now, this would be fairly easy, except that I have 350K dimensions, and as you can see from the above example, not every user has a score for each dimension.
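A rough size estimate shows why the dense wide table is a problem at this scale. The sketch below assumes float64 scores and a hypothetical user count (`n_users` is not given in the question); only the 350K attribute count comes from the text above.

```python
# Back-of-envelope size of a dense users x attributes table.
n_attributes = 350_000   # stated in the question
n_users = 100_000        # hypothetical: the question does not give the number of users
bytes_per_cell = 8       # float64, the dtype pandas uses for scores containing NaN

dense_bytes = n_users * n_attributes * bytes_per_cell
dense_gib = dense_bytes / 2**30
print(f"dense wide table: about {dense_gib:.0f} GiB")
```

Every missing user/attribute pair still occupies a cell (as NaN) in the dense layout, so the cost depends on the table's shape rather than on how many scores actually exist.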

I've tried using the standard pandas pd.pivot_table() and .unstack() functions, but my kernel invariably dies when I attempt to do so. I subsequently attempted to do so using dask, saving the output to a CSV via:

dask.dataframe.reshape.pivot_table(df, index='index', columns='attribute', values='score').to_csv('df.csv')

but that crashed too, yielding the following error:

KilledWorker: ("('pivot_table_count-chunk-c31649485f27d5f8670393d66e2d14ac', 0, 3, 0)", <Worker 'tcp://127.0.0.1:56298', name: 0, memory: 0, processing: 5>)

I'm currently at a loss. How can I reshape a high-dimensional dataset for subsequent dimension reduction, clustering, and visualization?

Answer 1

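# set_index('index') in the question moved 'index' out of the columns, so restore it before pivoting
df.reset_index().pivot(index='index', columns='attribute').reset_index().droplevel(0, axis=1)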


           attribute    a         b         c         d         e
0          user_1  0.144228  0.980685  0.165716       NaN       NaN
1          user_2  0.795340  0.903498       NaN       NaN       NaN
2          user_3       NaN       NaN       NaN  0.193492  0.900509
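With 350K attributes, the dense result of `pivot` may not fit in memory, because every missing user/attribute pair becomes a stored NaN. One alternative, sketched here on the assumption that SciPy is available and that a sparse matrix is acceptable input for the later steps, is to skip the wide data frame and build a sparse user-by-attribute matrix directly. The names `long_df` and `wide_sparse` are illustrative, and the scores are copied from the question's table so the example is reproducible. Note that this representation treats absent scores as implicit zeros rather than NaN, which may or may not suit the analysis.

```python
import pandas as pd
from scipy import sparse

# Same long-format layout as in the question, with fixed scores.
long_df = pd.DataFrame({
    'index': ['user_1', 'user_1', 'user_1', 'user_2', 'user_2', 'user_3', 'user_3'],
    'attribute': ['a', 'b', 'c', 'a', 'b', 'd', 'e'],
    'score': [0.144228, 0.980685, 0.165716, 0.795340, 0.903498, 0.193492, 0.900509],
})

# Map each user label and attribute label to an integer position.
row_codes, users = pd.factorize(long_df['index'], sort=True)
col_codes, attributes = pd.factorize(long_df['attribute'], sort=True)

# Only observed (user, attribute) pairs are stored.
# If a pair appears more than once, csr_matrix sums the duplicate entries.
wide_sparse = sparse.csr_matrix(
    (long_df['score'].to_numpy(), (row_codes, col_codes)),
    shape=(len(users), len(attributes)),
)

print(wide_sparse.shape)  # (3, 5)
print(wide_sparse.nnz)    # 7 stored scores
```

Here `users` and `attributes` keep the row and column labels, so results can be mapped back to names afterwards. For the dimension-reduction step, estimators that accept sparse input (for example scikit-learn's `TruncatedSVD`) can take `wide_sparse` without densifying it; whether that is the right reduction for this data is a separate question.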

Solution 1
0 2021-02-27 20:59:01
