
How to convert a Series of arrays into a single matrix in pandas/numpy?

I somehow ended up with a pandas.Series that contains a bunch of arrays, like the s in the code below.

import pandas as pd

data = [[1,2,3],[2,3,4],[3,4,5],[2,3,4],[3,4,5],[2,3,4],
        [3,4,5],[2,3,4],[3,4,5],[2,3,4],[3,4,5]]
s = pd.Series(data = data)
s.shape # output ---> (11L,)
# try to convert s to a matrix
# (as_matrix() was removed in pandas 1.0; s.values returns the same 1-D object array)
sm = s.as_matrix()
# but...
sm.shape # output ---> (11L,)

How can I convert s into a matrix with shape (11,3)? Thanks!

If, for some reason, you have found yourself with that abomination of a Series, getting it back into the sort of matrix or array you want is straightforward:

In [16]: s
Out[16]:
0     [1, 2, 3]
1     [2, 3, 4]
2     [3, 4, 5]
3     [2, 3, 4]
4     [3, 4, 5]
5     [2, 3, 4]
6     [3, 4, 5]
7     [2, 3, 4]
8     [3, 4, 5]
9     [2, 3, 4]
10    [3, 4, 5]
dtype: object

In [17]: sm = np.matrix(s.tolist())

In [18]: sm
Out[18]:
matrix([[1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [2, 3, 4],
        [3, 4, 5],
        [2, 3, 4],
        [3, 4, 5],
        [2, 3, 4],
        [3, 4, 5],
        [2, 3, 4],
        [3, 4, 5]])

In [19]: sm.shape
Out[19]: (11, 3)

But unless it's something you can't change, having that Series makes little sense to begin with.
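If a plain ndarray is preferred over np.matrix (the NumPy documentation no longer recommends the matrix class for new code), the same tolist() approach should work with np.array. A minimal sketch, using a shortened, made-up Series for illustration:

```python
import numpy as np
import pandas as pd

# Shortened example data (an assumption; the question's Series has 11 rows).
s = pd.Series([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# tolist() yields a plain list of lists, which np.array turns into a 2-D array.
arr = np.array(s.tolist())

print(arr.shape)
print(type(arr).__name__)
```

The result is a regular 2-D ndarray rather than a matrix, so `*` stays elementwise.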

Another way is to extract the values of your Series and use numpy.stack on them.

np.stack(s.values)
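A small sketch of what that call does, on a shortened, made-up Series: np.stack treats each element of the object array as one row and joins the rows along a new first axis. It assumes every element has the same length; otherwise np.stack raises a ValueError.

```python
import numpy as np
import pandas as pd

# Shortened example data (an assumption, not the full Series from the question).
s = pd.Series([[1, 2, 3], [2, 3, 4], [3, 4, 5], [2, 3, 4]])

# s.values is a 1-D object array of lists.
print(s.values.shape)

# np.stack joins the equally sized rows along a new axis 0.
stacked = np.stack(s.values)
print(stacked.shape)
```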

PS. I've run into similar situations often.

For pandas >= 0.24, you can also use np.stack(s.to_numpy()) or np.concatenate(s.to_numpy()), depending on your requirements.
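The two calls give different shapes, which is presumably what "depending on your requirements" refers to. A minimal sketch with made-up data: np.stack adds a new axis and keeps one row per element, while np.concatenate joins the elements end to end along their existing axis.

```python
import numpy as np
import pandas as pd

# Made-up example data: four elements of length 3.
s = pd.Series([[1, 2, 3], [2, 3, 4], [3, 4, 5], [2, 3, 4]])

# One row per Series element.
stacked = np.stack(s.to_numpy())

# All elements joined into a single flat array.
flat = np.concatenate(s.to_numpy())

print(stacked.shape)
print(flat.shape)
```

So np.stack is the one that yields the (n, 3) matrix asked for in the question; np.concatenate is closer to flattening.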

I tested the above methods on 5793 100-dimensional vectors. The old method, converting to a list first, is the fastest.

%time print(np.stack(df.features.values).shape)
%time print(np.stack(df.features.to_numpy()).shape)
%time print(np.array(df.features.tolist()).shape)
%time print(np.array(list(df.features)).shape)

Result

(5793, 100)
CPU times: user 11.7 ms, sys: 3.42 ms, total: 15.1 ms
Wall time: 22.7 ms
(5793, 100)
CPU times: user 11.1 ms, sys: 137 µs, total: 11.3 ms
Wall time: 11.9 ms
(5793, 100)
CPU times: user 5.96 ms, sys: 0 ns, total: 5.96 ms
Wall time: 6.91 ms
(5793, 100)
CPU times: user 5.74 ms, sys: 0 ns, total: 5.74 ms
Wall time: 6.43 ms
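To repeat a comparison like this outside IPython, something along these lines could be used. It is only a sketch: the random data, the `features` name and the repeat count are assumptions standing in for the original df.features, and timings will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for df.features: 5793 vectors of 100 dimensions (assumed data).
rng = np.random.default_rng(0)
features = pd.Series(list(rng.random((5793, 100))))

# The four conversions compared above.
candidates = {
    "np.stack(values)": lambda: np.stack(features.values),
    "np.stack(to_numpy())": lambda: np.stack(features.to_numpy()),
    "np.array(tolist())": lambda: np.array(features.tolist()),
    "np.array(list(...))": lambda: np.array(list(features)),
}

shapes = {}
for name, fn in candidates.items():
    shapes[name] = fn().shape
    # Average over 10 runs to smooth out one-off noise.
    seconds = timeit.timeit(fn, number=10) / 10
    print(f"{name:22s} {shapes[name]} {seconds * 1e3:.2f} ms")
```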
