如何将 Python 字典转换为 Numpy 数组？

Question

So the logistic regression from the sklearn library from Python has the .fit() function which takes x_train (features) and y_train (labels) as arguments to train the classifier.因此，来自 Python 的 sklearn 库的逻辑回归具有.fit() function ，它将x_train （特征）和y_train （标签）作为 ZDBC11CAA5BDA99F77E6FB4DABD82 的分类器进行训练。

It seems that x_train.shape = (number_of_samples, number_of_features)似乎x_train.shape = (number_of_samples, number_of_features)

For x_train I should use the extracted xvector.scp file, which I am reading like so:对于 x_train 我应该使用提取的 xvector.scp 文件，我正在阅读如下：

b = kaldiio.load_scp('xvector.scp')

And I can print the content like so:我可以像这样打印内容：

for file_id in b:
  xvector = b[file_id]
  print(xvector)

Right now the b variable is like a dictionary and you can get the x-vector value of the corresponding id.现在 b 变量就像一个字典，你可以得到对应 id 的 x 向量值。 I want to use sklearn Logistic Regression to classify the x-vectors and in order to use the.fit() method I should pass an array as an argument.我想使用 sklearn Logistic Regression 对 x 向量进行分类，为了使用 .fit() 方法，我应该将数组作为参数传递。

My question is how can I make an array that contains only the xvector variables?我的问题是如何创建一个仅包含 xvector 变量的数组？

PS: the file_ids are like 1 million and each xvector has length of 512, which is too big for an array PS：file_ids 大约是 100 万，每个 xvector 的长度为 512，对于数组来说太大了

Answer 1

It seems you are trying to store the dictionary into a numpy array.您似乎正在尝试将字典存储到 numpy 数组中。 If the dictionary is small, you can directly store the values as:如果字典很小，您可以直接将值存储为：

import numpy as np

x = np.array(list(b.values()))

However, this will run into OOM issues if the dictionary is large.但是，如果字典很大，这将遇到 OOM 问题。 In this case, you would need to use np.memmap as explained here: https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/在这种情况下，您需要使用np.memmap ，如下所述： https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/

Essentially, you have to add rows to the array one at a time, and flush it when you have run out of memory.本质上，您必须一次向数组添加一行，并在 memory 用完时刷新它。 The array is stored directly on the disk, so it avoids OOM issues.阵列直接存储在磁盘上，因此可以避免 OOM 问题。

如何将 Python 字典转换为 Numpy 数组？

问题描述

1 个解决方案

解决方案1
0 2021-06-17 14:23:45

如何将 Python 字典转换为 Numpy 数组？

问题描述

1 个解决方案

解决方案1 0 2021-06-17 14:23:45

解决方案1
0 2021-06-17 14:23:45