简体   繁体   English

如何将 Python 字典转换为 Numpy 数组?

[英]How to convert a Python dictionary to a Numpy array?

So the logistic regression from the sklearn library from Python has the .fit() function which takes x_train (features) and y_train (labels) as arguments to train the classifier.因此,来自 Python 的 sklearn 库的逻辑回归具有.fit() function ,它将x_train (特征)和y_train (标签)作为 ZDBC11CAA5BDA99F77E6FB4DABD82 的分类器进行训练。

It seems that x_train.shape = (number_of_samples, number_of_features)似乎x_train.shape = (number_of_samples, number_of_features)

For x_train I should use the extracted xvector.scp file, which I am reading like so:对于 x_train 我应该使用提取的 xvector.scp 文件,我正在阅读如下:

b = kaldiio.load_scp('xvector.scp')

And I can print the content like so:我可以像这样打印内容:

for file_id in b:
  xvector = b[file_id]
  print(xvector)

Right now the b variable is like a dictionary and you can get the x-vector value of the corresponding id.现在 b 变量就像一个字典,你可以得到对应 id 的 x 向量值。 I want to use sklearn Logistic Regression to classify the x-vectors and in order to use the.fit() method I should pass an array as an argument.我想使用 sklearn Logistic Regression 对 x 向量进行分类,为了使用 .fit() 方法,我应该将数组作为参数传递。

My question is how can I make an array that contains only the xvector variables?我的问题是如何创建一个仅包含 xvector 变量的数组?

PS: the file_ids are like 1 million and each xvector has length of 512, which is too big for an array PS:file_ids 大约是 100 万,每个 xvector 的长度为 512,对于数组来说太大了

It seems you are trying to store the dictionary into a numpy array.您似乎正在尝试将字典存储到 numpy 数组中。 If the dictionary is small, you can directly store the values as:如果字典很小,您可以直接将值存储为:

import numpy as np

x = np.array(list(b.values()))

However, this will run into OOM issues if the dictionary is large.但是,如果字典很大,这将遇到 OOM 问题。 In this case, you would need to use np.memmap as explained here: https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/在这种情况下,您需要使用np.memmap ,如下所述: https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/

Essentially, you have to add rows to the array one at a time, and flush it when you have run out of memory.本质上,您必须一次向数组添加一行,并在 memory 用完时刷新它。 The array is stored directly on the disk, so it avoids OOM issues.阵列直接存储在磁盘上,因此可以避免 OOM 问题。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM