How to convert a Python dictionary to a NumPy array?

The LogisticRegression classifier in Python's sklearn library has a .fit() method that takes x_train (features) and y_train (labels) as arguments to train the classifier.

It seems that x_train.shape = (number_of_samples, number_of_features)

For x_train I should use the extracted xvector.scp file, which I am reading like so:

import kaldiio

b = kaldiio.load_scp('xvector.scp')

And I can print the content like so:

for file_id in b:
  xvector = b[file_id]
  print(xvector)

Right now the b variable behaves like a dictionary: given a file_id, it returns the corresponding x-vector. I want to classify the x-vectors with sklearn's logistic regression, and to call the .fit() method I need to pass an array as the argument.

My question is: how can I build an array that contains only the x-vectors?

PS: there are about 1 million file_ids and each x-vector has length 512, which seems too big to hold in a single array.

It seems you are trying to store the dictionary's values in a NumPy array. If the dictionary is small, you can convert the values directly:

import numpy as np

x = np.array(list(b.values()))
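
As a minimal sketch of the small-dictionary case, the snippet below uses a plain dict of toy vectors as a stand-in for the object returned by kaldiio.load_scp (an assumption made only so the example runs without a real xvector.scp). It also keeps the list of file_ids in the same order as the rows, which is useful when building y_train later.

```python
import numpy as np

# Stand-in for `b` (assumption: a dict-like object mapping file_id -> 1-D vector).
b = {
    "utt1": np.array([0.1, 0.2, 0.3], dtype=np.float32),
    "utt2": np.array([0.4, 0.5, 0.6], dtype=np.float32),
}

# Fix the key order once so that row i of x_train belongs to file_ids[i].
file_ids = list(b)

# Stack the vectors into shape (number_of_samples, number_of_features).
x_train = np.stack([b[file_id] for file_id in file_ids])

print(x_train.shape)
```

With the toy data above, x_train has 2 rows and 3 columns; with real x-vectors the second dimension would be 512.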

However, materializing every value in memory at once can cause out-of-memory (OOM) errors if the dictionary is large. In that case, you would need to use np.memmap, as explained here: https://ipython-books.github.io/48-processing-large-numpy-arrays-with-memory-mapping/

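Essentially, you create the array on disk with its final shape, write the rows into it one at a time, and flush periodically so that written rows go to the file instead of accumulating in memory. Because the array is backed by a file on disk rather than held entirely in RAM, this avoids OOM issues.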
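
A rough sketch of that row-by-row approach follows. The helper name dict_to_memmap, the flush_every parameter, the float32 dtype, and the output file name are all hypothetical choices for illustration, and a small dict of random vectors stands in for the real kaldiio object.

```python
import os
import tempfile

import numpy as np


def dict_to_memmap(vectors, path, dim, dtype=np.float32, flush_every=1000):
    """Copy the vectors of a dict-like object into a disk-backed array, row by row."""
    file_ids = list(vectors)  # fixed order: row i corresponds to file_ids[i]
    out = np.memmap(path, dtype=dtype, mode="w+", shape=(len(file_ids), dim))
    for row, file_id in enumerate(file_ids):
        out[row] = vectors[file_id]  # only one vector is loaded at a time
        if (row + 1) % flush_every == 0:
            out.flush()  # push pending writes to the file
    out.flush()
    return file_ids, out


# Toy stand-in for `b` (assumption: dict-like, file_id -> 1-D vector of length dim).
rng = np.random.default_rng(0)
b = {f"utt{i}": rng.standard_normal(8).astype(np.float32) for i in range(5)}

path = os.path.join(tempfile.mkdtemp(), "xvectors.dat")
file_ids, x_train = dict_to_memmap(b, path, dim=8)

# The file can be reopened later without rebuilding it; shape and dtype
# must be supplied again because the raw file does not store them.
x_again = np.memmap(path, dtype=np.float32, mode="r", shape=(len(file_ids), 8))
print(x_again.shape)
```

At the sizes mentioned in the question (about 1,000,000 vectors of length 512), a float32 array takes roughly 1,000,000 × 512 × 4 bytes, or about 2 GB, so whether the memory-mapped route is needed depends on the available RAM and on the dtype of the stored x-vectors.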
