
How to load cell array of strings in Matlab mat files into Python list or tuple using Scipy.io.loadmat

I am a Matlab user new to Python. I would like to write a cell array of strings in Matlab to a MAT-file and load that file in Python (perhaps with scipy.io.loadmat) into a similar type, e.g. a list or tuple of strings. However, loadmat reads everything into arrays, and I am not sure how to convert the result into a list. I tried the "tolist" method, which does not work as I expected (I have a poor understanding of Python arrays and numpy arrays). For example:

Matlab code:

cell_of_strings = {'thank',  'you', 'very', 'much'};
save('my.mat', 'cell_of_strings');

Python code:

from scipy.io import loadmat

matdata = loadmat('my.mat', chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']

Then, the variable array_of_strings is:

array([[[[u't' u'h' u'a' u'n' u'k']], [[u'y' u'o' u'u']],
    [[u'v' u'e' u'r' u'y']], [[u'm' u'u' u'c' u'h']]]], dtype=object)

I am not sure how to convert this array_of_strings into a Python list or tuple so that it looks like

list_of_strings = ['thank', 'you', 'very', 'much']

I am not familiar with the array object in Python or numpy. Your help will be highly appreciated.

Have you tried this:

import scipy.io as si

a = si.loadmat('my.mat')
b = a['cell_of_strings']                # type(b) <type 'numpy.ndarray'>
list_of_strings  = b.tolist()           # type(list_of_strings ) <type 'list'>

print(list_of_strings)
# output: [u'thank', u'you', u'very', u'much']

This looks like a job for a list comprehension. Repeating your example, I did this in MATLAB:

cell_of_strings = {'thank',  'you', 'very', 'much'};
save('my.mat', 'cell_of_strings','-v7'); 

I'm using a newer version of MATLAB, which on my setup saves .mat files in the HDF5-based (v7.3) format by default. loadmat can't read HDF5-based files, so the '-v7' flag forces MATLAB to save in an older .mat format that loadmat can read.
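
If you are not sure which format an existing file uses, the text header at the start of the file gives a hint. The sketch below is only an illustration: mat_file_flavor is a hypothetical helper name, and it relies on the assumption that the file begins with the usual "MATLAB 5.0 MAT-file" or "MATLAB 7.3 MAT-file" header text.

```python
def mat_file_flavor(path):
    """Guess the MAT-file flavor from the text header at the start of the file.

    Hypothetical helper: it only inspects the leading bytes and does not
    parse the file.
    """
    with open(path, "rb") as fh:
        header = fh.read(128)
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        # HDF5-based format; scipy.io.loadmat cannot read it.
        return "v7.3 (HDF5-based)"
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        # Files saved with -v6 or -v7 carry this header; loadmat can read them.
        return "v5/v6/v7"
    return "unknown"
```

Calling mat_file_flavor('my.mat') before loadmat can tell you whether the file needs to be re-saved with '-v7'.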

In Python, I loaded the cell array just like you did:

import scipy.io as sio
path = '.'  # folder containing my.mat; adjust as needed
matdata = sio.loadmat('%s/my.mat' % path, chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']

Printing array_of_strings gives:

[[array([[u't', u'h', u'a', u'n', u'k']], 
          dtype='<U1')
      array([[u'y', u'o', u'u']], 
          dtype='<U1')
      array([[u'v', u'e', u'r', u'y']], 
          dtype='<U1')
      array([[u'm', u'u', u'c', u'h']], 
          dtype='<U1')]]

The variable array_of_strings is a (1,4) numpy object array, and each of its elements is itself an array. For example, the first element of array_of_strings is a (1,5) array containing the letters of 'thank'. That is,

array_of_strings[0,0]
array([[u't', u'h', u'a', u'n', u'k']], 
      dtype='<U1')

To get at the first letter 't', you have to do something like:

array_of_strings[0,0][0,0]
u't'
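
If you want to experiment with this nesting without MATLAB at hand, the same layout can be rebuilt by hand in numpy. This is only a sketch of the structure described above, not what loadmat does internally; the name cell is made up for the illustration.

```python
import numpy as np

words = ['thank', 'you', 'very', 'much']

# Outer (1, 4) object array, standing in for array_of_strings.
cell = np.empty((1, len(words)), dtype=object)
for i, word in enumerate(words):
    # Each element is a (1, n) array holding one character per entry.
    cell[0, i] = np.array([list(word)])

print(cell.shape)        # shape of the outer object array
print(cell[0, 0].shape)  # shape of the block holding the letters of 'thank'
print(cell[0, 0][0, 0])  # first letter of the first word
```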

Since we are dealing with nested arrays, we need nested iteration (i.e. nested for loops) to extract the data. But first, I'll show you how to extract the first word:

first_word = [str(''.join(letter)) for letter in array_of_strings[0][0]]
first_word
['thank']

Here I am using a list comprehension. Basically, I am looping through each row of letters in array_of_strings[0][0] (there is only one) and concatenating its letters using the ''.join method. The str() function converts the unicode strings into regular strings.

Now, to get the list of strings you want, we just need to loop through each array of letters:

words = [str(''.join(letter)) for letter_array in array_of_strings[0] for letter in letter_array]
words
['thank', 'you', 'very', 'much']
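
The two for clauses in that comprehension run left to right, exactly like nested for loops. Here is the same logic written out, using plain nested lists as a stand-in for the loaded array (an assumption made so the snippet runs without a .mat file):

```python
# Plain nested lists with the same nesting as the loaded array_of_strings.
array_of_strings = [[
    [['t', 'h', 'a', 'n', 'k']],
    [['y', 'o', 'u']],
    [['v', 'e', 'r', 'y']],
    [['m', 'u', 'c', 'h']],
]]

words = []
for letter_array in array_of_strings[0]:  # one (1, n) block per word
    for letter in letter_array:           # the single row of characters in that block
        words.append(''.join(letter))     # glue the characters into one string

print(words)  # ['thank', 'you', 'very', 'much']
```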

List comprehensions take some getting used to, but they are extremely useful. Hope this helps.
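
If you would rather not hard-code the nesting depth, a small recursive helper is another option. This is a sketch only: collect_words is a hypothetical name, not part of scipy or numpy, and it assumes every innermost row holds only strings (single characters or whole words).

```python
def collect_words(node):
    """Walk a nested structure and return one string per innermost row."""
    if isinstance(node, str):
        return [node]
    children = list(node)
    if children and all(isinstance(child, str) for child in children):
        # Innermost row: join its characters (or string pieces) into one word.
        return [''.join(children)]
    words = []
    for child in children:
        words.extend(collect_words(child))
    return words


# Plain nested lists standing in for the loaded array.
nested = [[[['t', 'h', 'a', 'n', 'k']], [['y', 'o', 'u']]]]
print(collect_words(nested))  # ['thank', 'you']
```

Because numpy arrays are iterable and their string elements are str instances in Python 3, the same function should also accept the array returned by loadmat, but treat that as something to verify on your own data.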


 