
Reading MATLAB files in Python with scipy

I'm using Python with the scipy package to read a MATLAB file.

However, it takes too long and then crashes.

The dataset is about 50 MB in size.

Is there a better way to read the data and form an edge list?

My Python code:

import scipy.io as io
data = io.loadmat('realitymining.mat')
print(data)

You could just save each field of the struct in a different text file from MATLAB, e.g.:

save('friends.txt', '-struct', 'network', 'friends', '-ascii')

and load each file separately from Python:

import numpy
friends = numpy.loadtxt('friends.txt')

which loads instantly.
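For the edge list the question asks about, one possible sketch follows. It assumes the loaded friends array is a square matrix whose non-zero entries mark a tie between subjects i and j, and that NaN means a missing answer. Neither assumption is confirmed in this thread. The helper name edge_list_from_adjacency and the small sample matrix are hypothetical stand-ins for the real data.

```python
import numpy as np

def edge_list_from_adjacency(adj, undirected=True):
    """Return (i, j) index pairs for every non-zero entry of a square matrix."""
    # Assumption: NaN means "no answer", so it is treated as no edge.
    adj = np.nan_to_num(np.asarray(adj, dtype=float))
    if undirected:
        # Keep only the strict upper triangle so each pair appears once.
        adj = np.triu(adj, k=1)
    rows, cols = np.nonzero(adj)
    return list(zip(rows.tolist(), cols.tolist()))

# Tiny stand-in for numpy.loadtxt('friends.txt').
friends = np.array([[0, 1, np.nan],
                    [1, 0, 1],
                    [np.nan, 1, 0]])
edges = edge_list_from_adjacency(friends)
print(edges)
```

Passing undirected=False keeps both (i, j) and (j, i), which may be preferable if the matrix is not symmetric.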

I can load it after unzipping, but it strains the memory.

When I try to load it with Octave, I get:

octave:1> load realitymining.mat
error: memory exhausted or requested size too large for range of Octave's index type -- trying to return to prompt

In IPython:

In [10]: data.keys()
Out[10]: ['network', 's', '__version__', '__header__', '__globals__']
In [14]: data['__header__']
Out[14]: 'MATLAB 5.0 MAT-file, Platform: MACI, Created on: Tue Sep 29 20:13:23 2009'
In [15]: data['s'].shape
Out[15]: (1, 106)
In [17]: data['s'].dtype
Out[17]: dtype([('comm', 'O'), ('charge', 'O'), ('active', 'O'), ('logtimes', 'O'),...  
   ('my_intros', 'O'), ('home_nights', 'O'), ('comm_local', 'O'), ('data_mat', 'O')])
# 58 fields
In [24]: data['s']['comm'][0,1].shape
Out[24]: (1, 30)
In [31]: data['s']['comm'][0,1][0,1]
Out[31]: ([[732338.8737731482]], [[355]], [[-1]], [u'Packet Data'], [u'Outgoing'], 
    [[40]], [[nan]])
In [33]: data['s']['comm'][0,1]['date']
Out[33]: 
array([[array([[ 732338.86915509]]), array([[ 732338.87377315]]),
    ...
    array([[ 732340.48579861]]), array([[ 732340.52778935]])]], dtype=object)

Look at the pieces. Simply trying to print data or data['s'] takes too long. Apparently the structure is just too big to format quickly.
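One way to look at the pieces without formatting the whole structure is to list the variables first and then load only the one you need. The sketch below uses scipy.io.whosmat and the variable_names argument of scipy.io.loadmat. It runs against a small stand-in file written with savemat, since the real file is not available here. Whether this shortens the load time for realitymining.mat itself has not been tested.

```python
import os
import tempfile

import numpy as np
import scipy.io as io

# Stand-in file for illustration only; with the real data, set
# path = 'realitymining.mat' and skip the savemat call.
path = os.path.join(tempfile.mkdtemp(), 'demo.mat')
io.savemat(path, {'network': np.eye(3), 's': np.zeros((1, 106))})

# whosmat reports each variable's name, shape and class
# without returning the variable contents.
summary = io.whosmat(path)
for name, shape, cls in summary:
    print(name, shape, cls)

# variable_names restricts loadmat to the listed variables,
# so the large 's' struct is not returned.
data = io.loadmat(path, variable_names=['network'])
loaded = sorted(k for k in data if not k.startswith('__'))
print(loaded)
```

Printing only keys, shapes and dtypes, as in the IPython session above, avoids the slow formatting of the nested object arrays.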

To practically get at this data, I'd suggest loading it once in Python or MATLAB and then saving the useful pieces to one or more files.
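A minimal sketch of that load-once, save-the-pieces idea in Python is shown below. It assumes network is a 1x1 struct with a friends field, as the MATLAB save call earlier in the thread suggests. The stand-in file, the output name friends.npy and the 3x3 matrix are made up for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.io as io

workdir = tempfile.mkdtemp()

# Stand-in for the real file: savemat writes a dict value as a MATLAB struct.
mat_path = os.path.join(workdir, 'demo.mat')
io.savemat(mat_path, {'network': {'friends': np.eye(3)}})

# Load once (slow for the real file), then pull out the piece of interest.
data = io.loadmat(mat_path)
# A struct arrives as a (1, 1) structured array; [0, 0] unwraps the field.
friends = data['network']['friends'][0, 0]

# Save the piece in NumPy's own format so later runs skip the .mat file.
npy_path = os.path.join(workdir, 'friends.npy')
np.save(npy_path, friends)

# Later sessions reload only this small file.
reloaded = np.load(npy_path)
print(reloaded.shape)
```

The same pattern should apply to individual fields of s, although the nested object arrays there may need extra unwrapping before they can be saved as plain numeric arrays.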

Maybe you can first work on part of the data, such as the network in the struct. I have unpacked it here using MATLAB.

I am still working on how to tidy up the rest of the larger struct.
