Can't load HDF5 files in Python
I'm having a weird problem reading HDF5 files. This is with the latest versions of Python and NumPy, as well as other packages, on Ubuntu 18.04. I can write HDF5 files just fine, but when I try to read them back, it throws this error:
ValueError Traceback (most recent call last)
<ipython-input-1-f757457e8111> in <module>
2 df = pd.DataFrame(data={'d': ['a', 'b', 'c']})
3 df.to_hdf('test.hdf', key='data')
----> 4 df = pd.read_hdf('test.hdf')
~/numpy_bug/lib/python3.7/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
392 'contains multiple datasets.')
393 key = candidate_only_group._v_pathname
--> 394 return store.select(key, auto_close=auto_close, **kwargs)
395 except:
396 # if there is an error, close the store
~/numpy_bug/lib/python3.7/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
739 chunksize=chunksize, auto_close=auto_close)
740
--> 741 return it.get_result()
742
743 def select_as_coordinates(
~/numpy_bug/lib/python3.7/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
1481
1482 # directly return the result
-> 1483 results = self.func(self.start, self.stop, where)
1484 self.close()
1485 return results
~/numpy_bug/lib/python3.7/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
732 return s.read(start=_start, stop=_stop,
733 where=_where,
--> 734 columns=columns)
735
736 # create the iterator
~/numpy_bug/lib/python3.7/site-packages/pandas/io/pytables.py in read(self, start, stop, **kwargs)
2935 blk_items = self.read_index('block%d_items' % i)
2936 values = self.read_array('block%d_values' % i,
-> 2937 start=_start, stop=_stop)
2938 blk = make_block(values,
2939 placement=items.get_indexer(blk_items))
~/numpy_bug/lib/python3.7/site-packages/pandas/io/pytables.py in read_array(self, key, start, stop)
2487
2488 if isinstance(node, tables.VLArray):
-> 2489 ret = node[0][start:stop]
2490 else:
2491 dtype = getattr(attrs, 'value_type', None)
~/numpy_bug/lib/python3.7/site-packages/tables/vlarray.py in __getitem__(self, key)
679 key += self.nrows
680 (start, stop, step) = self._process_range(key, key + 1, 1)
--> 681 return self.read(start, stop, step)[0]
682 elif isinstance(key, slice):
683 start, stop, step = self._process_range(
~/numpy_bug/lib/python3.7/site-packages/tables/vlarray.py in read(self, start, stop, step)
819 listarr = []
820 else:
--> 821 listarr = self._read_array(start, stop, step)
822
823 atom = self.atom
tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()
ValueError: cannot set WRITEABLE flag to True of this array
This appears to be line 2021 in the _read_array() function here.
Minimal reproducing code:
import pandas as pd
df = pd.DataFrame(data={'d': ['a', 'b', 'c']})
df.to_hdf('test.hdf', key='data')
df = pd.read_hdf('test.hdf')
This turns out to be a bug in the tables (PyTables) library, which has since been fixed: https://github.com/PyTables/PyTables/issues/719
Upgrade your PyTables library with
conda update pytables
or (the package is published on PyPI under the name tables)
pip install --upgrade tables
and it should be good to go.
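To confirm what is actually installed in the active environment after upgrading, a small check like the one below can help. This is only a sketch: the helper name `installed_versions` is made up for illustration, it relies on the standard-library `importlib.metadata` module (Python 3.8+), and it assumes the PyPI distribution names `numpy`, `tables` and `pandas`.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "tables", "pandas")):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it inside the same virtualenv or conda environment as the failing code (here `~/numpy_bug`) shows whether the upgrade landed where you expected.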
Old way of fixing it, before the new release, for reference:
Once there is a new release, it should be fixed. Until then, you have to stick with numpy version 1.15.4 by doing:
pip install numpy==1.15.4
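Whichever fix you use, re-running the round trip from the question is a quick way to check that reading works again. The sketch below wraps the same reproducing code in a hypothetical helper, `hdf_roundtrip_ok`, and writes to a temporary directory instead of the current one. It assumes pandas and PyTables are installed, and it catches only the ValueError from the traceback above.

```python
import os
import tempfile


def hdf_roundtrip_ok():
    """Write a small string DataFrame to HDF5 and read it back.

    Returns True if the data read back equals what was written, and
    False if reading raises ValueError or the data differs.
    """
    # Imported here so a missing pandas/PyTables shows up as ImportError at call time.
    import pandas as pd

    df = pd.DataFrame(data={"d": ["a", "b", "c"]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.hdf")
        df.to_hdf(path, key="data")
        try:
            loaded = pd.read_hdf(path)
        except ValueError as exc:
            # This is the failure mode reported in the question.
            print(f"read_hdf failed: {exc}")
            return False
    return bool(loaded.equals(df))


if __name__ == "__main__":
    print("HDF5 round trip OK" if hdf_roundtrip_ok() else "HDF5 round trip still failing")
```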