pandas to_hdf succeeds, but the subsequent read_hdf fails

Question

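pandas to_hdf succeeds, but read_hdf then fails when I use custom objects as column headers (I use custom objects because I need to store additional information in them).

Is there any way to make this work? Or is this simply a pandas bug or a PyTables bug?

For example, below I first show a DataFrame foo that uses string column headers, and everything round-trips through to_hdf / read_hdf. I then change foo to use a custom Col class for the column headers: to_hdf still works, but read_hdf raises an AssertionError: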

In [48]: foo = pd.DataFrame(np.random.randn(2, 3), columns = ['aaa', 'bbb', 'ccc'])

In [49]: foo
Out[49]: 
    aaa       bbb       ccc
0 -0.434303  0.174689  1.373971
1 -0.562228  0.862092 -1.361979

In [50]: foo.to_hdf('foo.h5', 'foo')

In [51]: bar = pd.read_hdf('foo.h5', 'foo')

In [52]: bar
Out[52]: 
    aaa       bbb       ccc
0 -0.434303  0.174689  1.373971
1 -0.562228  0.862092 -1.361979

In [52]: 

In [53]: class Col(object):
...:     def __init__(self, name, other_info):
...:         self.name = name
...:         self.other_info = other_info
...:     def __str__(self):
...:         return self.name
...:     

In [54]: foo = pd.DataFrame(np.random.randn(2, 3), columns = [Col('aaa', {'z': 5}), Col('bbb', {'y': True}), Col('ccc', {})])

In [55]: foo
Out[55]: 
    aaa       bbb       ccc
0 -0.830503  1.066178  1.057349
1  0.406967 -0.131430  1.970204

In [56]: foo.to_hdf('foo.h5', 'foo')

In [57]: bar = pd.read_hdf('foo.h5', 'foo')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-57-888b061a1d2c> in <module>()
----> 1 bar = pd.read_hdf('foo.h5', 'foo')

/.../python3.4/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, **kwargs)
330 
331     try:
--> 332         return store.select(key, auto_close=auto_close, **kwargs)
333     except:
334         # if there is an error, close the store

/.../python3.4/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
672                            auto_close=auto_close)
673 
--> 674         return it.get_result()
675 
676     def select_as_coordinates(

/.../python3.4/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1366 
   1367         # directly return the result
-> 1368         results = self.func(self.start, self.stop, where)
   1369         self.close()
   1370         return results

/.../python3.4/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
665             return s.read(start=_start, stop=_stop,
666                           where=_where,
--> 667                           columns=columns, **kwargs)
668 
669         # create the iterator

/.../python3.4/site-packages/pandas/io/pytables.py in read(self, **kwargs)
   2792             blocks.append(blk)
   2793 
-> 2794         return self.obj_type(BlockManager(blocks, axes))
   2795 
   2796     def write(self, obj, **kwargs):

/.../python3.4/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2180         self._consolidate_check()
   2181 
-> 2182         self._rebuild_blknos_and_blklocs()
   2183 
   2184     def make_empty(self, axes=None):

/.../python3.4/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
   2271 
   2272         if (new_blknos == -1).any():
-> 2273             raise AssertionError("Gaps in blk ref_locs")
   2274 
   2275         self._blknos = new_blknos

AssertionError: Gaps in blk ref_locs

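Update:

So Jeff answered that (a) "this is not supported" and (b) "if you have metadata, then write it to the attributes".

Question 1 regarding (a): my column header objects have methods that return their attributes, and so on. For example, instead of a column header string 'x5y3z8' that I have to parse to get the values out, I can simply write col_header.x (giving 5), col_header.y (giving 3), etc. This is very object-oriented and Pythonic, as opposed to storing the information in a string and having to parse it every time I need it. How would you suggest I replace my current column header objects in a nice way that is also supported?

(By the way, you might look at 'x5y3z8' and think a hierarchical index would work, but it does not, because not every column header has the form 'x#y#z#'. I might have one column of strings named 'foo', another of ints named 'bar5baz7', and another of floats named 'x5y3z8'. The column headers are not uniform.)

Question 2 regarding (a): when you say it is not supported, are you talking specifically about to_hdf / read_hdf not supporting it, or are you saying that pandas in general does not support it? If only the HDF5 support is missing, then I could switch to some other way of saving the DataFrame to disk and have it work, right? Do you foresee any problems with that in the future? For example, will this ever break with to_pickle / read_pickle? (I lose performance, but I have to give up something, right?)

Question 3 regarding (b): what do you mean by "if you have metadata, then write it to the attributes"? Attributes of what? A simple example would help me a lot. I am new to pandas. Thanks!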

Answer 1

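This is not a supported feature.

In the next version of pandas this will raise an error (at write time) for format='table'. It should do the same for the fixed format, but that is not implemented. This is simply not supported, nor is it likely to be. You should just use strings. If you have metadata, then write it to the attributes.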
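
As a rough illustration of "use strings and write the metadata to the attributes", the sketch below stores the frame under plain string column names and keeps each column's other_info as an attribute on the stored HDF5 node. It assumes PyTables is installed. The helper names save_with_metadata and load_with_metadata and the attribute name col_info are made up for this example and are not part of pandas; only HDFStore, put, and get_storer(...).attrs are pandas calls.

```python
import os
import tempfile

import numpy as np
import pandas as pd


class Col(object):
    """Column descriptor from the question: a name plus extra information."""

    def __init__(self, name, other_info):
        self.name = name
        self.other_info = other_info

    def __str__(self):
        return self.name


def save_with_metadata(df, cols, path, key):
    """Write df with plain string column names; keep other_info in attrs."""
    out = df.copy()
    out.columns = [str(c) for c in cols]
    with pd.HDFStore(path, mode="w") as store:
        store.put(key, out)
        # col_info is a hypothetical attribute name chosen for this sketch.
        store.get_storer(key).attrs.col_info = {
            c.name: c.other_info for c in cols
        }


def load_with_metadata(path, key):
    """Read the frame back and rebuild the Col objects from the attribute."""
    with pd.HDFStore(path, mode="r") as store:
        df = store[key]
        info = store.get_storer(key).attrs.col_info
    cols = [Col(name, info[name]) for name in df.columns]
    return df, cols


if __name__ == "__main__":
    cols = [Col('aaa', {'z': 5}), Col('bbb', {'y': True}), Col('ccc', {})]
    foo = pd.DataFrame(np.random.randn(2, 3), columns=[str(c) for c in cols])
    path = os.path.join(tempfile.mkdtemp(), 'foo.h5')
    save_with_metadata(foo, cols, path, 'foo')
    bar, restored = load_with_metadata(path, 'foo')
    print(bar)
    print([(c.name, c.other_info) for c in restored])
```

The DataFrame itself only ever sees strings, so read_hdf and store[key] behave as in the first half of the question. The Col objects live alongside the frame and can be looked up by name when the extra information is needed.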

Solution 1
0 2015-08-31 19:42:32
