Most efficient way to get the index of a table from HDF5

Question

I have an HDF5 file containing pandas Series/DataFrame tables. I need to get the (pandas) index of a table stored under a key in the HDF5 file, but not necessarily the whole table.

I can think of two (effectively the same) methods of getting the index:

import pandas as pd

hdfPath = 'c:/example.h5'
hdfKey = 'dfkey'
# way 1:
with pd.HDFStore(hdfPath) as hdf:
    index = hdf[hdfKey].index

# way 2:
index = pd.read_hdf(hdfPath, hdfKey).index

However, for a pandas Series of ~2000 rows this takes 0.6 sec:

%timeit pd.read_hdf(hdfPath, hdfKey).index
1 loops, best of 3: 605 ms per loop

Is there a way to get only the index of a table in HDF?

Answer 1

The HDFStore object has a select_column method that lets you retrieve the index. Note that it returns a Series with the index as the values.

with pd.HDFStore(hdfPath) as hdf:
    index = hdf.select_column(hdfKey, 'index').values
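
For anyone who wants to try this end to end, here is a minimal sketch. It assumes PyTables is installed and that the object was written with format='table', since select_column only works on table-format stores and raises an error for the default fixed format. The temporary file, the demo Series, and the helper name read_index_only are made up for illustration and are not part of the original question.

```python
import os
import tempfile

import pandas as pd


def read_index_only(path, key):
    """Hypothetical helper: read just the index stored under `key`."""
    with pd.HDFStore(path, mode="r") as store:
        # select_column returns a Series whose values are the stored index;
        # wrapping it in pd.Index gives back an index object.
        return pd.Index(store.select_column(key, "index"))


# Demo data (assumption): a ~2000-row Series with a non-trivial integer index.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "example.h5")
series = pd.Series(range(2000), index=range(10, 2010))

# format="table" is required; the default "fixed" format does not
# support select_column.
series.to_hdf(demo_path, key="dfkey", format="table")

index = read_index_only(demo_path, "dfkey")
```

If the existing file was written in the fixed format, it would need to be rewritten in table format once before this approach applies.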

Solution 1
2 2016-07-17 18:13:52
