Most efficient way to get index of a table from HDF5
I have an HDF5 file containing pandas Series/DataFrame tables. I need to get the (pandas) index of a table stored under a key in the HDF file, without necessarily loading the whole table.
I can think of two (effectively identical) ways of getting the index:
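import pandas as pd

hdfPath = 'c:/example.h5'
hdfKey = 'dfkey'

# way 1: open the store and read the whole object, then take its index
with pd.HDFStore(hdfPath) as hdf:
    index = hdf[hdfKey].index

# way 2: read the whole object with read_hdf, then take its index
index = pd.read_hdf(hdfPath, hdfKey).index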
However, for a pandas Series of ~2000 rows this takes 0.6 s:
%timeit pd.read_hdf(hdfPath, hdfKey).index
1 loops, best of 3: 605 ms per loop
Is there a way to get only the index of a table in HDF?
The HDFStore object has a select_column method that will allow you to get the index. Note that it returns a Series with the index as its values.
with pd.HDFStore(hdfPath) as hdf:
    index = hdf.select_column(hdfKey, 'index').values
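For anyone who wants to try this end to end, here is a minimal sketch. It assumes the object was written with format='table'; as far as I know, select_column does not work on the default fixed format. It also assumes PyTables is installed. The helper name read_index_only, the temporary file, and the sample Series are made up for illustration and are not from the question.

```python
import os
import tempfile

import pandas as pd


def read_index_only(hdf_path, hdf_key):
    """Return the stored index of a table-format object as a pandas Index."""
    with pd.HDFStore(hdf_path, mode="r") as hdf:
        # select_column returns a Series whose values are the stored index;
        # wrapping it in pd.Index gives back an Index object.
        return pd.Index(hdf.select_column(hdf_key, "index"))


# Hypothetical sample data: a ~2000-row Series written to a temporary file.
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "example.h5")
series = pd.Series(range(2000), index=range(100, 2100))

# Assumption: table format is required for select_column to work.
series.to_hdf(example_path, key="dfkey", format="table")

index = read_index_only(example_path, "dfkey")
print(len(index))
```

If the file was written in fixed format, rewriting it once with format='table' is probably the simplest way to make this approach available.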