Most efficient way to get index of a table from HDF5

I have an HDF5 file containing pandas Series/DataFrame tables. I need to get the (pandas) index of a table stored under a key in the HDF file, but not necessarily the whole table.

I can think of two (effectively the same) methods of getting the index:

import pandas as pd

hdfPath = 'c:/example.h5'
hdfKey = 'dfkey'
# way 1:
with pd.HDFStore(hdfPath) as hdf:
    index = hdf[hdfKey].index

# way 2:
index = pd.read_hdf(hdfPath, hdfKey).index

However, for a pandas Series of ~2000 rows this takes 0.6 sec:

%timeit pd.read_hdf(hdfPath, hdfKey).index
1 loops, best of 3: 605 ms per loop

Is there a way to get only the index of a table in HDF?

The HDFStore object has a select_column method that will allow you to get the index. Note that it returns a Series with the index as its values.

with pd.HDFStore(hdfPath) as hdf:
    index = hdf.select_column(hdfKey, 'index').values
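
For anyone who wants to try this end to end, here is a minimal sketch. It assumes PyTables is installed and that the object was written with format="table", since select_column only works on table-format stores and raises a TypeError on fixed-format ones. The sample data, the temporary file path, and the variable names are made up for illustration and are not from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: ~2000 rows with integer index labels 100..2099.
series = pd.Series([float(i) for i in range(2000)], index=range(100, 2100))

# Hypothetical file location and key, for illustration only.
hdf_path = os.path.join(tempfile.mkdtemp(), "example.h5")
hdf_key = "dfkey"

# select_column needs a table-format store, so write with format="table".
series.to_hdf(hdf_path, key=hdf_key, format="table")

with pd.HDFStore(hdf_path, mode="r") as hdf:
    # Reads only the index column; the result is a Series whose values are the index.
    index_column = hdf.select_column(hdf_key, "index")
    # Wrap the raw values to get a pandas Index back.
    index = pd.Index(index_column.values)
```

If the existing file was written in the default fixed format, it would have to be rewritten as a table before this approach applies.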

