Why does pandas add an extra decimal point when I convert a .txt file to .hdf5?

Whenever a part of my data is exactly equal to 0.05, it sometimes turns into 0.05.1 when I go from a .txt file to a .hdf5 file. Here's the code:

h_charge = pd.read_csv('/path/to/file.txt').to_hdf('/path/to/file.hdf5', key='data')

In the images you can see that it goes from .05 in the .txt to .05.1 in the .hdf5, but earlier in the same file the .05 stays .05, and I'm having the same problem in other files converted with this code. Is this something I should just search and replace, or is there a way to fix why it's happening? Thanks!

Edit: Here's my code for loading it in Jupyter using h5py:

import h5py as h5

ch = h5.File('/path/to/file.hdf5', 'r')
c = []
for n in ch['data']['axis0']:
    c.append(n.decode())

This gives the error: "ValueError: could not convert string to float: '0.05.1'"

Start by verifying the values in the Pandas dataframe. Assuming those are correct, you have to use HDFView (from The HDF Group) if you want to "see" the data in the h5 file.
Checking the h5 file contents with h5py is complicated because Pandas' default schema is complicated. Your key (data) is a group with multiple datasets: axis0, axis1, block#_items, block#_values (where # goes from 0 to N; it is the dataframe column counter). So, to get the data you want, you need to read from ch['data']['block#_values'], where # is the appropriate column number.
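As a quick first check, a minimal sketch along these lines (the path is just the placeholder from the question) prints the dtypes and the first rows straight from the dataframe before anything is written:

import pandas as pd

df = pd.read_csv('/path/to/file.txt')   # placeholder path from the question
print(df.dtypes)   # an 'object' column here means the values were read as strings
print(df.head())   # eyeball the rows for anything that already looks like '0.05.1'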

The simple example below demonstrates the process.
Create some data with Pandas:

import pandas as pd
dates = ['2021-08-01','2021-08-02','2021-08-03','2021-08-04','2021-08-05',
         '2021-08-06','2021-08-07','2021-08-08','2021-08-09','2021-08-10']
precip = [0.0, 0.02, 0.0, 0.12, 0.0,
          0.0, 1.11, 0.0, 0.0, 0.05]
df = pd.DataFrame({'dates': dates, 'precip': precip})

df.to_hdf('file_1.h5', key='data')
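
As an optional sanity check (not part of the demonstration above, just an extra step), the same file can be read back with pandas itself to confirm the values survive the round trip:

df_check = pd.read_hdf('file_1.h5', key='data')
print(df_check['precip'].tolist())   # should print the original floats, including 0.05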

Reading data with h5py:

import h5py
with h5py.File('file_1.h5','r') as h5f:
    print(h5f['data']['axis0'][:])  # prints names
    print(h5f['data']['block0_values'][:]) # prints data for column 0

Output:

[b'dates' b'precip']
[[0.  ]
 [0.02]
 [0.  ]
 [0.12]
 [0.  ]
 [0.  ]
 [1.11]
 [0.  ]
 [0.  ]
 [0.05]]
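
If you want to see the full set of datasets Pandas creates under the data group (axis0, axis1, block#_items, block#_values), a short sketch with h5py's visititems lists them with their shapes and dtypes (using the file_1.h5 written above):

import h5py

def show(name, obj):
    # print every object's path; add shape and dtype for datasets
    if isinstance(obj, h5py.Dataset):
        print(f'{name}  shape={obj.shape}  dtype={obj.dtype}')
    else:
        print(name)

with h5py.File('file_1.h5', 'r') as h5f:
    h5f.visititems(show)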
