
Why does pandas add an extra decimal point when I convert a .txt file to .hdf5?

Whenever part of my data is exactly 0.05, it sometimes turns into 0.05.1 when I convert a .txt file to a .hdf5 file. Here's the code:

h_charge = pd.read_csv('/path/to/file.txt').to_hdf('/path/to/file.hdf5', key='data')

[Images: .txt file, .hdf5 file]

In the images you can see that it goes from .05 in the .txt to .05.1 in the .hdf5, but earlier in the same file the .05 stays .05, and I'm having the same problem in other files converted using this code. Is this something I should just search and replace, or is there a way to fix whatever is causing it? Thanks!

Edit: Here's my code for loading it in Jupyter using h5py:

import h5py as h5

ch = h5.File('/path/to/file.hdf5', 'r')
c = []
for n in ch['data']['axis0']:
    c.append(n.decode())

Gives the error: "ValueError: could not convert string to float: '0.05.1'"

Start by verifying the values in the pandas DataFrame. Assuming those are correct, the easiest way to "see" the data in the h5 file is HDFView (from The HDF Group).
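A minimal sketch of that first check, using pandas only. It assumes (this is a guess, since the .txt file is not shown) that the header row repeats the label 0.05. `read_csv` makes repeated header labels unique by appending `.1`, `.2`, and so on, which would produce a label like `0.05.1`. The inline `text` and the `suspect` heuristic are hypothetical and only for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents: the header row repeats the label 0.05.
text = "0.01,0.05,0.1,0.05\n1,2,3,4\n"

df = pd.read_csv(io.StringIO(text))

# read_csv de-duplicates repeated header labels by appending .1, .2, ...
labels = list(df.columns)
print(labels)

# Heuristic: flag labels that look like "<existing label>.<digits>".
seen = set(labels)
suspect = []
for label in labels:
    base, _, suffix = label.rpartition(".")
    if suffix.isdigit() and base in seen:
        suspect.append(label)
print(suspect)
```

If the real file's first row is data rather than a header, passing `header=None` to `read_csv` keeps those values from being treated as column labels.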
Checking the h5 file contents with h5py is complicated because the default pandas schema is complicated. Your key (data) is a group with multiple datasets: axis0, axis1, block#_items, and block#_values, where # runs from 0 to N and is the block counter (pandas stores columns of the same dtype together in one block, and block#_items lists the column names in that block). Note that axis0 holds the column names, not the data. So, to get the data you want, read from ch['data']['block#_values'], where # is the block that contains your column.
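To find out which block holds which column, one option is to list the datasets in the group and decode each `block#_items` dataset. The sketch below assumes the default (fixed) format written by `to_hdf`, which requires PyTables. The file name `layout_demo.h5` and the column names are made up for illustration.

```python
import os
import tempfile

import h5py
import pandas as pd

# Hypothetical demo data: one integer column and two float columns.
df = pd.DataFrame({"day": [1, 2, 3],
                   "precip": [0.0, 0.02, 0.05],
                   "temp": [20.5, 21.0, 19.5]})

path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")
df.to_hdf(path, key="data")  # default fixed format; needs PyTables

with h5py.File(path, "r") as h5f:
    group = h5f["data"]
    names = sorted(group.keys())
    print(names)  # axis and block dataset names

    # Map each block to the column names stored in its *_items dataset.
    block_columns = {}
    for name in names:
        if name.endswith("_items"):
            block = name[: -len("_items")]
            block_columns[block] = [label.decode() for label in group[name][:]]

print(block_columns)
```

The block numbering is an internal detail of pandas, so it is safer to look columns up through `block#_items` than to assume a fixed order.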

The simple example below demonstrates the process.
First, create some data with pandas:

import pandas as pd
dates = ['2021-08-01','2021-08-02','2021-08-03','2021-08-04','2021-08-05',
          '2021-08-06','2021-08-07','2021-08-08','2021-08-09','2021-08-10' ]
precip = [ 0.0, 0.02, 0.0, 0.12, 0.0,
            0.0, 1.11, 0.0, 0.0,  0.05]
df = pd.DataFrame({'dates': dates, 'precip': precip})

df.to_hdf('file_1.h5',key='data')

Reading data with h5py:

import h5py
with h5py.File('file_1.h5','r') as h5f:
    print(h5f['data']['axis0'][:])  # prints the column names
    print(h5f['data']['block0_values'][:])  # prints the values in block 0 (here, the float column 'precip')

Output:

[b'dates' b'precip']
[[0.  ]
 [0.02]
 [0.  ]
 [0.12]
 [0.  ]
 [0.  ]
 [1.11]
 [0.  ]
 [0.  ]
 [0.05]]

