Why does pandas add an extra decimal point when I convert a .txt file to .hdf5?

Question

Whenever a value in my data is exactly 0.05, it sometimes becomes 0.05.1 when I go from a .txt file to a .hdf5 file. Here is the code:

import pandas as pd
pd.read_csv('/path/to/file.txt').to_hdf('/path/to/file.hdf5', key='data')  # to_hdf returns None, so there is nothing to assign

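In the images you can see it go from .05 in the .txt to .05.1 in the .hdf5, but an earlier .05 in the same file is still .05. Other files converted with this code have the same problem. Is this something I should search and replace, or is there a way to fix why it happens? Thanks!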

Edit: here is the code I use to load it in Jupyter with h5py:

import h5py as h5

ch = h5.File('/path/to/file.hdf5', 'r')
c = []
for n in ch['data']['axis0']:  # axis0 stores the DataFrame's column labels
    c.append(n.decode())

This gives the error: "ValueError: could not convert string to float: '0.05.1'"

Answer 1

First, verify the values in the Pandas dataframe. Assuming they are correct, if you want to "see" the data in the h5 file, use HDFView (from The HDF Group).
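Inspecting the h5 file contents with h5py is complicated because the default Pandas ("fixed") schema is complicated. Your key (data) is a group containing several datasets: axis0, axis1, block#_items and block#_values (where # counts from 0 to N). axis0 holds the column labels and axis1 the row index, so the loop in the question is reading column names, not data. Each block holds all the columns that share one dtype, and block#_items lists which columns are stored in block#. So to get the data you want, read ch['data']['block#_values'] for the block whose block#_items contains your column.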
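A minimal sketch of that lookup, assuming the default fixed format: it pairs each block#_values dataset with its block#_items list instead of assuming one block per column. The function name read_fixed_columns and the sample frame are hypothetical, and the sketch only handles numeric blocks, because PyTables stores object (string) blocks in pickled form that h5py cannot decode directly. Writing the file requires PyTables, as to_hdf always does.

```python
import os
import tempfile

import h5py
import pandas as pd


def read_fixed_columns(path, key="data"):
    """Hypothetical helper: map column name -> values from a pandas fixed-format group.

    Pairs block{i}_items (column names) with block{i}_values (data).
    Non-numeric blocks (e.g. pickled object/string columns) are skipped.
    """
    columns = {}
    with h5py.File(path, "r") as h5f:
        grp = h5f[key]
        i = 0
        while f"block{i}_items" in grp:
            names = [n.decode() for n in grp[f"block{i}_items"][:]]
            dset = grp[f"block{i}_values"]
            if dset.dtype.kind in "biuf":  # bool/int/uint/float blocks only
                values = dset[:]  # shape: (rows, columns in this block)
                for j, name in enumerate(names):
                    columns[name] = values[:, j]
            i += 1
    return columns


# Usage with a hypothetical all-numeric frame: "a" and "c" share a dtype,
# so pandas stores them together in one block; "b" gets its own float block.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 0.05, 1.0], "c": [4, 5, 6]})
    df.to_hdf(path, key="data")
    cols = read_fixed_columns(path)

for name, vals in cols.items():
    print(name, vals.tolist())
```

For everyday use, pd.read_hdf(path, key='data') rebuilds the frame directly; the h5py route mainly helps when you want to see how pandas laid the data out.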

The simple example below demonstrates the process.
Create some data with Pandas:

import pandas as pd
dates = ['2021-08-01','2021-08-02','2021-08-03','2021-08-04','2021-08-05',
          '2021-08-06','2021-08-07','2021-08-08','2021-08-09','2021-08-10' ]
precip = [ 0.0, 0.02, 0.0, 0.12, 0.0,
            0.0, 1.11, 0.0, 0.0,  0.05]
df = pd.DataFrame({'dates': dates, 'precip': precip})

df.to_hdf('file_1.h5',key='data')

Read the data with h5py:

import h5py
with h5py.File('file_1.h5','r') as h5f:
    print(h5f['data']['axis0'][:])  # prints the column names
    print(h5f['data']['block0_values'][:]) # prints block 0: the float64 'precip' column

Output:

[b'dates' b'precip']
[[0.  ]
 [0.02]
 [0.  ]
 [0.12]
 [0.  ]
 [0.  ]
 [1.11]
 [0.  ]
 [0.  ]
 [0.05]]
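The ".1" suffix matches how pandas read_csv handles a header row that contains the same label twice: the first keeps its name X and the next becomes X.1. Since axis0 stores column labels, a plausible cause (an assumption, because the .txt file isn't shown) is that the file has no header row, so its first line of numbers, which contains 0.05 twice, became the column names. That would also explain why the earlier .05 survives unchanged. The sketch below uses a hypothetical comma-separated sample; passing header=None keeps the first line as data.

```python
import io

import pandas as pd

# Hypothetical file contents: no header row, and 0.05 appears twice on the first line.
raw = "0.02,0.05,0.05\n0.10,0.20,0.30\n"

# Default header=0: the first line becomes the column labels, and the repeated
# label "0.05" is de-duplicated to "0.05.1".
with_header = pd.read_csv(io.StringIO(raw))
print(list(with_header.columns))

# header=None: every line is data; columns are labelled 0..N-1 instead.
no_header = pd.read_csv(io.StringIO(raw), header=None)
print(list(no_header.columns))
print(no_header.iloc[0].tolist())
```

With header=None the frame can be written with to_hdf as before, and the numbers then end up in block#_values rather than in axis0.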

Solution 1
0 2022-08-17 16:22:25