
How to read an H5 file containing satellite data in Python?

As part of a project I'm exploring satellite data, which is available in H5 format. I'm new to this format and have been unable to process the data. I can open the file in a program called Panoply, where I found that the DHI value is available in a format called Geo2D. Is there any way to extract the data into CSV format, as shown below?

X     Y     GHI
X1    Y1
X2    Y2

Screenshots of the file opened in Panoply are attached.

Link to the file: https://drive.google.com/file/d/1xQHNgrlrbyNcb6UyV36xh-7zTfg3f8OQ/view

I tried the following code to read the data. I'm able to store it as a 2D NumPy array, but I can't store it together with the location.


import h5py
import numpy as np

# Open the HDF5 file read-only.
f = h5py.File('mer.h5', 'r')

# Print the name and type of each root-level object (group or dataset).
for key in f.keys():
    print(key)
    print(type(f[key]))

# Read one dataset into a NumPy array and write it to CSV.
key = 'X'
dataset1 = np.array(f.get(key))
np.savetxt("FILENAME.csv", dataset1, delimiter=",")

[Screenshots of the file opened in Panoply]

I found an effective way to read the data, convert it to a DataFrame, and convert the coordinates from one projection to another.

Code is tracked here: https://github.com/rishikeshsreehari/boring-stuff-with-python/blob/main/data-from-hdf5-file/final_converter.py

The code is as follows:

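import time

import h5py
import pandas as pd
from pyproj import Transformer

# EPSG codes of the source projection and the target (4326 = WGS 84).
input_epsg = 24378
output_epsg = 4326

start_time = time.time()

with h5py.File("mer.h5", "r") as file:
    # The last two X values and the matching DHI columns are dropped.
    df_X = pd.DataFrame(file["X"][:-2], columns=["X"])
    df_Y = pd.DataFrame(file["Y"][:], columns=["Y"])
    DHI = file["DHI"][0][:, :-2].reshape(-1)

# The cross join yields one row per (Y, X) pair; this matches the flattened
# DHI values as long as DHI[0] is laid out as (Y, X).
final = df_Y.merge(df_X, how="cross").assign(DHI=DHI)[["X", "Y", "DHI"]]

# pyproj.transform is deprecated; Transformer is its replacement.
transformer = Transformer.from_crs(input_epsg, output_epsg, always_xy=True)
final["X2"], final["Y2"] = transformer.transform(
    final["X"].to_numpy(), final["Y"].to_numpy()
)

# final.to_csv("final_converted1.csv", index=False)

print("--- %s seconds ---" % (time.time() - start_time))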
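The cross merge above only lines up with the DHI values if the grid is flattened in the same order as the coordinate pairs are generated. A minimal sketch of that pairing with NumPy's meshgrid, assuming the values are stored as a (len(Y), len(X)) grid as the code above implies. The function name `grid_to_long` and the sample numbers are made up for illustration and do not come from the file:

```python
import numpy as np
import pandas as pd


def grid_to_long(x, y, values, value_name="DHI"):
    """Flatten a (len(y), len(x)) grid into one row per (X, Y) cell.

    Hypothetical helper; assumes rows of `values` follow y and columns follow x.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    values = np.asarray(values)
    if values.shape != (y.size, x.size):
        raise ValueError(
            f"expected shape {(y.size, x.size)}, got {values.shape}"
        )
    # With the default indexing="xy", both outputs have shape (len(y), len(x)),
    # so ravel() walks them in the same row-major order as `values`.
    xx, yy = np.meshgrid(x, y)
    return pd.DataFrame(
        {"X": xx.ravel(), "Y": yy.ravel(), value_name: values.ravel()}
    )


# Made-up sample values standing in for the file's X, Y and DHI datasets.
x = np.array([10.0, 20.0, 30.0])
y = np.array([1.0, 2.0])
dhi = np.array([[0.1, 0.2, 0.3],
                [0.4, 0.5, 0.6]])

long_df = grid_to_long(x, y, dhi)
print(long_df)
```

The shape check is the useful part: if the grid turned out to be stored as (X, Y) instead, the call would raise rather than silently pair values with the wrong coordinates.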


 