
How can I read all hdf files in a folder using R?

I have thousands of HDF files in a folder. Is there a way to write a loop that reads all of the HDF files in that folder and writes some specific data to another file?

I read the first file in the folder using the code below:

mydata <- h5read("/path to file/name of the file.he5", "/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily")

But I have 1686 more files in the folder, and reading them one by one is not practical. I think I need to write a for loop to read all the files in the folder.

I found some code that lists the txt files in a folder and then reads all of them:

nm <- list.files(path="path/to/file")
do.call(rbind, lapply(nm, function(x) read.table(file=x)[, 2]))

I tried to change the code as shown below:

nm <- list.files(path="path/to/file")
do.call(rbind, lapply(nm, function(x) h5read(file=x)[, 2]))

But the error message says:

Error in h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) : Error in h5checktypeOrOpenLoc(). Cannot open file. File 'D:\\path to file\\name of the file.he5' does not exist.

What should I do in that situation?

If you are not bound to a specific technology, you may want to take a look at HDFql. Using HDFql in R, your issue can be solved as follows. For the sake of this example, assume that (1) the dataset /HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily exists in all the HDF5 files stored in the directory, (2) it has one dimension (size 1024), and (3) it is of data type integer:

# load HDFql R wrapper (make sure it can be found by the R interpreter)
source("HDFql.R")

# create variable "values" and initialize it
values <- array(dim = c(1024))
for(x in 1:1024)
{
    values[x] <- as.integer(0)
}

# show (i.e. get) files stored in directory "/path/to/hdf5/files" and populate HDFql default cursor with it
hdfql_execute("SHOW FILE /path/to/hdf5/files")

# iterate HDFql default cursor
while(hdfql_cursor_next() == HDFQL_SUCCESS)
{
    file_name <- hdfql_cursor_get_char()

    # select (i.e. read) data from dataset "/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily" and populate variable "values" with it
    hdfql_execute(paste("SELECT FROM", file_name, "\"/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily\" INTO MEMORY", hdfql_variable_transient_register(values)))

    # display values stored in variable "values"
    for(x in 1:1024)
    {
        print(values[x])
    }
}

Additional examples on how to read datasets using HDFql can be found in the quick start guide and reference manual.
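
If you would rather stay with rhdf5 as in the question, the error is most likely caused by two things: list.files() returns bare file names unless full.names = TRUE is set, so h5read() looks for them in the current working directory, and h5read() also needs the dataset name as its second argument. Below is a minimal sketch, assuming the rhdf5 package is installed, that every file ends in .he5, and that each file contains the dataset from the question. The names read_folder, read_swe and swe_path are made up for this example and are not part of rhdf5.

```r
# Sketch: read one dataset from every .he5 file in a folder.
# Assumptions: rhdf5 is installed; every file contains the dataset below.
# read_folder(), read_swe() and swe_path are illustrative names only.

swe_path <- "/HDFEOS/GRIDS/Northern Hemisphere/Data Fields/SWE_NorthernDaily"

read_folder <- function(data_dir, reader, pattern = "\\.he5$") {
  # full.names = TRUE returns paths that include data_dir, so the reader
  # does not depend on the current working directory.
  files <- list.files(path = data_dir, pattern = pattern, full.names = TRUE)
  results <- lapply(files, reader)
  # Label each result with the file it came from.
  names(results) <- basename(files)
  results
}

# Reader that extracts the SWE dataset via rhdf5::h5read(file, name).
read_swe <- function(file) rhdf5::h5read(file, swe_path)

# Usage (replace the placeholder directory with the real one):
# swe_list <- read_folder("path/to/file", read_swe)
# saveRDS(swe_list, "swe_all.rds")  # hypothetical output file name
```

The result is a named list with one element per file, which avoids rbind() failing when the datasets are matrices rather than plain vectors. Passing the reader as an argument keeps the file-listing step independent of rhdf5, so it can be checked on its own.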
