How do I read multiple files in R and create a single data frame from them?

Question

Objective

I have 100 hdf5 files in a folder. For a reproducible example, let's consider only 2 files, namely:

> list.files(pattern="*.hdf5")
[1] "Cars_20160601_01.hdf5" "Cars_20160601_02.hdf5"

Each hdf5 file contains 2 groups, data and frame. I want to extract 2 objects from the data group: VDS_Veh_Speed and VDS_Chassis_CG_Position. Similarly, the frame group contains 3 objects, but only the object frame is relevant.
I want to read these files and extract the relevant variables described above.

What I tried:

library(rhdf5)  # provides h5read()

# Create a list of all the hdf5 files
temp = list.files(pattern="*.hdf5")

# Read all files and create data frames from each using the file name as df name
for (i in unique(temp)){
  data <- h5read(file = i, name = "data") # ED data
  frame <- h5read(file = i, name = "frame") # Frame numbers
  ED <- data.frame(frames = frame$frame, 
                   speed.kph.ED = round(data$VDS_Veh_Speed*1.46667*0.3048*3.6,2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)#fps

  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED 
  df$pedal_pos <- ED$pedal_pos
  df$file.ID <- i
  assign(i, df)
}

Now, because all the data frames are in the global environment, I removed the extra objects and kept only the new data frames:

# Remove extra objects
rm(data, df, ED, frame, i, temp)

Finally, I made a list of the data frames in the environment and then created a single data frame:

DF_obj <- lapply(ls(), get)
fdc <- do.call("rbind", DF_obj)

This works for me. But, as mentioned in the comments, assign should be avoided. Also, I have to call rm() manually; without it, ls() also picks up the leftover objects and this code won't work. Is there any way to avoid assign in this context?

If you need the data files, here is the link to the 2 files mentioned above: https://1drv.ms/f/s!AsMFpkDhWcnw6g7StJp9dzZ-nCr4

Answer 1

The answer is basically the same as your code, but with a couple of minor changes. We use a list and assign to its elements in the ordinary way, rather than using assign() to create data frames in your global environment. This avoids potential bugs and name clashes, and there is no extensive clean-up to worry about.

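library(rhdf5)  # provides h5read()

# The pattern is a regular expression, so escape the dot and anchor the end
temp = list.files(pattern="\\.hdf5$")
df_list = list()  # initialize a list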

# Read all files into a list of data frames
for (i in unique(temp)){
  data <- h5read(file = i, name = "data") # ED data
  frame <- h5read(file = i, name = "frame") # Frame numbers
  ED <- data.frame(frames = frame$frame, 
                   speed.kph.ED = round(data$VDS_Veh_Speed*1.46667*0.3048*3.6,2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)#fps

  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED 
  df$pedal_pos <- ED$pedal_pos

  # assign to the list. We can take care of the id cols automatically
  df_list[[i]] <- df
} 

# Name the list elements (not df) after the files; rbindlist uses these names
names(df_list) <- unique(temp)
fdc <- data.table::rbindlist(df_list, idcol = "file.ID")

Using data.table::rbindlist is typically faster than using do.call(rbind), and it builds the ID column for us from the names of the list.
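
If you would rather not depend on data.table, the same idea can be sketched in base R with lapply() and do.call(rbind). This is only an illustration, not the code from the answer above: read_one() is a hypothetical stand-in that returns toy data so the example runs without the hdf5 files, and in practice it would hold the h5read() calls from the loop.

```r
# Hypothetical stand-in for the per-file reading step (an assumption for
# illustration); the real version would call rhdf5::h5read() as in the loop.
read_one <- function(path) {
  data.frame(x = c(1, 2), speed = c(10, 20))
}

files <- c("Cars_20160601_01.hdf5", "Cars_20160601_02.hdf5")

# lapply() returns one data frame per file; setNames() labels each by file name
df_list <- setNames(lapply(files, read_one), files)

# Base-R counterpart of idcol = "file.ID": add the ID column, then bind the rows
df_list <- Map(function(d, id) cbind(file.ID = id, d), df_list, names(df_list))
fdc <- do.call(rbind, df_list)
rownames(fdc) <- NULL  # drop the file-name-based row names created by rbind
```

Nothing is written to the global environment except read_one, files, df_list and fdc, so no rm() step is needed before combining.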

Answer 1: 2, accepted, 2016-09-27 20:44:52