How to read multiple files and create a single data frame from them in R?

Objective

I have 100 hdf5 files in a folder. For a reproducible example, let's consider only 2 files, namely:

> list.files(pattern="*.hdf5")
[1] "Cars_20160601_01.hdf5" "Cars_20160601_02.hdf5"  

Each hdf5 file contains 2 groups, data and frame. I want to extract 2 objects from the data group. These are called VDS_Veh_Speed and VDS_Chassis_CG_Position. Similarly, in the frame group there are 3 objects. Only the object frame is relevant in this group.
I want to read these files and extract the relevant variables described above.

What I tried:

# Create a list of all the hdf5 files
temp = list.files(pattern="*.hdf5")

# Read all files and create data frames from each using the file name as df name
for (i in unique(temp)){
  data <- h5read(file = i, name = "data") # ED data
  frame <- h5read(file = i, name = "frame") # Frame numbers
  ED <- data.frame(frames = frame$frame, 
                   speed.kph.ED = round(data$VDS_Veh_Speed*1.46667*0.3048*3.6,2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)#fps

  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED 
  df$pedal_pos <- ED$pedal_pos
  df$file.ID <- i
  assign(i, df)
}  

Now, because all the data frames are in the global environment, I removed the extra objects and kept only the new dfs:

# Remove extra objects
rm(data, df, ED, frame, i, temp)

Finally, I made a list of the dfs in the environment and then created a single data frame:

DF_obj <- lapply(ls(), get)
fdc <- do.call("rbind", DF_obj)   

This works for me. But, as mentioned in the comments, assign should be avoided. Also, I have to call rm() manually, without which this code won't work. Is there any way to avoid assign in this context?
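To illustrate why the rm() step matters, here is a minimal sketch that uses a scratch environment instead of the global one. The object names and toy data frames are assumptions for illustration only; no HDF5 files are read. It shows that lapply(ls(), get) collects every object that ls() returns, including leftover helpers that are not data frames:

```r
# Scratch environment standing in for the global environment (illustrative only)
env <- new.env()

# Two toy data frames stored under file names, as assign(i, df) does in the loop
assign("Cars_20160601_01.hdf5", data.frame(x = 1), envir = env)
assign("Cars_20160601_02.hdf5", data.frame(x = 2), envir = env)

# A leftover helper object, like temp in the code above
assign("temp", c("a", "b"), envir = env)

# ls() returns every name, so get() also fetches the helper object
objs <- lapply(ls(env), get, envir = env)
is_df <- vapply(objs, is.data.frame, logical(1))

length(objs)  # number of objects collected
sum(is_df)    # number of those that are data frames
```

Anything that is not a data frame would be passed to rbind along with the real data, which is why the extra objects have to be removed first in this approach.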

If you need the data files, here is the link to the 2 mentioned above: https://1drv.ms/f/s!AsMFpkDhWcnw6g7StJp9dzZ-nCr4

The answer is basically the same as your code, but with a couple of minor changes. We just use a list and assign to elements of the list in the normal way, rather than using assign() to create data frames in your global environment. This avoids potential bugs and name clashes, and means you don't have to worry about extensive clean-up.
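The list pattern can be tried without any HDF5 files. The sketch below is an illustration only: make_toy_df is a hypothetical stand-in for the per-file h5read() work, and the column values are made up.

```r
# Hypothetical stand-in for the per-file work done with h5read():
# returns a small data frame so the pattern runs without HDF5 files.
make_toy_df <- function(file_name) {
  data.frame(x = c(1, 2), y = c(3, 4), speed = c(50, 60))
}

files <- c("Cars_20160601_01.hdf5", "Cars_20160601_02.hdf5")

df_list <- list()  # initialize an empty list
for (f in files) {
  # Indexing with the character value f stores the data frame
  # and names the list element after the file in one step
  df_list[[f]] <- make_toy_df(f)
}

names(df_list)   # element names, taken from the file names
length(df_list)  # one element per file
```

Because every result lives inside df_list, nothing has to be removed from the global environment before combining the data frames.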

library(rhdf5)  # assumed source of h5read(); the question does not name the package

temp = list.files(pattern="*.hdf5")
df_list = list()  # initialize a list

# Read all files into a list of data frames
for (i in unique(temp)){
  data <- h5read(file = i, name = "data") # ED data
  frame <- h5read(file = i, name = "frame") # Frame numbers
  ED <- data.frame(frames = frame$frame, 
                   speed.kph.ED = round(data$VDS_Veh_Speed*1.46667*0.3048*3.6,2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)#fps

  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED 
  df$pedal_pos <- ED$pedal_pos

  # assign to the list. We can take care of the id cols automatically
  df_list[[i]] <- df
} 

# Name the list elements by file (df_list[[i]] above already does this, so it is a safeguard)
names(df_list) <- unique(temp)
fdc <- data.table::rbindlist(df_list, idcol = "file.ID")

Using data.table::rbindlist is typically faster than using do.call(rbind), and it takes care of the ID column for us based on the names of the list.
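For readers who want to see what the ID column amounts to, here is a rough base R analogue. It is a sketch with made-up toy data, not the rbindlist implementation: each list name is added as a file.ID column before the rows are stacked.

```r
# Toy named list standing in for df_list (values are made up)
df_list <- list(
  "Cars_20160601_01.hdf5" = data.frame(x = c(1, 2), speed = c(50, 60)),
  "Cars_20160601_02.hdf5" = data.frame(x = c(3, 4), speed = c(70, 80))
)

# Prepend each list name to its data frame as a file.ID column
with_id <- Map(function(df, id) cbind(file.ID = id, df),
               df_list, names(df_list))

# Stack the rows; unname() keeps the list names out of the row names
fdc <- do.call(rbind, unname(with_id))

fdc
```

With data.table installed, the call from the answer, data.table::rbindlist(df_list, idcol = "file.ID"), does the naming and the stacking in one step.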
