How to read multiple files and create a single data frame from them in R?
I have 100 hdf5 files in a folder. For a reproducible example, let's consider only 2 files, namely:
> list.files(pattern="*.hdf5")
[1] "Cars_20160601_01.hdf5" "Cars_20160601_02.hdf5"
Each hdf5 file contains 2 groups, data and frame. I want to extract 2 objects from the data group, called VDS_Veh_Speed and VDS_Chassis_CG_Position. Similarly, the frame group contains 3 objects, but only the object frame is relevant.
I want to read these files and extract the relevant variables described above.
# rhdf5 (Bioconductor) provides h5read()
library(rhdf5)

# Create a list of all the hdf5 files
temp <- list.files(pattern = "\\.hdf5$")
# Read all files and create a data frame from each, using the file name as the df name
for (i in unique(temp)) {
  data  <- h5read(file = i, name = "data")   # ED data
  frame <- h5read(file = i, name = "frame")  # Frame numbers
  ED <- data.frame(frames = frame$frame,
                   # speed conversion chain: mph -> ft/s -> m/s -> km/h
                   speed.kph.ED = round(data$VDS_Veh_Speed * 1.46667 * 0.3048 * 3.6, 2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)
  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED
  df$pedal_pos <- ED$pedal_pos
  df$file.ID <- i
  assign(i, df)
}
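The constants in the speed line chain three unit conversions. Assuming VDS_Veh_Speed is stored in mph (an assumption; the source only says the result is in kph), the product should be close to the exact 1.609344 km per mile. A minimal sketch that checks the factor on hypothetical speeds:

```r
# Conversion chain from the speed line (assumes the raw speed is in mph)
mph_to_fps <- 1.46667  # 1 mph = 88/60 ft/s
ft_to_m    <- 0.3048   # 1 ft = 0.3048 m
mps_to_kph <- 3.6      # 1 m/s = 3.6 km/h

kph_per_mph <- mph_to_fps * ft_to_m * mps_to_kph  # about 1.609348

# Hypothetical speeds in mph, converted the same way as speed.kph.ED
speed_mph <- c(30, 60)
speed_kph <- round(speed_mph * kph_per_mph, 2)
```

The small difference from 1.609344 comes from rounding 88/60 to 1.46667; multiplying by 1.609344 directly would give the same values after rounding to 2 decimals for typical speeds.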
Because this leaves all the data frames in the global environment, I then removed the extra objects and kept only the new dfs:
# Remove extra objects
rm(data, df, ED, frame, i, temp)
Finally, I made a list of the dfs in the environment and combined them into a single data frame:
DF_obj <- lapply(ls(), get)
fdc <- do.call("rbind", DF_obj)
This works for me. But, as mentioned in the comments, assign should be avoided. Also, I have to call rm() manually; without it, ls() would also return the helper objects (data, df, ED, frame, i, temp) and the code would not work. Is there any way to avoid assign in this context?
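The reason rm() is needed is that ls() returns every object in the environment, not only the per-file data frames. A minimal sketch, using a throwaway environment and toy data frames as hypothetical stand-ins for the real global environment and hdf5 data, shows the problem and one way to select objects by name with ls(pattern = ...) and mget(). This still relies on assign-style globals; it only illustrates why the clean-up step exists.

```r
# A throwaway environment stands in for the global environment
env <- new.env()
env[["Cars_20160601_01.hdf5"]] <- data.frame(x = 1:2)
env[["Cars_20160601_02.hdf5"]] <- data.frame(x = 3:4)
env[["temp"]] <- c("Cars_20160601_01.hdf5", "Cars_20160601_02.hdf5")  # leftover helper

# ls() returns every object, helpers included
all_objs <- ls(env)
# A name pattern keeps only the per-file data frames
per_file <- ls(env, pattern = "\\.hdf5$")

# mget() fetches the selected objects as a named list, ready for rbind
fdc <- do.call(rbind, mget(per_file, envir = env))
```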
If you need the data files, here is the link to the 2 mentioned above: https://1drv.ms/f/s!AsMFpkDhWcnw6g7StJp9dzZ-nCr4
The answer is basically the same as your code, with a couple of minor changes. We just use a list and assign to its elements normally, rather than using assign() to create data frames in your global environment. This avoids potential bugs and name clashes, and removes the need for extensive clean-up.
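The list-based pattern does not depend on hdf5 at all. A minimal sketch, where read_one() is a hypothetical stand-in for the per-file h5read() processing, shows a loop that stores each result as a named list element, plus an equivalent lapply() version:

```r
# Hypothetical stand-in for the per-file h5read() processing; it ignores
# the path and returns a small fixed data frame
read_one <- function(path) {
  data.frame(y = c(0.1, 0.2), x = c(1, 2), z = c(0, 0))
}

files <- c("Cars_20160601_01.hdf5", "Cars_20160601_02.hdf5")

# Loop version: each result becomes a list element named after its file
df_list <- list()
for (f in files) {
  df_list[[f]] <- read_one(f)
}

# Equivalent without an explicit loop: lapply() plus setNames()
df_list2 <- setNames(lapply(files, read_one), files)
```

Either way, nothing named after the files is created in the global environment, so there is nothing to clean up with rm().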
library(rhdf5)  # provides h5read()

temp <- list.files(pattern = "\\.hdf5$")
df_list <- list()  # initialize a list
# Read all files into a list of data frames
for (i in unique(temp)) {
  data  <- h5read(file = i, name = "data")   # ED data
  frame <- h5read(file = i, name = "frame")  # Frame numbers
  ED <- data.frame(frames = frame$frame,
                   speed.kph.ED = round(data$VDS_Veh_Speed * 1.46667 * 0.3048 * 3.6, 2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)
  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED
  df$pedal_pos <- ED$pedal_pos
  # Assign to a named list element instead of a global variable;
  # the names take care of the ID column below
  df_list[[i]] <- df
}
# df_list is already named by file, so rbindlist() builds file.ID from the names
fdc <- data.table::rbindlist(df_list, idcol = "file.ID")
Using data.table::rbindlist is typically faster than do.call(rbind), and it takes care of the ID column for us based on the names of the list.
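For comparison, here is a minimal base R sketch of what idcol does, on a toy named list standing in for df_list. It adds each list name as a file.ID column with Map() and then stacks the pieces with do.call(rbind). One visible difference: rbindlist() puts file.ID first, while this base version appends it last. The data.table call is guarded so the sketch still runs if the package is not installed.

```r
# Toy named list, standing in for df_list from the answer
df_list <- list(
  "Cars_20160601_01.hdf5" = data.frame(y = 1:2, speed = c(10, 12)),
  "Cars_20160601_02.hdf5" = data.frame(y = 3:4, speed = c(20, 22))
)

# Base R: add the list name as a file.ID column, then stack the pieces
with_id <- Map(function(d, id) {
  d$file.ID <- id
  d
}, df_list, names(df_list))
fdc_base <- do.call(rbind, unname(with_id))

# data.table version from the answer (file.ID becomes the first column)
if (requireNamespace("data.table", quietly = TRUE)) {
  fdc_dt <- data.table::rbindlist(df_list, idcol = "file.ID")
}
```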