Assign an ID vector to a dataframe in R, based on filename?
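Introduction:
I have a directory full of data from a sensor network. I would like to use each sensor's serial number, which is embedded in the filename, to create an ID vector.
Here are some example filenames: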
2017-07-18-32058-aqdata.csv
2017-07-18-32033-aqdata.csv
Each sensor's serial number comes immediately after the timestamp, e.g. 32058 or 32033.
Here is how I currently read in the data:
## Load the necessary packages:
if (!require(plyr)){
  install.packages('plyr')
  library(plyr)
}
if (!require(dplyr)){
  install.packages('dplyr')
  library(dplyr)
}
## Define a function to read in a single file:
read.data <- function(file_path){
  df <- read.csv(file_path, header=TRUE, stringsAsFactors=FALSE)
  names(df) <- c("datetime", "co", "co2", "VOC", "RH", "ugm3", "temp")
  df$datetime <- strptime(df$datetime, format="%Y-%m-%d %H:%M")
  df$datetime <- as.POSIXct(df$datetime, format="%Y-%m-%d %H:%M:%S")
  return(df)
}
## Assign object 'file_path' to my target directory:
file_path <-"~/my_directory/"
## Generate a list of files within this directory:
file_list <- list.files(path = file_path, pattern="\\.csv$", all.files=FALSE, full.names=TRUE, ignore.case=FALSE)
## Apply the read.data function to the list of files (ldply comes from plyr, not dplyr):
df_master <- plyr::ldply(file_list, read.data)
df_master <- plyr::arrange(df_master, datetime)
How can I use the serial number in each filename to create a corresponding ID vector inside the read.data() function?
Here is some example data:
df_example <- structure(list(datetime = structure(c(1497296520, 1497296580, 1497296640, 1497296700, 1497296760, 1497296820), class = c("POSIXct", "POSIXt"), tzone = ""), co = c(0, 0, 0, 0, 0, 0), co2 = c(1118L, 1508L, 836L, 620L, 529L, 498L), VOC = c(62.1353, 59.7594, 59.1831, 57.9592, 56.4335, 53.6528), RH = c(51.45, 52.18, 50.72, 49.71, 49.21, 48.51), ugm3 = c(2.601, 1.061, 1.901, 1.481, 2.501, 3.261), temp = c(72.27, 72.35, 72.45, 72.55, 72.67, 72.77)), .Names = c("datetime", "co", "co2", "VOC", "RH", "ugm3", "temp"), row.names = c(NA, 6L), class = "data.frame")
Thanks in advance!
This assumes your sensor numbers are all 5 or more digits long, which helps avoid confusion with the date. Using stringr:
library(stringr)
read.data <- function(file_path){
  df <- read.csv(file_path, header=TRUE, stringsAsFactors=FALSE)
  names(df) <- c("datetime", "co", "co2", "VOC", "RH", "ugm3", "temp")
  df$datetime <- strptime(df$datetime, format="%Y-%m-%d %H:%M")
  df$datetime <- as.POSIXct(df$datetime, format="%Y-%m-%d %H:%M:%S")
  # New code to pull in sensor number
  df$sensor <- str_extract(file_path, "[0-9]{5,}")
  return(df)
}
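One caveat: because file_list is built with full.names=TRUE, the pattern above is matched against the whole path, so a directory name containing five or more digits would be picked up before the serial number. A minimal base R sketch that avoids this is shown here. The helper name extract_sensor_id and the second example path are hypothetical, and the code assumes every filename follows the YYYY-MM-DD-<serial>-aqdata.csv layout from the question:

```r
# Hypothetical helper (not part of the original answer): pull the sensor
# serial number out of a file path using base R only.
extract_sensor_id <- function(file_path) {
  # Drop the directory part so digits in folder names cannot match
  file_name <- basename(file_path)
  # Assumed layout: "YYYY-MM-DD-<serial>-<anything>"
  pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}-([0-9]+)-.*$"
  # Return the captured serial, or NA when the name does not fit the layout
  ifelse(grepl(pattern, file_name),
         sub(pattern, "\\1", file_name),
         NA_character_)
}

# Example paths; the second directory name is made up to show the pitfall
example_paths <- c("~/my_directory/2017-07-18-32058-aqdata.csv",
                   "~/data_201707/2017-07-18-32033-aqdata.csv")
sensor_ids <- extract_sensor_id(example_paths)
print(sensor_ids)
```

Inside read.data(), the sensor column could then be filled with df$sensor <- extract_sensor_id(file_path) in place of the str_extract() call. The result is a character vector, so wrap it in as.integer() if a numeric ID is preferred.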