Using data in one data.frame to generate values for a new column in another data.frame in R
I have two data frames. One contains an air temperature and a timestamp:
air_temp time_stamp
85.1 1396335600
85.4 1396335860
The other contains startTime, endTime, location coordinates, and a canonical name:
startTime endTime location.lat location.lon name
1396334278 1396374621 37.77638 -122.4176 Work
1396375256 1396376369 37.78391 -122.4054 Work
For each row in the first data frame, I want to identify which time range in the second data frame it falls in. For example, if the timestamp 1396335600 is between the startTime 1396334278 and the endTime 1396374621, add the location and name values to that row of the first data.frame.
The start and end times in the second data frame don't overlap and are strictly increasing. However, they are not perfectly contiguous, so if a timestamp falls in the gap between two time bands, I need to mark the location as NA.
If it does fall between a start and end time, I want to add the location.lat, location.lon, and name columns to the first data frame.
Appreciate your help.
Try this. Not tested.
# For each timestamp, the row of data2 whose interval contains it (NA if none)
idx <- sapply(data1$time_stamp, function(x) which(x >= data2$startTime & x <= data2$endTime)[1])
newdata <- data2[idx[!is.na(idx)], 3:5]
data1 <- cbind(data1[!is.na(idx), ], newdata)
This won't return a row for any timestamp that isn't between a startTime and an endTime, so in theory the returned dataset could be shorter than the original. Just in case, I filtered data1 with the same TRUE/FALSE condition used for data2, so both pieces have the same number of rows.
# [1] turns "no matching interval" (integer(0)) into NA, so sapply returns a plain vector
rowidx <- sapply(dfrm1$time_stamp, function(x) which(dfrm2$startTime <= x & dfrm2$endTime >= x)[1])
cbind(dfrm1, dfrm2[rowidx, c("location.lat", "location.lon", "name")])
Mine's not tested either and looks substantially similar to CCurtis's, so give him the check if it works.
Interesting problem... It turned out to be more complicated than I originally thought!!
Step 1: Set up the data:
DF1 <- read.table(text="air_temp time_stamp
85.1 1396335600
85.4 1396335860",header=TRUE)
DF2 <- read.table(text="startTime endTime location.lat location.lon name
1396334278 1396374621 37.77638 -122.4176 Work
1396375256 1396376369 37.78391 -122.4054 Work",header=TRUE)
Step 2: For each time_stamp in DF1, compute the appropriate index in DF2:
index <- sapply(DF1$time_stamp,
function(i) {
dec <- which(i >= DF2$startTime & i <= DF2$endTime)
ifelse(length(dec) == 0, NA, dec)
}
)
index
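To see how the NA case behaves, here is a minimal sketch of the same lookup run on four timestamps. The first two come from the question; the last two (1396375000, which sits in the gap between the two intervals, and 1396376000, which sits inside the second one) are made up for illustration, as is the name `stamps`.

```r
# Intervals from the question
DF2 <- read.table(text="startTime endTime location.lat location.lon name
1396334278 1396374621 37.77638 -122.4176 Work
1396375256 1396376369 37.78391 -122.4054 Work", header=TRUE)

# First two values are from the question; the last two are hypothetical
stamps <- c(1396335600, 1396335860, 1396375000, 1396376000)

index <- sapply(stamps, function(i) {
  # Rows of DF2 whose [startTime, endTime] range contains timestamp i
  dec <- which(i >= DF2$startTime & i <= DF2$endTime)
  # No containing interval: return NA so the later row lookup yields an NA row
  if (length(dec) == 0) NA_integer_ else dec[1]
})
index
# 1 1 NA 2
```

The timestamp in the gap maps to NA, and indexing a data frame by an NA row index produces a row of NA values, which is what the merge step relies on.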
Step 3: Merge the two data frames:
DF1 <- cbind(DF1,DF2[index,3:5])
row.names(DF1) <- 1:nrow(DF1)
DF1
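Because the question states that the intervals are sorted and don't overlap, the lookup could also be vectorized with base R's `findInterval()`. This is only a sketch under that assumption, not a tested replacement for the steps above; the third timestamp (1396375000) and the names `stamps_df`, `ranges_df`, `idx`, and `result` are made up for illustration.

```r
# Data from the question, plus one hypothetical timestamp that falls in the gap
stamps_df <- data.frame(air_temp   = c(85.1, 85.4, 86.0),
                        time_stamp = c(1396335600, 1396335860, 1396375000))
ranges_df <- data.frame(startTime    = c(1396334278, 1396375256),
                        endTime      = c(1396374621, 1396376369),
                        location.lat = c(37.77638, 37.78391),
                        location.lon = c(-122.4176, -122.4054),
                        name         = c("Work", "Work"))

# For each time_stamp, count how many startTime values are <= it.
# Requires startTime to be sorted in non-decreasing order.
idx <- findInterval(stamps_df$time_stamp, ranges_df$startTime)

# 0 means the timestamp precedes the first interval
idx[idx == 0] <- NA
# A candidate interval only counts if the timestamp is also <= its endTime
idx[!is.na(idx) & stamps_df$time_stamp > ranges_df$endTime[idx]] <- NA

# An NA row index yields a row of NAs, marking timestamps outside every interval
result <- cbind(stamps_df,
                ranges_df[idx, c("location.lat", "location.lon", "name")])
rownames(result) <- NULL
result
```

This avoids calling a function once per timestamp, which may matter for long series, but it silently gives wrong answers if startTime is not sorted or the intervals overlap.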
Hope this helps!!