Extract values from a dataframe based on a range of values from another dataframe
I am trying to extract the index values from a dataframe (df1) whose rows represent time ranges (start to end), picking for each time in another dataframe (df2) the range that encompasses it. My required output is df3.
df1<-data.frame(index=c(1,2,3,4),start=c(5,10,15,20),end=c(10,15,20,25))
df2<-data.frame(time=c(11,17,18,5,5,22))
df3<-data.frame(time=c(11,17,18,5,5,22),index=c(2,3,3,1,1,4))
Is there a tidyverse solution to this?
You can do it with base R functions. A combination of which inside sapply and a logical comparison will do the work for you.
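# Expand each (start, end) row into its sequence of values, one column per interval
# (this relies on integer times and intervals of equal length)
inds <- apply(df1[, -1], 1, function(x) seq(from = x[1], to = x[2]))
# For each time, look up the column (interval) that contains it
index <- sapply(df2$time, function(x) {
  tmp <- which(x == inds, arr.ind = TRUE)
  tmp[, "col"]
})
df3 <- data.frame(df2, index)
df3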
time index
1 11 2
2 17 3
3 18 3
4 5 1
5 5 1
6 22 4
Data:
df1<-data.frame(index=c(1,2,3,4),start=c(5,10,15,20),end=c(10,15,20,25))
df2<-data.frame(time=c(11,17,18,2,5,5,8,22))
Code:
# get index values and assign it to df2 column
df2$index <- apply( df2, 1, function(x) { with(df1, index[ x[ 'time' ] >= start & x[ 'time' ] <= end ] ) })
Output:
df2
# time index
# 1 11 2
# 2 17 3
# 3 18 3
# 4 2
# 5 5 1
# 6 5 1
# 7 8 1
# 8 22 4
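Because time 2 falls in no interval, the apply() call returns a list rather than a plain vector, which is why row 4 prints blank. A minimal sketch of one way to keep the column atomic, assuming that NA is an acceptable marker for "no match" and that the first hit should win when a time sits on a shared boundary (first_match is a hypothetical helper name, not part of the answer):

```r
df1 <- data.frame(index = c(1, 2, 3, 4), start = c(5, 10, 15, 20), end = c(10, 15, 20, 25))
df2 <- data.frame(time = c(11, 17, 18, 2, 5, 5, 8, 22))

# Hypothetical helper: index of the first interval containing t, or NA if none does
first_match <- function(t, intervals) {
  hit <- intervals$index[t >= intervals$start & t <= intervals$end]
  if (length(hit) == 0) NA_real_ else hit[1]
}

# vapply() enforces exactly one numeric value per time, so the result is a plain vector
df2$index <- vapply(df2$time, first_match, numeric(1), intervals = df1)
df2
```

With this variant row 4 holds NA instead of an empty element, so the column can be used directly in later filtering or joins.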
Here is one option with findInterval:
ftx <- function(x, y) findInterval(x, y)
df3 <- transform(df2, index = pmax(ftx(time, df1$start), ftx(time, df1$end)))
df3
# time index
#1 11 2
#2 17 3
#3 18 3
#4 5 1
#5 5 1
#6 22 4
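The findInterval approach always returns some position, even for times that lie outside every range. A hedged sketch of a guarded variant follows, assuming df1$start is sorted in increasing order (which findInterval requires) and that NA should mark times outside all ranges; the extra times 2 and 30 are added here purely for illustration:

```r
df1 <- data.frame(index = c(1, 2, 3, 4), start = c(5, 10, 15, 20), end = c(10, 15, 20, 25))
df2 <- data.frame(time = c(11, 17, 18, 2, 5, 5, 8, 22, 30))

# Position of the last start that is <= time; 0 means the time precedes every range
pos <- findInterval(df2$time, df1$start)
pos[pos == 0] <- NA

# Keep the candidate only if the time does not run past that range's end
inside <- !is.na(pos) & df2$time <= df1$end[pos]
df2$index <- ifelse(inside, df1$index[pos], NA)
df2
```

Note that findInterval treats intervals as closed on the left, so a time exactly on a shared boundary (for example 10) is assigned to the range that starts there rather than the one that ends there.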
Another option is foverlaps from data.table:
library(data.table)
dfN <- data.table(index = seq_len(nrow(df2)), start = df2$time, end = df2$time)
setDT(df1)
setkey(dfN, start, end)
setkey(df1, start, end)
foverlaps(dfN, df1, which = TRUE)[, yid[match(xid, dfN$index)]]
#[1] 2 3 3 1 1 4
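As an alternative to foverlaps, data.table also supports non-equi joins in the on argument, which avoids building the helper table and setting keys. The following is a sketch under that assumption, using fresh copies of the data (dt1 and dt2 are illustrative names) rather than the keyed tables from the answer:

```r
library(data.table)

dt1 <- data.table(index = c(1, 2, 3, 4), start = c(5, 10, 15, 20), end = c(10, 15, 20, 25))
dt2 <- data.table(time = c(11, 17, 18, 5, 5, 22))

# For each row of dt2, find the rows of dt1 with start <= time <= end.
# i.time refers to the column from dt2, x.index to the column from dt1.
res <- dt1[dt2, on = .(start <= time, end >= time), .(time = i.time, index = x.index)]
res
```

Rows come back in the order of dt2; a time on a shared boundary would produce one row per matching range.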
As the OP commented about wanting a solution with pipes, @Jilber Urbina's solution can be implemented with tidyverse functions:
library(tidyverse)
df1 %>%
select(from = start, to = end) %>%
pmap(seq) %>%
do.call(cbind, .) %>%
list(.) %>%
mutate(df2, new = .,
ind = map2(time, new, ~ which(.x == .y, arr.ind = TRUE)[,2])) %>%
select(-new)
# time ind
#1 11 2
#2 17 3
#3 18 3
#4 5 1
#5 5 1
#6 22 4
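For readers on a recent dplyr, a shorter pipe-based route may be available. This sketch assumes dplyr 1.1.0 or later, where join_by() accepts the between() helper for range joins; it is not part of the original answer:

```r
library(dplyr)

df1 <- data.frame(index = c(1, 2, 3, 4), start = c(5, 10, 15, 20), end = c(10, 15, 20, 25))
df2 <- data.frame(time = c(11, 17, 18, 5, 5, 22))

# Match each time to the rows of df1 whose [start, end] range contains it
df3 <- df2 %>%
  left_join(df1, by = join_by(between(time, start, end))) %>%
  select(time, index)
df3
```

Because this is a left join, times outside every range are kept with an NA index, and a time on a shared boundary yields one row per matching range.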