Retain rows in a dataframe based on 2 columns satisfying different criteria ranges; there are 27 rows of ranges in the dataframe
I have a dataframe dat from which I need to extract entire rows based on two columns, ut and ctz, which must simultaneously satisfy their respective ranges in the data.frame range_criteria. The ranges are different for ut and ctz, and each column must satisfy its own range. If either ut or ctz is out of range, the entire row is discarded.
I have been racking my brain over this for 12 hours. I must make sure each row of ut and ctz in dat is checked by every row of the respective range_criteria. I know I have to loop, but I am not sure how... please help!
dat <- data.frame(name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
                  ut = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
                  ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))
range_criteria <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                             ut_max = c(5, 10, 15, 25, 30, 35, 50),
                             ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                             ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))
The expected outcome should be:
interest <- data.frame(name = c("Tom", "David", "Daniely"),
                       ut = c(2.4, 3.5, 45),
                       ctz = c(1, 3.5, 7))
Thank you so much!
Based on your description, it sounds like you want the i-th row of dat to satisfy both ranges specified in the i-th row of range_criteria. Is that correct?
If so, there's no need to loop (explicitly). R's vectorized approach makes this work pretty easily:
dat <- data.frame(name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
                  ut = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
                  ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))
rc <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                 ut_max = c(5, 10, 15, 25, 30, 35, 50),
                 ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                 ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))
# Compare row i of dat with row i of rc; keep rows where both ranges hold
dat[dat$ut >= rc$ut_min & dat$ut <= rc$ut_max &
    dat$ctz >= rc$ctz_min & dat$ctz <= rc$ctz_max, ]
This also returns "Daniel" in addition to the other three names you mentioned, but looking at the data I think that's correct: his ut of 9.5 lies within [2, 25] and his ctz of 5.1 lies within [3.2, 6.2], which are the ranges in the fourth row.
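The row-by-row logic can be made visible by computing the two range checks separately and putting them side by side. This is only a diagnostic sketch, assuming dat and rc have the same number of rows so that row i of one pairs with row i of the other; the names ut_ok, ctz_ok and check are introduced here for illustration.

```r
dat <- data.frame(name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
                  ut = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
                  ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))
rc <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                 ut_max = c(5, 10, 15, 25, 30, 35, 50),
                 ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                 ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))

# Elementwise comparison only pairs rows correctly when the row counts match;
# otherwise R recycles the shorter vector and the pairing is wrong.
stopifnot(nrow(dat) == nrow(rc))

# One logical flag per row for each column's range check
ut_ok  <- dat$ut  >= rc$ut_min  & dat$ut  <= rc$ut_max
ctz_ok <- dat$ctz >= rc$ctz_min & dat$ctz <= rc$ctz_max

# Side-by-side view: a row is kept only when both flags are TRUE
check <- data.frame(name = dat$name, ut_ok, ctz_ok, keep = ut_ok & ctz_ok)
print(check)
```

Printing check shows, for each name, which of the two conditions failed, which can help when the real data has many more rows.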
Alternatively, you could use a package designed for data manipulation, such as dplyr or data.table, to do the same thing a bit more smoothly.
library(data.table)
both <- cbind(dat, rc)
setDT(both)
interest <- both[between(ut, ut_min, ut_max) & between(ctz, ctz_min, ctz_max)]
or
library(dplyr)
both <- bind_cols(dat, rc)
interest <- both %>%
filter(ut >= ut_min & ut <= ut_max & ctz >= ctz_min & ctz <= ctz_max)
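The question also mentions checking each row against every row of range_criteria. If that reading is the intended one (a row is kept when ut and ctz both fall inside the same range row, for at least one range row), the row-by-row filters above do not apply, and the two data frames no longer need the same number of rows. A hedged base R sketch of that reading follows; the helper in_any_range and the result name interest_any are made up for illustration, and this interpretation is an assumption the question does not confirm.

```r
dat <- data.frame(name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
                  ut = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
                  ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))
rc <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                 ut_max = c(5, 10, 15, 25, 30, 35, 50),
                 ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                 ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))

# Hypothetical helper: TRUE when (ut, ctz) falls inside at least one row of rc,
# with both values tested against the SAME range row.
in_any_range <- function(ut, ctz, rc) {
  any(ut >= rc$ut_min & ut <= rc$ut_max &
      ctz >= rc$ctz_min & ctz <= rc$ctz_max)
}

# Apply the helper to each (ut, ctz) pair in dat; rc is passed unchanged
keep <- mapply(in_any_range, dat$ut, dat$ctz, MoreArgs = list(rc = rc))
interest_any <- dat[keep, ]
print(interest_any)
```

On the sample data this reading also keeps "Harry" (3.2 and 6.0 both fit the third range row), which the expected outcome excludes, so the row-by-row reading seems closer to what was asked.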