
Retain rows in a dataframe based on 2 columns satisfying different criteria ranges; there are 27 rows of ranges in the dataframe

I have a data frame dat from which I need to extract entire rows based on two columns, ut and ctz, which must simultaneously satisfy their respective rows of ranges in the data frame range_criteria. The ranges for ut and ctz are different, and each column must fall within its own range. If either ut or ctz is out of range, the entire row is discarded.

I have been racking my brain over this for 12 hours. I must make sure each row of ut and ctz in dat is checked against every row of the respective range_criteria. I know I have to loop, but I am not sure how... please help!

dat <- data.frame(name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
                  ut = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
                  ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))

range_criteria <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                             ut_max = c(5, 10, 15, 25, 30, 35, 50),
                             ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                             ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))

The expected outcome should be:

interest <- data.frame(name = c('Tom', "David", "Daniely" ),
                 ut = c(2.4, 3.5,45),
                 ctz = c(1, 3.5, 7))

Thank you so much!!

Based on your description, it sounds like you want the i-th row of dat to satisfy both ranges specified in the i-th row of range_criteria. Is that correct?

If so, there's no need to loop (explicitly). R's vectorized comparisons make this pretty easy:

dat <- data.frame(name = c('Tom', "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
                  ut = c(2.4, 3.2, 3.5,9.5,5.2,6.0,45),
                  ctz = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7))

rc <- data.frame(ut_min = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
                 ut_max = c(5, 10, 15, 25, 30, 35, 50),
                 ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
                 ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8))

dat[dat$ut >= rc$ut_min & dat$ut <= rc$ut_max & dat$ctz >= rc$ctz_min & dat$ctz <= rc$ctz_max,]

This also returns "Daniel" in addition to the three names you listed, but looking at the data I think that's correct: his ut of 9.5 lies within [2.0, 25] and his ctz of 5.1 lies within [3.2, 6.2], the ranges in row 4.
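To see why each row is kept or dropped, the single condition can be split into one logical vector per column. This is only a sketch of the same row-by-row comparison; the names ut_ok, ctz_ok and kept are introduced here for illustration and are not part of the original answer.

```r
dat <- data.frame(
  name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
  ut   = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
  ctz  = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7)
)
rc <- data.frame(
  ut_min  = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
  ut_max  = c(5, 10, 15, 25, 30, 35, 50),
  ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
  ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8)
)

# Pairing row i of dat with row i of rc only makes sense when both
# tables have the same number of rows; otherwise R recycles the
# shorter vectors and the comparison no longer means what we want.
stopifnot(nrow(dat) == nrow(rc))

# One logical vector per column (illustrative names).
ut_ok  <- dat$ut  >= rc$ut_min  & dat$ut  <= rc$ut_max
ctz_ok <- dat$ctz >= rc$ctz_min & dat$ctz <= rc$ctz_max

# Per-row diagnosis: which check passed, and whether the row is kept.
print(data.frame(name = dat$name, ut_ok, ctz_ok, keep = ut_ok & ctz_ok))

kept <- dat[ut_ok & ctz_ok, ]
print(kept$name)
# "Tom" "David" "Daniel" "Daniely"
```

The printed table shows that "Harry" fails only the ctz check, while "Harri" and "Davidi" fail both.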

Alternatively, you could use a package designed for data manipulation, such as data.table or dplyr, to do the same thing a bit more smoothly.

library(data.table)

both <- cbind(dat, rc)
setDT(both)
interest <- both[between(ut, ut_min, ut_max) & between(ctz, ctz_min, ctz_max)]

or

library(dplyr)

both <- bind_cols(dat, rc)

interest <- both %>%
  filter(ut >= ut_min & ut <= ut_max & ctz >= ctz_min & ctz <= ctz_max)
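All of the approaches above assume that row i of dat is paired with row i of the criteria. If the intent were instead that a row of dat is kept when it fits any single row of the criteria (with ut and ctz both judged against that same row), a rough base R sketch could look like the following. This is an assumption about an alternative reading of the question, not what the answer above implements, and fits_any is a name made up for this example.

```r
dat <- data.frame(
  name = c("Tom", "Harry", "David", "Daniel", "Harri", "Davidi", "Daniely"),
  ut   = c(2.4, 3.2, 3.5, 9.5, 5.2, 6.0, 45),
  ctz  = c(1, 6.0, 3.5, 5.1, 51.5, 6.6, 7)
)
rc <- data.frame(
  ut_min  = c(0.0, 0.5, 1.0, 2.0, 7.2, 9.0, 21.0),
  ut_max  = c(5, 10, 15, 25, 30, 35, 50),
  ctz_min = c(0, 1, 2, 3.2, 4.3, 6.3, 6.9),
  ctz_max = c(5, 5.5, 6.1, 6.2, 6.4, 6.5, 7.8)
)

# For each row of dat, test it against every row of rc and keep it
# if at least one criteria row accepts both ut and ctz.
fits_any <- vapply(seq_len(nrow(dat)), function(i) {
  any(dat$ut[i]  >= rc$ut_min  & dat$ut[i]  <= rc$ut_max &
      dat$ctz[i] >= rc$ctz_min & dat$ctz[i] <= rc$ctz_max)
}, logical(1))

print(dat[fits_any, ])
```

With the sample data this reading also keeps "Harry" (he fits row 3 of the criteria), which does not match the expected outcome in the question, so the row-by-row pairing looks like the intended one.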
