Filter a data frame according to minimum and maximum values
I have a data frame like so:
df
A B C D E F
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 24. 6. 16. 5. 1.20 6.
2 21. 2. 19. 2. 1.09 2.
3 12. 2. 12. 79. 0.860 2.
4 39. 7. 39. 39. 1.90 7.
5 51. 1. 82. 27. 2.30 1.
6 24. 9. 24. 40. 1.60 9.
7 48. 1. 32. 5. 1.60 1.
8 44. 1. 44. 12. 1.70 1.
9 14. 1. 18. 6. 0.880 1.
10 34. 2. 51. 5. 2.70 2.
# ... with 4,688 more rows
I would like to filter this data frame according to a list, so that the values in each column of df are restricted to the range between the minimum and maximum of the corresponding element of the list Neighb:
[[1]]
[1] 15.7 15.9 16.0 16.1 16.2
[[2]]
[1] 0 1 2 3 4
[[3]]
[1] 15.0 15.3 16.0 16.3 16.5
[[4]]
[1] 3 4 5 6 7
[[5]]
[1] 1.08 1.09 1.10 1.11 1.12
[[6]]
[1] 0 1 2 3 4
Is there a way to do this efficiently with dplyr/base R? Until now I have used loops and filtered one column of df at a time.
We can use Map from base R:
Map(function(x, y) x[x >= min(y) & x <= max(y)], df, Neighb)
#$A
#numeric(0)
#$B
#[1] 2 2 1 1 1 1 2
#$C
#[1] 16
#$D
#[1] 5 5 6 5
#$E
#[1] 1.09
#$F
#[1] 2 2 1 1 1 1 2
If we need to filter the dataset based on the logical index, i.e. keep only the rows where every comparison with 'Neighb' is TRUE:
df[Reduce(`&`, Map(function(x, y) x >= min(y) & x <= max(y), df, Neighb)), ]
and if it is enough for any comparison in the row to be TRUE:
df[Reduce(`|`, Map(function(x, y) x >= min(y) & x <= max(y), df, Neighb)),]
df <- structure(list(A = c(24, 21, 12, 39, 51, 24, 48, 44, 14, 34),
B = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2),
C = c(16, 19, 12, 39, 82, 24, 32, 44, 18, 51),
D = c(5, 2, 79, 39, 27, 40, 5, 12, 6, 5),
E = c(1.2, 1.09, 0.86, 1.9, 2.3, 1.6, 1.6, 1.7, 0.88, 2.7),
F = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2)),
.Names = c("A","B", "C", "D", "E", "F"),
class = "data.frame",
row.names = c(NA, -10L))
Neighb <- list(c(15.7, 15.9, 16.0, 16.1, 16.2),
c(0, 1, 2, 3, 4),
c(15.0, 15.3, 16.0, 16.3, 16.5),
c(3, 4, 5, 6, 7),
c(1.08, 1.09, 1.10, 1.11, 1.12),
c(0, 1, 2, 3, 4))
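To make the two Reduce calls above easier to follow, here is a minimal sketch that names the intermediate list of logical vectors and inspects which row positions each variant keeps. The names in_range, keep_all and keep_any are introduced here for illustration only; the data is the sample data from this answer.

```r
# Sample data copied from the answer above
df <- data.frame(A = c(24, 21, 12, 39, 51, 24, 48, 44, 14, 34),
                 B = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2),
                 C = c(16, 19, 12, 39, 82, 24, 32, 44, 18, 51),
                 D = c(5, 2, 79, 39, 27, 40, 5, 12, 6, 5),
                 E = c(1.2, 1.09, 0.86, 1.9, 2.3, 1.6, 1.6, 1.7, 0.88, 2.7),
                 F = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2))
Neighb <- list(c(15.7, 15.9, 16.0, 16.1, 16.2),
               c(0, 1, 2, 3, 4),
               c(15.0, 15.3, 16.0, 16.3, 16.5),
               c(3, 4, 5, 6, 7),
               c(1.08, 1.09, 1.10, 1.11, 1.12),
               c(0, 1, 2, 3, 4))

# One logical vector per column: TRUE where the value lies in [min, max]
in_range <- Map(function(x, y) x >= min(y) & x <= max(y), df, Neighb)

# Combine the per-column vectors element-wise into one index per row
keep_all <- Reduce(`&`, in_range)  # every column within its range
keep_any <- Reduce(`|`, in_range)  # at least one column within its range

which(keep_all)  # integer(0): no sample row passes all six ranges
which(keep_any)  # 1 2 3 5 7 8 9 10
```

On the ten sample rows the `&` variant keeps nothing, because column A never falls inside its range, while the `|` variant drops only rows 4 and 6.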
You can use map2 from purrr together with between from dplyr to get the results you want.
library(purrr)
library(dplyr)
map2(df, Neighb, function(x, y) x[between(x, min(y), max(y))] )
$A
numeric(0)
$B
[1] 2 2 1 1 1 1 2
$C
[1] 16
$D
[1] 5 5 6 5
$E
[1] 1.09
$F
[1] 2 2 1 1 1 1 2
Data:
df <- structure(list(A = c(24, 21, 12, 39, 51, 24, 48, 44, 14, 34),
B = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2),
C = c(16, 19, 12, 39, 82, 24, 32, 44, 18, 51),
D = c(5, 2, 79, 39, 27, 40, 5, 12, 6, 5),
E = c(1.2, 1.09, 0.86, 1.9, 2.3, 1.6, 1.6, 1.7, 0.88, 2.7),
F = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2)),
.Names = c("A","B", "C", "D", "E", "F"),
class = "data.frame",
row.names = c(NA, -10L))
Neighb <- list(c(15.7, 15.9, 16.0, 16.1, 16.2),
c(0, 1, 2, 3, 4),
c(15.0, 15.3, 16.0, 16.3, 16.5),
c(3, 4, 5, 6, 7),
c(1.08, 1.09, 1.10, 1.11, 1.12),
c(0, 1, 2, 3, 4))
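The map2 call above returns the matching values per column rather than filtered rows. If rows are wanted, one possible extension (a sketch, not part of the original answer) is to have map2 return logical vectors and combine them with purrr's reduce. The names in_range, keep_all and keep_any are assumptions made for this example.

```r
library(purrr)
library(dplyr)

# Sample data copied from the answer above
df <- data.frame(A = c(24, 21, 12, 39, 51, 24, 48, 44, 14, 34),
                 B = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2),
                 C = c(16, 19, 12, 39, 82, 24, 32, 44, 18, 51),
                 D = c(5, 2, 79, 39, 27, 40, 5, 12, 6, 5),
                 E = c(1.2, 1.09, 0.86, 1.9, 2.3, 1.6, 1.6, 1.7, 0.88, 2.7),
                 F = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2))
Neighb <- list(c(15.7, 15.9, 16.0, 16.1, 16.2),
               c(0, 1, 2, 3, 4),
               c(15.0, 15.3, 16.0, 16.3, 16.5),
               c(3, 4, 5, 6, 7),
               c(1.08, 1.09, 1.10, 1.11, 1.12),
               c(0, 1, 2, 3, 4))

# Logical vector per column instead of the matching values themselves
in_range <- map2(df, Neighb, function(x, y) between(x, min(y), max(y)))

# Row index: all columns in range, or at least one column in range
keep_all <- reduce(in_range, `&`)
keep_any <- reduce(in_range, `|`)

df[keep_all, ]  # empty for the sample data
df[keep_any, ]  # rows 1, 2, 3, 5, 7, 8, 9 and 10
```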
A possible solution:
# needed packages
library(data.table)
# get the minimum and maximum for each list item
nr <- lapply(Neighb, range)
# create a matrix with the 'inrange' function from 'data.table'
m <- mapply(function(x, y) x %inrange% y, df, nr)
This gives:
> m
          A     B     C     D     E     F
 [1,] FALSE FALSE  TRUE  TRUE FALSE FALSE
 [2,] FALSE  TRUE FALSE FALSE  TRUE  TRUE
 [3,] FALSE  TRUE FALSE FALSE FALSE  TRUE
 [4,] FALSE FALSE FALSE FALSE FALSE FALSE
 [5,] FALSE  TRUE FALSE FALSE FALSE  TRUE
 [6,] FALSE FALSE FALSE FALSE FALSE FALSE
 [7,] FALSE  TRUE FALSE  TRUE FALSE  TRUE
 [8,] FALSE  TRUE FALSE FALSE FALSE  TRUE
 [9,] FALSE  TRUE FALSE  TRUE FALSE  TRUE
[10,] FALSE  TRUE FALSE  TRUE FALSE  TRUE
Now you can filter df with the rowSums function:
df[rowSums(m) == ncol(df),]
Applying this to the presented example data (df) results in an empty data frame, because no row satisfies all six ranges; on the original dataset it will most likely give a non-empty result.
Used data:
df <- read.table(text=" A B C D E F
1 24 6 16 5 1.20 6
2 21 2 19 2 1.09 2
3 12 2 12 79 0.860 2
4 39 7 39 39 1.90 7
5 51 1 82 27 2.30 1
6 24 9 24 40 1.60 9
7 48 1 32 5 1.60 1
8 44 1 44 12 1.70 1
9 14 1 18 6 0.880 1
10 34 2 51 5 2.70 2", header=TRUE, stringsAsFactors=FALSE)
Neighb <- list(c(15.7,15.9,16.0,16.1,16.2),c(0:4),c(15.0,15.3,16.0,16.3,16.5),c(3:7),seq(1.08,1.12,0.01),c(0:4))
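To see why the example data yields an empty result, it can help to count matches per column and per row. The sketch below rebuilds the same logical matrix in base R (plain comparisons stand in for %inrange%, so data.table is not needed here) and summarises it. This is an illustrative check, not part of the original answer.

```r
# Sample data from this answer, written with data.frame() for brevity
df <- data.frame(A = c(24, 21, 12, 39, 51, 24, 48, 44, 14, 34),
                 B = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2),
                 C = c(16, 19, 12, 39, 82, 24, 32, 44, 18, 51),
                 D = c(5, 2, 79, 39, 27, 40, 5, 12, 6, 5),
                 E = c(1.2, 1.09, 0.86, 1.9, 2.3, 1.6, 1.6, 1.7, 0.88, 2.7),
                 F = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2))
Neighb <- list(c(15.7, 15.9, 16.0, 16.1, 16.2), 0:4,
               c(15.0, 15.3, 16.0, 16.3, 16.5), 3:7,
               seq(1.08, 1.12, 0.01), 0:4)

# Minimum and maximum of each list element
nr <- lapply(Neighb, range)

# Base R stand-in for `x %inrange% y`: closed interval test per column
m <- mapply(function(x, r) x >= r[1] & x <= r[2], df, nr)

colSums(m)  # matches per column: A 0, B 7, C 1, D 4, E 1, F 7
rowSums(m)  # matches per row; the largest count is 3 out of 6 columns
```

Column A has no value inside its range, so no row can reach rowSums(m) == ncol(df).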
Another approach could be:
#minimum and maximum value from given list
filter_criteria <- lapply(lookup_list, function(x) c(min(x), max(x)))
df1 <- as.data.frame(mapply(function(x, y) replace(x, !(x>=y[1] & x<=y[2]), NA),
df, filter_criteria))
df1
# A B C D E F
#1 NA NA 16 5 NA NA
#2 NA 2 NA NA 1.09 2
#3 NA 2 NA NA NA 2
#4 NA NA NA NA NA NA
#5 NA 1 NA NA NA 1
#6 NA NA NA NA NA NA
#7 NA 1 NA 5 NA 1
#8 NA 1 NA NA NA 1
#9 NA 1 NA 6 NA 1
#10 NA 2 NA 5 NA 2
#final output
df1 <- na.omit(df1) #as per given sample data it's empty
Sample data:
df <- structure(list(A = c(24, 21, 12, 39, 51, 24, 48, 44, 14, 34),
B = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2), C = c(16, 19, 12, 39,
82, 24, 32, 44, 18, 51), D = c(5, 2, 79, 39, 27, 40, 5, 12,
6, 5), E = c(1.2, 1.09, 0.86, 1.9, 2.3, 1.6, 1.6, 1.7, 0.88,
2.7), F = c(6, 2, 2, 7, 1, 9, 1, 1, 1, 2)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
lookup_list <- list(c(15.7, 15.9, 16, 16.1, 16.2), c(0, 1, 2, 3, 4), c(15, 15.3,
16, 16.3, 16.5), c(3, 4, 5, 6, 7), c(1.08, 1.09, 1.1, 1.11, 1.12
), c(0, 1, 2, 3, 4))
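Because the sample data leaves nothing after na.omit, the effect of the NA-masking approach is easier to see on data where some rows survive. The sketch below uses a small made-up data frame and list (toy and bounds are hypothetical, chosen only for illustration) and otherwise follows the same steps.

```r
# Hypothetical toy data, not from the question
toy <- data.frame(x = c(1, 5, 9), y = c(10, 20, 30))
bounds <- list(c(4, 6, 10), c(15, 35))

# Minimum and maximum of each list element
limits <- lapply(bounds, range)

# Replace every out-of-range value with NA, column by column
masked <- as.data.frame(mapply(function(x, r) replace(x, x < r[1] | x > r[2], NA),
                               toy, limits))

# Keep only rows with no NA, i.e. rows where every column was in range
kept <- na.omit(masked)
kept  # rows 2 and 3 remain: x = 5, 9 and y = 20, 30
```

Only the first row is dropped, since both x = 1 and y = 10 fall below their lower bounds.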