Removing rows from a data frame that are outside an interval [R]
I have a 1000 x 20 data frame named newdf. The goal is to go through every column and find the 2.5% and 97.5% quantiles of that column, which form the interval for that column. After that, if any feature has a value outside its column's interval, we remove the ENTIRE row, regardless of whether the other features in that row are within their intervals.
So far, I have been able to create a for loop that stores all of the intervals in a list, like so:
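new_list <- vector("list", ncol(newdf))  # pre-allocate one slot per column
for (i in 1:20) {
  # c(2.5%, 97.5%) quantiles of column i
  quant <- quantile(newdf[, i], c(.025, .975))
  new_list[[i]] <- quant
}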
I need help finding a way to apply these intervals to the 20 columns and then remove the rows.
I have been trying with if() and else() with no success.
library(purrr)
library(dplyr)  # map_dfc() needs dplyr installed; dplyr also provides %>%

# example data
set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))

idx_to_remove <- map_dfc(df, function(x) {
  # for each column get interval
  interval <- quantile(x, c(0.025, 0.975))
  # generate boolean: TRUE where the cell is outside the interval
  !(x >= interval[1] & x <= interval[2])
}) %>%
  # for each row see if any TRUE
  apply(1, any)

# remove these rows (idx_to_remove is logical, so negate it with ! rather than -)
df[!idx_to_remove, ]
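For comparison, the loop from the question can also be finished in base R without recomputing anything: pair each column with its stored interval, mark which cells fall inside, and keep the rows where every cell does. This is a minimal sketch, assuming every column of newdf is numeric and free of NAs (quantile() stops on NA unless na.rm = TRUE); the random newdf below is made-up stand-in data, not the asker's.

```r
# Hypothetical stand-in for the question's 1000 x 20 newdf
set.seed(123)
newdf <- as.data.frame(matrix(rnorm(1000 * 20), ncol = 20))

# The question's loop: one c(2.5%, 97.5%) quantile pair per column
new_list <- vector("list", ncol(newdf))
for (i in seq_along(newdf)) {
  new_list[[i]] <- quantile(newdf[, i], c(0.025, 0.975))
}

# Walk columns and intervals in parallel; TRUE where the cell is inside
# its column's interval. mapply() simplifies this to a 1000 x 20 matrix.
inside <- mapply(function(x, q) x >= q[1] & x <= q[2], newdf, new_list)

# Keep a row only if every one of its cells is inside its interval
keep <- apply(inside, 1, all)
filtered <- newdf[keep, ]
```

Requiring all() to be TRUE for a row is the same rule as "remove the row if any feature is outside"; rowSums(!inside) == 0 would produce the same keep vector.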
If I understand what you are trying to do, you can do something like this:
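library(dplyr)

d %>% filter(
  d %>%
    mutate(across(v1:v20, ~ between(.x, quantile(.x, 0.025), quantile(.x, 0.975)))) %>%
    rowwise() %>%
    summarize(keep = all(c_across(v1:v20))) %>%
    pull(keep)  # extract the logical vector so filter() gets one TRUE/FALSE per row
)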
Here, I'm filtering d on a logical vector, which is created using mutate(across()): first each of v1 through v20 becomes a logical vector (whether or not each value in that column is within that column's 0.025 to 0.975 quantile bounds), and then we summarize over the rows using rowwise() and c_across(). Ultimately, keep is a logical vector that is fed to the initial filter() call.
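A shorter variant of the dplyr approach, assuming a dplyr version that has if_all() (I believe it was added in 1.0.4): if_all() combines the per-column conditions row by row, so the rowwise()/c_across() step and the nested filter() are not needed. Because d is not grouped, quantile() still sees each whole column. The d built here is hypothetical example data with columns v1 to v20.

```r
library(dplyr)

# Hypothetical stand-in for d, with columns v1..v20
set.seed(123)
d <- as.data.frame(matrix(rnorm(100 * 20), ncol = 20))
names(d) <- paste0("v", 1:20)

# if_all() is TRUE for a row only when the condition holds in every
# selected column; between() is inclusive on both ends
filtered <- d %>%
  filter(if_all(v1:v20, ~ between(.x, quantile(.x, 0.025), quantile(.x, 0.975))))
```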