
Removing rows from a data frame that are outside an interval [R]

I have a 1000 x 20 data frame named newdf. The goal is to go through every column and compute the interval between its 2.5% and 97.5% quantiles. After that, if any feature in a row has a value outside its column's interval, we remove the ENTIRE row, regardless of whether the other features in that row are within their intervals.

So far, I have been able to write a for loop that stores all of the intervals in a list, like so:

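new_list <- list()  # must exist before the loop assigns into it
for (i in 1:20) {
  # 2.5% and 97.5% quantiles of column i
  quant <- quantile(newdf[, i], c(.025, .975))
  new_list[[i]] <- quant
}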

I need help finding a way to apply these intervals to the 20 columns and then remove the rows.

I have been trying with if() else() functions with no success.

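library(purrr)  # provides map_dfc() and re-exports %>%

# df is the example data defined below
idx_to_remove <- map_dfc(df, function(x) {
    # for each column, get the interval
    interval <- quantile(x, c(0.025, 0.975))

    # TRUE where the cell is outside the interval
    !(x >= interval[1] & x <= interval[2])
}) %>%
    # for each row, see if any cell is TRUE
    apply(1, any)

# remove these rows; idx_to_remove is logical, so negate it with !
# (-idx_to_remove would coerce TRUE to -1 and drop only row 1)
df[!idx_to_remove, ]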
Input data used:
set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))
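
The same row filter can also be written in base R, without purrr. This is only a minimal sketch, assuming every column is numeric and has no missing values; the names `dat`, `inside`, `keep`, and `trimmed` are illustrative and do not come from the original post.

```r
# Example data with the same shape as above (hypothetical name `dat`)
set.seed(123)
dat <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))

# For each column, flag the values that lie inside that column's
# 2.5% to 97.5% quantile range; the result is a logical matrix
inside <- vapply(dat, function(x) {
  bounds <- quantile(x, c(0.025, 0.975), names = FALSE)
  x >= bounds[1] & x <= bounds[2]
}, logical(nrow(dat)))

# Keep a row only if no column is outside its range
keep <- rowSums(!inside) == 0
trimmed <- dat[keep, , drop = FALSE]
```

Because `keep` is a logical vector, it can be used directly as a row index, which avoids the negative-index pitfall with logical values.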

If I understand what you are trying to do, you can do something like this:


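library(dplyr)

# d is assumed to be a data frame with numeric columns v1 to v20
d %>% filter(
  d %>%
  # replace each value with TRUE/FALSE: is it within its column's bounds?
  mutate(across(v1:v20, ~between(.x, quantile(.x, 0.025), quantile(.x, 0.975)))) %>%
  rowwise() %>%
  # keep a row only if all 20 flags are TRUE
  summarize(keep = all(c_across(v1:v20)))
)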

Here, I'm filtering d on a logical vector, which is created using mutate(across()): first, each of v1 through v20 becomes a logical vector (whether or not the value in that column is within that column's 0.025 to 0.975 quantile bounds), and then we summarize over the rows using rowwise() and c_across(). Ultimately, keep is a logical vector that is fed to the initial filter() call.
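
A more compact variant of the same idea uses dplyr's `if_all()` inside `filter()`, which avoids the row-wise step. This is a hedged sketch rather than the answer's own method: it assumes dplyr 1.0.4 or later, and the example data `d` with columns `v1` to `v20` and the result name `d_trimmed` are made up here for illustration.

```r
library(dplyr)

# Hypothetical example data: 100 rows, numeric columns v1 to v20
set.seed(123)
d <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))
names(d) <- paste0("v", 1:20)

# Keep rows where every column lies within its own 2.5% to 97.5% quantiles.
# The quantiles are computed per column over all rows of d.
d_trimmed <- d %>%
  filter(if_all(v1:v20, ~ between(.x, quantile(.x, 0.025), quantile(.x, 0.975))))
```

Since the data is ungrouped, each `quantile()` call sees the full column, so the bounds match those computed by the per-column loop in the question.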
