Remove rows outside an interval from a data frame [R]

Question

I have a 1000 x 20 data frame named newdf. The goal is to go through every column and find the 2.5% and 97.5% quantiles of that column. After that, if any feature in a row has a value outside its column's interval, we remove the ENTIRE row, regardless of whether the other features in that row are within their intervals.

So far, I have been able to create a for loop that stores all of the intervals in a list, like so:

new_list <- vector("list", 20)  # one slot per column
for(i in 1:20){
  quant <- quantile(newdf[,i], c(.025, .975))
  new_list[[i]] <- quant
}

I need help finding a way to apply these intervals across the 20 columns and then remove the rows.

I have been trying with if() and else, with no success.

Answer 1

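library(purrr)

# TRUE for every row that has at least one value outside its column's interval
idx_to_remove <- map_dfc(df, function(x) {
        # for each column, get the 2.5% and 97.5% quantiles
        interval <- quantile(x, c(0.025, 0.975))

        # flag cells that fall outside the interval
        !(x >= interval[1] & x <= interval[2])
    }) %>%
    # for each row, check whether any cell is flagged
    apply(1, any)

# idx_to_remove is a logical vector, so negate it to keep the remaining rows
df[!idx_to_remove, ]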

Input data used:

set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))
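
For comparison, here is a minimal base R sketch of the same idea, with no packages needed. It assumes every column is numeric and has no missing values. The names inside, keep and trimmed are illustrative only and do not come from the answer.

```r
# Same example data as in the answer above
set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))

# One logical column per variable: TRUE where the value lies inside
# that column's 2.5% to 97.5% quantile interval
inside <- sapply(df, function(x) {
  bounds <- quantile(x, c(0.025, 0.975))
  x >= bounds[1] & x <= bounds[2]
})

# Keep a row only if every one of its values is inside its interval
keep <- rowSums(inside) == ncol(df)
trimmed <- df[keep, ]

nrow(df)       # rows before filtering
nrow(trimmed)  # rows after filtering
```

Because each column flags its own extreme values independently, the share of rows removed grows with the number of columns, so expect to lose far more than 5% of the rows.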

Answer 2

If I understand what you are trying to do, you can do something like this:


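library(dplyr)

# d is assumed to be a data frame with numeric columns v1 through v20
d %>% filter(
  d %>%
  mutate(across(v1:v20, ~between(.x, quantile(.x, 0.025), quantile(.x, 0.975)))) %>%
  rowwise() %>%
  summarize(keep = all(c_across(v1:v20))) %>%
  pull(keep)  # extract the logical vector so filter() gets one TRUE/FALSE per row
)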

Here, I'm filtering d on a logical vector, which is created using mutate(across()). First, each of v1 through v20 becomes a logical vector (whether or not the value in that column is within that column's 0.025 to 0.975 bounds), and then we summarize over the rows using rowwise() and c_across(). Ultimately, keep is a logical vector that is fed to the initial filter() call.
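
The per-column test followed by a row-wise all() can probably be written more compactly with if_all(), which is available in dplyr 1.0.4 and later. The sketch below is an alternative, not the answer's own code. The example data and the name trimmed are made up for illustration, with columns named v1 to v20 to match the answer.

```r
library(dplyr)

# Hypothetical example data with columns named v1 ... v20
set.seed(123)
d <- as.data.frame(matrix(rnorm(20 * 100), ncol = 20))
names(d) <- paste0("v", 1:20)

# if_all() is TRUE for a row only when the condition holds in every
# selected column; quantile() is computed on each full column
trimmed <- d %>%
  filter(if_all(v1:v20, ~ between(.x, quantile(.x, 0.025), quantile(.x, 0.975))))

nrow(trimmed)  # number of rows kept
```

This avoids rowwise(), which can be slow on larger data frames, because the row-wise reduction happens inside filter().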

Answer 1
1 2022-04-02 18:46:38

Answer 2
0 2022-04-02 18:48:48
