Filtering rows by different columns

In the data frame

  x1 x2 x3 x4 x5
1  0  1  1  0  3
2  1  2  2  0  3
3  2  2  0  0  2
4  1  3  0  0  2
5  3  3  2  1  4
6  2  0  0  0  1

column x5 indicates where the first non-zero value in a row is. The table should be read from right (x4) to left (x1). Thus, for example, the first non-zero value in the first row is in column x3.

I want to get all rows where 1 is the first non-zero entry, i.e.

  x1 x2 x3 x4 x5
1  0  1  1  0  3
2  3  3  2  1  4

should be the result. I tried different versions of filter_at but didn't manage to come up with a solution. E.g., one attempt was

testdf %>% filter_at(vars(
    paste("x",testdf$x5, sep = "")),
    any_vars(. == 1))

I want to solve this without a for loop, since the real data set has millions of rows and almost 100 columns.

You can easily filter row-wise with the new utility function c_across:

library(dplyr) # version 1.0.2

testdf %>% rowwise() %>% filter(c_across(x1:x4)[x5] == 1) %>% ungroup()
# A tibble: 2 x 5
     x1    x2    x3    x4    x5
  <int> <int> <int> <int> <int>
1     0     1     1     0     3
2     3     3     2     1     4

A vectorised base R solution would be:

# df is the data frame from the question (called df1 in the data section)
result <- df[df[cbind(1:nrow(df), df$x5)] == 1, ]
result

#  x1 x2 x3 x4 x5
#1  0  1  1  0  3
#5  3  3  2  1  4

cbind(1:nrow(df), df$x5) creates a two-column matrix of (row, column) index pairs, one per row, each pointing at the first non-zero value in that row. We extract those values and select the rows where the value is 1.
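
To make the matrix-indexing step easier to follow, here is a minimal sketch that splits it into parts. It assumes the sample data from the question, rebuilt here as `df` so the block runs on its own; the names `idx` and `first_nonzero` are introduced only for illustration.

```r
# Sample data from the question (assumed to be stored as `df`).
df <- data.frame(
  x1 = c(0L, 1L, 2L, 1L, 3L, 2L),
  x2 = c(1L, 2L, 2L, 3L, 3L, 0L),
  x3 = c(1L, 2L, 0L, 0L, 2L, 0L),
  x4 = c(0L, 0L, 0L, 0L, 1L, 0L),
  x5 = c(3L, 3L, 2L, 2L, 4L, 1L)
)

# Two-column matrix of (row, column) pairs: row i is paired with column x5[i].
idx <- cbind(seq_len(nrow(df)), df$x5)

# Indexing with a two-column matrix returns one value per (row, column) pair.
first_nonzero <- df[idx]
first_nonzero
# [1] 1 2 2 3 1 2

# Keep the rows whose selected value equals 1.
result <- df[first_nonzero == 1, ]
```

Because the lookup is a single indexing operation rather than a per-row function call, it should scale to large data frames, provided all indexed columns share one type so the conversion to a matrix is harmless.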

Another vectorised solution:

df[t(df)[t(col(df)==df$x5)]==1,]
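
The one-liner is dense, so one possible way to read it is sketched below. It again assumes the question's sample data stored as `df`; `mask` and `picked` are illustrative names, not part of the original answer.

```r
# Sample data from the question (assumed to be stored as `df`).
df <- data.frame(
  x1 = c(0L, 1L, 2L, 1L, 3L, 2L),
  x2 = c(1L, 2L, 2L, 3L, 3L, 0L),
  x3 = c(1L, 2L, 0L, 0L, 2L, 0L),
  x4 = c(0L, 0L, 0L, 0L, 1L, 0L),
  x5 = c(3L, 3L, 2L, 2L, 4L, 1L)
)

# TRUE at position [i, j] when j equals x5[i]; df$x5 is recycled down each column,
# so every row gets exactly one TRUE.
mask <- col(df) == df$x5

# Logical extraction walks a matrix column by column. Transposing both the data
# and the mask makes that walk follow the original rows in order.
picked <- t(df)[t(mask)]

# Keep the rows whose picked value equals 1.
df[picked == 1, ]
```

The transposes matter: without them, the selected values would come out in column order rather than row order and would no longer line up with the rows of `df`.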

We can use apply in base R:

df1[apply(df1, 1, function(x) x[x[5]] == 1),]
#  x1 x2 x3 x4 x5
#1  0  1  1  0  3
#5  3  3  2  1  4

data

df1 <- structure(list(x1 = c(0L, 1L, 2L, 1L, 3L, 2L), x2 = c(1L, 2L, 
2L, 3L, 3L, 0L), x3 = c(1L, 2L, 0L, 0L, 2L, 0L), x4 = c(0L, 0L, 
0L, 0L, 1L, 0L), x5 = c(3L, 3L, 2L, 2L, 4L, 1L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
