Filter rows by different columns

Question

In the data frame

  x1 x2 x3 x4 x5
1  0  1  1  0  3
2  1  2  2  0  3
3  2  2  0  0  2
4  1  3  0  0  2
5  3  3  2  1  4
6  2  0  0  0  1

column x5 indicates where the first non-zero value in a row is. The table should be read from right (x4) to left (x1). Thus, for example, the first non-zero value in the first row is in column x3.

I want to get all rows where 1 is the first non-zero entry, i.e.

  x1 x2 x3 x4 x5
1  0  1  1  0  3
2  3  3  2  1  4

should be the result. I tried different versions of filter_at but I didn't manage to come up with a solution. E.g., one attempt was

testdf %>% filter_at(vars(
    paste("x",testdf$x5, sep = "")),
    any_vars(. == 1))

I want to solve this without a for loop, since the real data set has millions of rows and almost 100 columns.

Answer 1

You can filter row-wise easily with the new utility function c_across:

library(dplyr) # version 1.0.2

testdf %>% rowwise() %>% filter(c_across(x1:x4)[x5] == 1) %>% ungroup()
# A tibble: 2 x 5
     x1    x2    x3    x4    x5
  <int> <int> <int> <int> <int>
1     0     1     1     0     3
2     3     3     2     1     4

Answer 2

A vectorised base R solution would be:

result <- df[df[cbind(1:nrow(df), df$x5)] == 1, ]
result

#  x1 x2 x3 x4 x5
#1  0  1  1  0  3
#5  3  3  2  1  4

cbind(1:nrow(df), df$x5) creates a two-column matrix of (row, column) index pairs, one pair per row, with the column index taken from x5. Indexing df with that matrix extracts the first non-zero value of each row, and we then keep the rows where that value is 1.
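To make the indexing step concrete, here is a minimal sketch that splits the one-liner into its parts. It assumes the sample data from the question, rebuilt here as `df` so the block runs on its own; the names `idx` and `picked` are only illustrative and do not appear in the original answer.

```r
# Sample data from the question (rebuilt here; values copied from the thread)
df <- data.frame(
  x1 = c(0L, 1L, 2L, 1L, 3L, 2L),
  x2 = c(1L, 2L, 2L, 3L, 3L, 0L),
  x3 = c(1L, 2L, 0L, 0L, 2L, 0L),
  x4 = c(0L, 0L, 0L, 0L, 1L, 0L),
  x5 = c(3L, 3L, 2L, 2L, 4L, 1L)
)

# Two-column index matrix: column 1 is the row number,
# column 2 is the column number stored in x5 (illustrative name: idx)
idx <- cbind(seq_len(nrow(df)), df$x5)

# Indexing with a two-column matrix returns one cell per (row, column) pair
picked <- df[idx]
picked

# Keep only the rows whose picked cell equals 1
result <- df[picked == 1, ]
result
```

Matrix indexing converts the data frame to a matrix first, so this sketch is best suited to data where all columns are numeric, as in the question.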

Answer 3

Another vectorised solution:

df[t(df)[t(col(df)==df$x5)]==1,]
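The one-liner above is dense, so here is a hedged step-by-step reading of it. This is my own unpacking, not part of the original answer; it assumes the sample data from the question (rebuilt as `df`), and the names `mask` and `picked` are illustrative.

```r
# Sample data from the question (rebuilt here; values copied from the thread)
df <- data.frame(
  x1 = c(0L, 1L, 2L, 1L, 3L, 2L),
  x2 = c(1L, 2L, 2L, 3L, 3L, 0L),
  x3 = c(1L, 2L, 0L, 0L, 2L, 0L),
  x4 = c(0L, 0L, 0L, 0L, 1L, 0L),
  x5 = c(3L, 3L, 2L, 2L, 4L, 1L)
)

# col(df)[i, j] is j. df$x5 has one value per row and is recycled down
# each column, so mask[i, j] is TRUE exactly when j equals x5 for row i.
mask <- col(df) == df$x5

# Logical indexing walks a matrix column by column. Transposing both the
# data and the mask makes that walk follow the original rows, so the
# picked values come out in row order.
picked <- t(df)[t(mask)]
picked

# Keep only the rows whose picked cell equals 1
df[picked == 1, ]
```

Without the two transposes, the extracted values would be ordered by column rather than by row and would no longer line up with the rows of `df`.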

Answer 4

We can use apply in base R:

df1[apply(df1, 1, function(x) x[x[5]] == 1),]
#  x1 x2 x3 x4 x5
#1  0  1  1  0  3
#5  3  3  2  1  4

Data

df1 <- structure(list(x1 = c(0L, 1L, 2L, 1L, 3L, 2L), x2 = c(1L, 2L, 
2L, 3L, 3L, 0L), x3 = c(1L, 2L, 0L, 0L, 2L, 0L), x4 = c(0L, 0L, 
0L, 0L, 1L, 0L), x5 = c(3L, 3L, 2L, 2L, 4L, 1L)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
