How to subset a data.frame by a row in the same data.frame in R?

Question

I would like to subset my data.frame so that it only returns the rows that have at least 50% of their values <= the values in a specific row in the data.frame.

df

Name   A   B   C   D
r1     2   2   2   2
r2     4   3   1   3
r3     1   1   1   2
r4     3   3   3   1

The specific row I am trying to subset by is row r1. I only want to return row r3, since 75% of its values are <= the values in row r1.

df

Name   A   B   C   D
r3     1   1   1   2

Any help will be appreciated. Please let me know if further information is needed.

Answer 1

Add up the number of conditions met on a row by row basis using "+" and compare to 3:

subset(df, ( (A <= A[1]) + (B <= B[1]) + (C <= C[1]) + (D <= D[1]) ) >= 3 )

> subset(df, ( (A <= A[1]) + (B <= B[1]) + (C <= C[1]) + (D <= D[1]) ) >= 3 )
  Name A B C D
1   r1 2 2 2 2
3   r3 1 1 1 2

If you also want to remove 'r1' (the first row of the result), just append [-1, ] to the call.
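For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data from the question and applies the same "+" counting idea. The intermediate names n_met and res are only illustrative, and dropping r1 with [-1, ] assumes r1 is the first row of the result, as it is here.

```r
# Rebuild the example data from the question
df <- data.frame(
  Name = c("r1", "r2", "r3", "r4"),
  A = c(2, 4, 1, 3),
  B = c(2, 3, 1, 3),
  C = c(2, 1, 1, 3),
  D = c(2, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Per row, count how many columns are <= the first row's value
n_met <- with(df, (A <= A[1]) + (B <= B[1]) + (C <= C[1]) + (D <= D[1]))

# Keep rows meeting at least 3 of the 4 conditions,
# then drop the reference row r1 (first row of the result)
res <- df[n_met >= 3, ][-1, ]
res
```

Adding logical vectors works because R coerces TRUE/FALSE to 1/0, so n_met holds one count per row.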

This can be generalized to produce a numeric vector that can be tested against a percentage criterion; it gives the number of items in each row that are less than or equal to their counterparts in the first row. I needed to unlist the first row because passing a single-row data frame as the third argument failed:

rowSums(sweep(df[-1], 2, unlist(df[1,-1]), "<="))
[1] 4 1 4 1
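The sweep idea can be wrapped in a small helper that takes the threshold as a proportion. This is only a sketch: the function name rows_within_ref and its arguments (ref, prop, id_cols) are hypothetical, and it assumes every column other than id_cols is numeric.

```r
# Example data from the question
df <- data.frame(
  Name = c("r1", "r2", "r3", "r4"),
  A = c(2, 4, 1, 3),
  B = c(2, 3, 1, 3),
  C = c(2, 1, 1, 3),
  D = c(2, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where at least `prop` of the values
# are <= the values in reference row `ref`; the reference row is dropped.
rows_within_ref <- function(data, ref = 1, prop = 0.5, id_cols = 1) {
  vals <- as.matrix(data[-id_cols])       # numeric columns only (assumed)
  ref_vals <- vals[ref, ]                 # plain vector, so sweep accepts it
  n_met <- rowSums(sweep(vals, 2, ref_vals, "<="))
  keep <- n_met >= prop * ncol(vals)
  keep[ref] <- FALSE                      # exclude the reference row itself
  data[keep, , drop = FALSE]
}

rows_within_ref(df, ref = 1, prop = 0.5)
```

Converting to a matrix first sidesteps the single-row data frame problem, since indexing a matrix row already returns a vector.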

Below is a demonstration:

df2 <- cbind(nms = paste0("r", 1:10),
             as.data.frame(matrix(sample(1:10, 200, replace = TRUE), 10)))
df2
#--------------
nms V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1   r1  8  6 10  7  3  7  2  8  4   9   9   4   5   4   8   7   2   1   6   4
2   r2  3  9  6  3  9 10  6 10 10   3   3   2   4   4   4  10   3   5   2   1
3   r3  1  7  6  8  3  5  2  3  1   5   5   4   8   3   1   6   2  10   3   7
4   r4  2  6 10 10  8  7  9  1  4   5   6   7   2   6   8   3   5  10  10   3
5   r5  5  5  7  2  5 10  2  9  2   9   4   6   1   5   8   5   8   6   3   5
6   r6  4  1  7  7  6  9  6  3  4   3   2   9   4   8  10   3   4   4  10   4
7   r7  7  1 10  4  1  2  8  5  8   8   5   5   5   6   4  10   6   9  10   6
8   r8 10  8  1  4  1  4 10  3  1   3  10   3   4   9   4   7   4   9   2   2
9   r9  3 10  9  1 10  8  8  4  7   2   7   2   9  10   3   3   7   4  10   1
10 r10  4  7  3  3  1  9  4  1  9   5   3   9   9   3   9   2   9  10   2   4
#-----------------
rowSums(sweep(df2[-1], 2, unlist(df2[1,-1]), "<="))
# [1] 20 11 15 12 12 11 11 13 10 11
rowSums(sweep(df2[-1], 2, unlist(df2[1,-1]), "<=")) >= 20*0.75
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 df2[ rowSums(sweep(df2[-1], 2, unlist(df2[1,-1]), "<=")) >= 20*0.75 , ]
#---------
  nms V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1  r1  8  6 10  7  3  7  2  8  4   9   9   4   5   4   8   7   2   1   6   4
3  r3  1  7  6  8  3  5  2  3  1   5   5   4   8   3   1   6   2  10   3   7

It's occurred to me that an apply solution would probably have seemed more obvious to some R programmers:

colSums(apply(df2[-1], 1, "<=", unlist(df2[1, -1]))) >= (ncol(df2) - 1) * 0.75

Note the need to use colSums: apply returns its results in column-oriented fashion (each row's result becomes a column of the output matrix), which is sometimes a puzzle to beginneRs.
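A tiny illustration of that orientation, using a made-up 3 x 2 matrix m (not the data above): applying a function over rows yields one column per input row, so the per-row counts come from colSums, or equivalently from rowSums after transposing.

```r
# Small made-up matrix: 3 rows (r1..r3), 2 columns (A, B)
m <- matrix(c(2, 4, 1,
              2, 3, 1),
            nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("A", "B")))

ref <- m[1, ]                               # reference row r1

# Each row's logical result becomes a COLUMN of res
res <- apply(m, 1, function(x) x <= ref)
dim(res)                                    # 2 rows (A, B) by 3 columns (r1..r3)

colSums(res)                                # counts per original row
rowSums(t(res))                             # same counts after transposing
```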

Answer 2

Here is a generic solution that can also be applied to 34 variables:

Assumption: In the dataset, we are comparing every column except the first column which stores Name .

col_names <- colnames(df)[-1]

# Position of the reference row, and the indices of all other rows
index <- which(df$Name == 'r1')
values <- seq_len(nrow(df))[-index]

# A row qualifies if at least half of its values are <= the reference row's
min_val <- length(col_names) / 2
row_num <- integer(0)
for (i in values) {
  if (sum(df[i, col_names] <= df[index, col_names]) >= min_val)
    row_num <- c(row_num, i)
}

df[row_num, ]
#   Name A B C D
# 3   r3 1 1 1 2

If the dataset is large, though, this might take some time. You can improve performance with the help of the data.table package.
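Before reaching for another package, a loop-free base R variant may already be fast enough. The sketch below is an alternative to the loop, not the data.table approach mentioned above; it compares whole columns at once, and the names hits, n_met and keep are illustrative.

```r
# Example data from the question
df <- data.frame(
  Name = c("r1", "r2", "r3", "r4"),
  A = c(2, 4, 1, 3),
  B = c(2, 3, 1, 3),
  C = c(2, 1, 1, 3),
  D = c(2, 3, 2, 1),
  stringsAsFactors = FALSE
)

col_names <- colnames(df)[-1]
index <- which(df$Name == "r1")

# One logical vector per column: is each value <= the reference row's value?
hits <- lapply(col_names, function(cn) df[[cn]] <= df[[cn]][index])

# Sum the logical vectors element-wise to get a count per row
n_met <- Reduce(`+`, hits)

keep <- n_met >= length(col_names) / 2
keep[index] <- FALSE                 # leave out the reference row itself
df[keep, ]
```

Because each comparison runs over a full column, the work is done in vectorized calls rather than one data frame subset per row.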

2 answers

solution1
2 2016-04-12 02:18:44

solution2
0 ACCEPTED 2016-04-12 02:34:13
