
Count rows based on multiple criteria

I have a simple question, but I don't know how to solve it. I have two matrices, and I'm trying to create a column in the first one that holds the number of rows in the second one that match a set of criteria. For example, imagine I have matrix A:

    Ad1    Ad2    Ad3    Ad4
    AA     101      0     10
    AA     101     10     12
    AA     101     12     15
    AA     101     15     20
    AA     300      0    100
    AA     300    100    230
    AA     300    230    300
    ...

and matrix B is

    Bd1    Bd2    Bd3
    AA     101      0
    AA     101      1
    AA     101      2
    AA     101      4
    AA     101      5
    ...
    AB     102      1
    AB     102     10
    ...

and I would like to create a fifth column in A holding, for each row of A, the number of rows in B that match the following condition:

(A$Ad1==B$Bd1) & (A$Ad2==B$Bd2) & (A$Ad3<=B$Bd3) & (A$Ad4>B$Bd3)

Is there a way to perform this without creating a loop for each row of A?

The first column can get in the way when it is a factor (in R versions before 4.0.0, read.table converts string columns to factors by default), so the first comparison needs either as.character or %in%:

A = read.table(text="Ad1    Ad2    Ad3    Ad4
     AA     101      0     10
     AA     101     10     12
     AA     101     12     15
     AA     101     15     20
     AA     300      0    100
     AA     300    100    230
     AA     300    230    300", header=TRUE)

B = read.table(text="    Bd1    Bd2    Bd3
     AA     101      0
     AA     101      1
     AA     101      2
     AA     101      4
     AA     101      5
     AB     102      1
     AB     102     10", header=TRUE)
> # B$Bd1 %in% x yields one TRUE/FALSE per row of B (x %in% B$Bd1 would yield a single value)
> with( A, mapply(function(x,y,z,z2){sum((B$Bd1 %in% x) & (y == B$Bd2) & 
                                         (z <= B$Bd3) & (z2 > B$Bd3) )},
                                     Ad1, Ad2, Ad3, Ad4)  )
[1] 5 0 0 0 0 0 0

> with( A, mapply(function(x,y,z,z2){sum((as.character(x) == B$Bd1) & (y == B$Bd2) & 
                                          (z <= B$Bd3) & (z2 > B$Bd3) )},
                                     Ad1, Ad2, Ad3, Ad4)  )
[1] 5 0 0 0 0 0 0

This is the error thrown when two factors with different level sets are compared directly with ==:

> factor("a", levels=c("a","b")) == factor("a")
Error in Ops.factor(factor("a", levels = c("a", "b")), factor("a")) : 
  level sets of factors are different
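
A minimal sketch of the two workarounds on their own, assuming two hypothetical factors f1 and f2 with different level sets (these names are illustrative and do not appear in the question):

```r
# Two factors with different level sets (illustrative names)
f1 <- factor("a", levels = c("a", "b"))
f2 <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# Comparing the labels as character strings skips the level-set check
cmp_chr <- as.character(f1) == as.character(f2)

# %in% also matches on the labels; keeping the longer vector on the left
# gives one TRUE/FALSE per element of f2
cmp_in <- f2 %in% f1
```

Both cmp_chr and cmp_in hold one logical value per element of f2, which is the shape needed when the comparison is combined with the other per-row conditions.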

You could use apply:

A = read.table(text="
    Ad1    Ad2    Ad3    Ad4
    AA     101      0     10
    AA     101     10     12
    AA     101     12     15
    AA     101     15     20
    AA     300      0    100
           ", header=TRUE)

B = read.table(text="
Bd1    Bd2    Bd3
AA     101      0
AA     101      1
AA     101      2
AA     101      10
AA     101      12
           ", header=TRUE)

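Use apply to count, for each row of A, the number of rows in B for which your condition holds. apply converts A to a character matrix first, which is why the numeric columns are wrapped in as.numeric.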

apply(A, 1, function(x) {
  sum( (x["Ad1"] == B$Bd1)  &
       (as.numeric(x["Ad2"]) == B$Bd2) &
       (as.numeric(x["Ad3"]) <= B$Bd3) &
       (as.numeric(x["Ad4"]) > B$Bd3) )
})

[1] 3 1 1 0 0
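
To get the fifth column the question asks for, the counts can be assigned back to A. The following is only a sketch under a few assumptions: the helper count_matches and the column name Count are hypothetical, and stringsAsFactors = FALSE is passed so that the first column stays character and a plain == works.

```r
# Same example data as above, read with character (not factor) columns
A <- read.table(text = "Ad1 Ad2 Ad3 Ad4
AA 101  0  10
AA 101 10  12
AA 101 12  15
AA 101 15  20
AA 300  0 100", header = TRUE, stringsAsFactors = FALSE)

B <- read.table(text = "Bd1 Bd2 Bd3
AA 101  0
AA 101  1
AA 101  2
AA 101 10
AA 101 12", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: number of rows of B matching one row of A
count_matches <- function(d1, d2, lo, hi) {
  sum(B$Bd1 == d1 & B$Bd2 == d2 & B$Bd3 >= lo & B$Bd3 < hi)
}

# USE.NAMES = FALSE stops mapply from naming the result after Ad1
A$Count <- mapply(count_matches, A$Ad1, A$Ad2, A$Ad3, A$Ad4,
                  USE.NAMES = FALSE)
# A$Count is 3 1 1 0 0, the same counts as the apply call above
```

mapply still iterates over the rows of A internally, but each iteration is a vectorized comparison against all of B, so no explicit for loop is needed.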

