Matching values across multiple columns in R based on a condition

Question

Say I have a data frame df:

resident    faculty    submittedBy    match    caseID    phase

george      sally      george         1        george_1  pre
george      sally      sally          0        george_1  pre
george      sally      george         1        george_1  intra
jane        carl       jane           1        jane_1    pre
jane        carl       carl           0        jane_1    pre
jane        carl       carl           0        jane_1    intra

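I want to add a column df$response to this data frame according to the following rule (I think I need a set of nested ifelses, but I am struggling to get it right):

For a given row X, if df$match = 1,

print "1" in df$response if:

any row where df$match = 0 has the same df$caseID, df$faculty and df$phase as row X. Otherwise print "0".

So the output should look like this: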

response

1
0
0
1
0
0

This is because only the first and fourth rows have df$match = 1 and share their df$caseID, df$faculty and df$phase with a row where df$match = 0.

Answer 1

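We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'caseID', 'faculty' and 'phase', and check whether the number of unique values of match within the group equals 2. That check becomes a binary column ('response'), and rows where 'match' is 0 get a 'response' of 0.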

library(data.table)
setDT(df1)[, response := +((uniqueN(match) == 2) & match != 0), 
                  .(caseID, faculty, phase)][]
#   resident faculty submittedBy match   caseID phase response
#1:   george   sally      george     1 george_1   pre        1
#2:   george   sally       sally     0 george_1   pre        0
#3:   george   sally      george     1 george_1 intra        0
#4:     jane    carl        jane     1   jane_1   pre        1
#5:     jane    carl        carl     0   jane_1   pre        0
#6:     jane    carl        carl     0   jane_1 intra        0
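
The leading + in the call above is a compact way to turn a logical vector into integers. A small base R sketch of that idiom on its own (the vector names flag, as_int and same are made up for illustration):

```r
# Hypothetical logical vector, standing in for the grouped condition
flag <- c(TRUE, FALSE, FALSE, TRUE)

# Unary plus coerces logical to integer: TRUE -> 1L, FALSE -> 0L
as_int <- +flag

# as.integer() is the more explicit spelling of the same conversion
same <- identical(as_int, as.integer(flag))
```

Either spelling should work inside the := expression; as.integer() is arguably easier to read.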

Or use base R with ave:

with(df1,+( match != 0 & ave(match, caseID, faculty, phase, 
         FUN = function(x) length(unique(x))) == 2))
#[1] 1 0 0 1 0 0
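
The ave call returns a vector and does not modify df1. A minimal sketch of the same idea split into two steps and stored as a column; the data frame is rebuilt here with plain character columns so the block runs on its own, and the intermediate name n_vals is an assumption for illustration:

```r
df1 <- data.frame(
  resident    = c("george", "george", "george", "jane", "jane", "jane"),
  faculty     = c("sally", "sally", "sally", "carl", "carl", "carl"),
  submittedBy = c("george", "sally", "george", "jane", "carl", "carl"),
  match       = c(1L, 0L, 1L, 1L, 0L, 0L),
  caseID      = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase       = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)

# Number of distinct match values within each caseID/faculty/phase group,
# repeated for every row of that group
n_vals <- ave(df1$match, df1$caseID, df1$faculty, df1$phase,
              FUN = function(x) length(unique(x)))

# 1 only for match != 0 rows whose group holds both values
df1$response <- as.integer(df1$match != 0 & n_vals == 2)
```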

Data

df1 <- structure(list(resident = structure(c(1L, 1L, 1L, 2L, 2L, 2L), 
.Label = c("george", 
"jane"), class = "factor"), faculty = structure(c(2L, 2L, 2L, 
1L, 1L, 1L), .Label = c("carl", "sally"), class = "factor"), 
    submittedBy = structure(c(2L, 4L, 2L, 3L, 1L, 1L), .Label = c("carl", 
    "george", "jane", "sally"), class = "factor"), match = c(1L, 
    0L, 1L, 1L, 0L, 0L), caseID = structure(c(1L, 1L, 1L, 2L, 
    2L, 2L), .Label = c("george_1", "jane_1"), class = "factor"), 
    phase = structure(c(2L, 2L, 1L, 2L, 2L, 1L), .Label = c("intra", 
    "pre"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

Answer 2

Here is what I would do:

# read the data
test <- read.table(text = 'resident    faculty    submittedBy    match    caseID    phase
                   george      sally      george         1        george_1  pre
                   george      sally      sally          0        george_1  pre
                   george      sally      george         1        george_1  intra
                   jane        carl       jane           1        jane_1    pre
                   jane        carl       carl           0        jane_1    pre
                   jane        carl       carl           0        jane_1    intra', header = TRUE)

# create the response vector, defaulting to 0
keys <- c('faculty', 'caseID', 'phase')
resp <- numeric(nrow(test))

# iterate over each row
for (rr in seq_len(nrow(test))) {
  if (test$match[rr] != 0) {
    # append row rr to the match == 0 rows (the comparison set the question asks for)
    # and test whether it duplicates one of them
    tmp <- rbind(test[test$match == 0, keys], test[rr, keys])
    resp[rr] <- as.numeric(duplicated(tmp)[nrow(tmp)])
  }
}

Answer 3

Using [] indexing is faster and has less overhead on your machine:

df <- data.frame(
  "resident" = c("george","george","george","jane","jane","jane"),
  "faculty" = c("sally","sally","sally","carl","carl","carl"),
  "submittedBy" = c("george","sally","george","jane","carl","carl"),
  "match" = c(1,0,1,1,0,0),
  "caseID" = c("george_1","george_1","george_1","jane_1","jane_1","jane_1"),
  "phase" = c("pre","pre","intra","pre","pre","intra"),
  stringsAsFactors = FALSE
  )

response <- NULL

for (i in 1:nrow(df)) {
  response[i] <- ifelse(
    df$match[i] == 0, 0,
    ifelse(
      any(paste(df$caseID,df$faculty,df$phase,sep="")[df$match == 0] == 
            paste(df$caseID,df$faculty,df$phase,sep="")[i]),
      1, 0
    )
  )
}

response
[1] 1 0 0 1 0 0
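
The same comparison can probably be written without the loop. A hedged sketch, assuming the df defined above (rebuilt here so the block is self-contained); the "|" separator and the names key and response_vec are illustrative choices, not part of the original answer:

```r
df <- data.frame(
  resident    = c("george", "george", "george", "jane", "jane", "jane"),
  faculty     = c("sally", "sally", "sally", "carl", "carl", "carl"),
  submittedBy = c("george", "sally", "george", "jane", "carl", "carl"),
  match       = c(1, 0, 1, 1, 0, 0),
  caseID      = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase       = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)

# Build the composite key once; a separator avoids accidental collisions
# such as "ab" + "c" versus "a" + "bc"
key <- paste(df$caseID, df$faculty, df$phase, sep = "|")

# 1 where the row has match == 1 and its key also occurs among match == 0 rows
response_vec <- as.integer(df$match == 1 & key %in% key[df$match == 0])
```

Computing the key once and using %in% avoids rebuilding the pasted strings on every iteration.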

Answer 4

Another data.table approach. Join on the key variables and check whether the values are not in the match==0 set:

library(data.table)
setDT(dat)

dat[, response := match==1]
dat[!dat[match==0], on=c("caseID","faculty","phase"), response := FALSE]

dat
#   resident faculty submittedBy match   caseID phase response
#1:   george   sally      george     1 george_1   pre     TRUE
#2:   george   sally       sally     0 george_1   pre    FALSE
#3:   george   sally      george     1 george_1 intra    FALSE
#4:     jane    carl        jane     1   jane_1   pre     TRUE
#5:     jane    carl        carl     0   jane_1   pre    FALSE
#6:     jane    carl        carl     0   jane_1 intra    FALSE
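
The answer does not show how dat is built, and it produces TRUE/FALSE where the question asks for 1/0. A minimal self-contained sketch under those two assumptions: dat holds the question's data, and an integer column is wanted.

```r
library(data.table)

# Assumed contents of dat: the six rows from the question
dat <- data.table(
  resident    = c("george", "george", "george", "jane", "jane", "jane"),
  faculty     = c("sally", "sally", "sally", "carl", "carl", "carl"),
  submittedBy = c("george", "sally", "george", "jane", "carl", "carl"),
  match       = c(1L, 0L, 1L, 1L, 0L, 0L),
  caseID      = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase       = c("pre", "pre", "intra", "pre", "pre", "intra")
)

# Start with 1 for every match == 1 row
dat[, response := as.integer(match == 1)]

# Anti-join: rows whose key has no match == 0 partner are reset to 0
dat[!dat[match == 0], on = c("caseID", "faculty", "phase"), response := 0L]
```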

Answer 5

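Assuming match contains only the values 1 and 0, one way with dplyr is to check whether each group of caseID, faculty and phase has two distinct values in match (1 and 0), and then replace response with 0 where match is 0.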

library(dplyr)
df %>%
  group_by(caseID, faculty, phase) %>%
  mutate(response = as.integer(n_distinct(match) == 2),
         response = replace(response, match == 0, 0))

#  resident faculty submittedBy match caseID   phase response
#  <chr>    <chr>   <chr>       <dbl> <chr>    <chr>    <dbl>
#1 george   sally   george          1 george_1 pre          1
#2 george   sally   sally           0 george_1 pre          0
#3 george   sally   george          1 george_1 intra        0
#4 jane     carl    jane            1 jane_1   pre          1
#5 jane     carl    carl            0 jane_1   pre          0
#6 jane     carl    carl            0 jane_1   intra        0
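
If match could hold values other than 1 and 0, counting distinct values would no longer be enough. A hedged variant that tests the two conditions directly, sketched on the question's data; the name out is an assumption for illustration:

```r
library(dplyr)

df <- data.frame(
  resident    = c("george", "george", "george", "jane", "jane", "jane"),
  faculty     = c("sally", "sally", "sally", "carl", "carl", "carl"),
  submittedBy = c("george", "sally", "george", "jane", "carl", "carl"),
  match       = c(1, 0, 1, 1, 0, 0),
  caseID      = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase       = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(caseID, faculty, phase) %>%
  # 1 only for match == 1 rows whose group also contains a match == 0 row
  mutate(response = as.integer(match == 1 & any(match == 0))) %>%
  ungroup()
```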

Answer scores and timestamps

Answer 1: score 5, 2019-07-09 01:42:34
Answer 2: score 3, accepted, 2019-07-09 01:17:23
Answer 3: score 2, 2019-07-09 01:15:24
Answer 4: score 2, 2019-07-09 02:05:04
Answer 5: score 1, 2019-07-09 01:35:53
