Matching values in multiple columns in R based on condition
Say I have a data frame df:
resident faculty submittedBy match caseID phase
george sally george 1 george_1 pre
george sally sally 0 george_1 pre
george sally george 1 george_1 intra
jane carl jane 1 jane_1 pre
jane carl carl 0 jane_1 pre
jane carl carl 0 jane_1 intra
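I want to add a column df$response to this data frame based on the following rule (I think I need a set of nested ifelses, but I am struggling to get it right):
For a given row X with df$match = 1, put "1" in df$response if any row with df$match = 0 has the same df$caseID, df$faculty and df$phase as row X. Otherwise put "0".
So the output should look like this: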
response
1
0
0
1
0
0
because only the first and fourth rows have df$match = 1 and a combination of df$caseID, df$faculty and df$phase that also appears in a row with df$match = 0.
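We can use a data.table approach. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'caseID', 'faculty' and 'phase', and check whether the number of unique elements of match in each group equals 2. That check, combined with match != 0, gives a binary column ("response") that is 0 wherever "match" is 0.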
library(data.table)
setDT(df1)[, response := +((uniqueN(match) == 2) & match != 0),
.(caseID, faculty, phase)][]
# resident faculty submittedBy match caseID phase response
#1: george sally george 1 george_1 pre 1
#2: george sally sally 0 george_1 pre 0
#3: george sally george 1 george_1 intra 0
#4: jane carl jane 1 jane_1 pre 1
#5: jane carl carl 0 jane_1 pre 0
#6: jane carl carl 0 jane_1 intra 0
Or using base R with ave:
with(df1,+( match != 0 & ave(match, caseID, faculty, phase,
FUN = function(x) length(unique(x))) == 2))
#[1] 1 0 0 1 0 0
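To make the ave one-liner easier to follow, here is a minimal sketch that splits it into two steps. It assumes the same values as the question's data, rebuilt by hand with only the columns the rule needs; the names n_distinct_match and response are illustrative, not part of the original answer.

```r
# Question's data, reduced to the columns the rule uses (values copied from the post).
df1 <- data.frame(
  faculty = c("sally", "sally", "sally", "carl", "carl", "carl"),
  match   = c(1L, 0L, 1L, 1L, 0L, 0L),
  caseID  = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase   = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)

# Step 1: count the distinct match values inside each (caseID, faculty, phase)
# group; ave() repeats the group's result on every row of that group.
n_distinct_match <- ave(df1$match, df1$caseID, df1$faculty, df1$phase,
                        FUN = function(x) length(unique(x)))

# Step 2: a row gets 1 only if its group holds two distinct values
# (assumed to be 0 and 1) and the row itself is not a 0.
response <- as.integer(n_distinct_match == 2 & df1$match != 0)
response
```

Looking at n_distinct_match on its own shows which groups contain both a 0 and a 1 before the final condition is applied.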
Data used in the two examples above:
df1 <- structure(list(resident = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("george",
"jane"), class = "factor"), faculty = structure(c(2L, 2L, 2L,
1L, 1L, 1L), .Label = c("carl", "sally"), class = "factor"),
submittedBy = structure(c(2L, 4L, 2L, 3L, 1L, 1L), .Label = c("carl",
"george", "jane", "sally"), class = "factor"), match = c(1L,
0L, 1L, 1L, 0L, 0L), caseID = structure(c(1L, 1L, 1L, 2L,
2L, 2L), .Label = c("george_1", "jane_1"), class = "factor"),
phase = structure(c(2L, 2L, 1L, 2L, 2L, 1L), .Label = c("intra",
"pre"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
Here is what I would do:
# read the data
test <- read.table(text = 'resident faculty submittedBy match caseID phase
george sally george 1 george_1 pre
george sally sally 0 george_1 pre
george sally george 1 george_1 intra
jane carl jane 1 jane_1 pre
jane carl carl 0 jane_1 pre
jane carl carl 0 jane_1 intra', header = TRUE)
# create the response vector
resp <- numeric(nrow(test))
# iterate over each row
for (rr in seq_len(nrow(test))) {
  if (test$match[rr] == 0) {
    resp[rr] <- 0
  } else {
    # move row rr to the end, so duplicated() flags it when any other row
    # shares its faculty, caseID and phase
    # (note: this compares against all other rows, not only those with match == 0)
    tmp <- rbind(test[-rr, c('faculty', 'caseID', 'phase')],
                 test[rr, c('faculty', 'caseID', 'phase')])
    resp[rr] <- ifelse(duplicated(tmp)[nrow(tmp)], 1, 0)
  }
}
Indexing with [] is faster and has less overhead on your machine:
df <- data.frame(
"resident" = c("george","george","george","jane","jane","jane"),
"faculty" = c("sally","sally","sally","carl","carl","carl"),
"submittedBy" = c("george","sally","george","jane","carl","carl"),
"match" = c(1,0,1,1,0,0),
"caseID" = c("george_1","george_1","george_1","jane_1","jane_1","jane_1"),
"phase" = c("pre","pre","intra","pre","pre","intra"),
stringsAsFactors = FALSE
)
response <- NULL
for (i in 1:nrow(df)) {
response[i] <- ifelse(
df$match[i] == 0, 0,
ifelse(
any(paste(df$caseID,df$faculty,df$phase,sep="")[df$match == 0] ==
paste(df$caseID,df$faculty,df$phase,sep="")[i]),
1, 0
)
)
}
response
[1] 1 0 0 1 0 0
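The same key comparison can also be written without the loop. The following is a rough sketch, not the answerer's code: it builds the pasted key once and tests membership with %in%. The "|" separator is my assumption, added so that adjacent fields cannot run together into an ambiguous key; it presumes "|" never occurs in the data.

```r
# Question's data, reduced to the columns the rule uses.
df <- data.frame(
  faculty = c("sally", "sally", "sally", "carl", "carl", "carl"),
  match   = c(1, 0, 1, 1, 0, 0),
  caseID  = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase   = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)

# One composite key per row; "|" is an assumed separator not present in the data.
key <- paste(df$caseID, df$faculty, df$phase, sep = "|")

# A row gets 1 when it has match == 1 and its key also occurs
# among the rows with match == 0.
response <- as.integer(df$match == 1 & key %in% key[df$match == 0])
response
```

Computing key once avoids rebuilding the pasted strings on every iteration, which is where most of the loop's work goes.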
Another data.table approach. Join on the key variables and check whether the values are not in the match==0 set:
library(data.table)
setDT(dat)
dat[, response := match==1]
dat[!dat[match==0], on=c("caseID","faculty","phase"), response := FALSE]
dat
# resident faculty submittedBy match caseID phase response
#1: george sally george 1 george_1 pre TRUE
#2: george sally sally 0 george_1 pre FALSE
#3: george sally george 1 george_1 intra FALSE
#4: jane carl jane 1 jane_1 pre TRUE
#5: jane carl carl 0 jane_1 pre FALSE
#6: jane carl carl 0 jane_1 intra FALSE
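For readers without data.table, the join idea can be approximated in base R with merge. This is only a sketch under my own assumptions: the helper columns has_zero and row_id are hypothetical names introduced here, and row_id exists solely because merge does not preserve the original row order.

```r
# Question's data, reduced to the columns the rule uses.
dat <- data.frame(
  faculty = c("sally", "sally", "sally", "carl", "carl", "carl"),
  match   = c(1, 0, 1, 1, 0, 0),
  caseID  = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1"),
  phase   = c("pre", "pre", "intra", "pre", "pre", "intra"),
  stringsAsFactors = FALSE
)
keys <- c("caseID", "faculty", "phase")

# Distinct key combinations that occur in at least one match == 0 row.
zero_keys <- unique(dat[dat$match == 0, keys])
zero_keys$has_zero <- TRUE   # hypothetical marker column

# Left join on the keys; row_id (hypothetical) lets us restore the row order.
dat$row_id <- seq_len(nrow(dat))
joined <- merge(dat, zero_keys, by = keys, all.x = TRUE)
joined <- joined[order(joined$row_id), ]

# TRUE only for match == 1 rows whose key was found among the match == 0 rows.
response <- joined$match == 1 & !is.na(joined$has_zero)
response
```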
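Assuming match only takes the values 1 and 0, one way using dplyr is to check whether each group of caseID, faculty and phase has two distinct values in match (1 and 0), and then replace response with 0 where match is 0.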
library(dplyr)
df %>%
group_by(caseID, faculty, phase) %>%
mutate(response = as.integer(n_distinct(match) == 2),
response = replace(response, match == 0, 0))
# resident faculty submittedBy match caseID phase response
# <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
#1 george sally george 1 george_1 pre 1
#2 george sally sally 0 george_1 pre 0
#3 george sally george 1 george_1 intra 0
#4 jane carl jane 1 jane_1 pre 1
#5 jane carl carl 0 jane_1 pre 0
#6 jane carl carl 0 jane_1 intra 0
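If match could take values other than 0 and 1, counting distinct values is no longer enough, because a group holding 1 and 2 also has two distinct values. A possible sketch in base R that tests for a 0 explicitly is shown below. Rows 7 and 8 are hypothetical rows I added to illustrate that case; they are not part of the question's data.

```r
# Question's data plus two hypothetical rows (7 and 8) whose group
# contains the match values 1 and 2 but no 0.
df <- data.frame(
  faculty = c("sally", "sally", "sally", "carl", "carl", "carl", "bob", "bob"),
  match   = c(1, 0, 1, 1, 0, 0, 1, 2),
  caseID  = c("george_1", "george_1", "george_1", "jane_1", "jane_1", "jane_1",
              "amy_1", "amy_1"),
  phase   = c("pre", "pre", "intra", "pre", "pre", "intra", "pre", "pre"),
  stringsAsFactors = FALSE
)

# Per group: is there at least one row with match == 0?
has_zero <- ave(df$match == 0, df$caseID, df$faculty, df$phase, FUN = any)

# Explicit rule: the row itself is a 1 and its group contains a 0.
response <- as.integer(df$match == 1 & has_zero)

# Distinct-count rule, for comparison; it also flags the hypothetical rows.
n_vals <- ave(df$match, df$caseID, df$faculty, df$phase,
              FUN = function(x) length(unique(x)))
by_count <- as.integer(n_vals == 2 & df$match != 0)
```

On the six original rows both rules agree; they differ only on the two added rows, where the distinct-count rule returns 1 and the explicit rule returns 0.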