How to eliminate a matching pattern based on a condition
I have a data frame with 3 columns, as below.
Input data frame:
A = c("(0_22),(0_25),(1_29)","(1_34),(1_38),(0_40)","(0_07),(0_09),(0_10),(0_13)","(1_47),(1_49),(1_53),(1_57)")
zero =c(5,NA,6,NA)
one = c(NA,4,NA,10)
df = data.frame(A,zero,one)
Desired output data frame:
A = c("(0_22),(0_25),(1_29)","(1_34),(1_38),(0_40)","(0_07),(0_09),(0_10),(0_13)","(1_47),(1_49),(1_53),(1_57)")
zero =c(5,NA,6,NA)
one = c(NA,4,NA,10)
required_val = c("(1_29)","(0_40)",'','')
df = data.frame(A,zero,one,required_val)
How can I derive the column "required_val" from the variable "A", based on the "zero" and "one" variables?
That is, if "zero" is greater than 0, eliminate the substrings of the form (0_..);
if "one" is greater than 0, eliminate the substrings of the form (1_..).
This is basically a pattern matching question:
library(magrittr) # to avoid repeating the long subscript below
df$A <- as.character(df$A) # think this is what you wanted
# get rid of the (0_...) bits:
df$A[!is.na(df$zero) & df$zero > 0] %<>%
  {gsub("\\(0_[^)]*\\)", "", .)}
# and the (1_...) bits:
df$A[!is.na(df$one) & df$one > 0] %<>%
  {gsub("\\(1_[^)]*\\)", "", .)}
# now get rid of trailing, leading and repeated commas (this was trickiest!)
df$A %<>%
{gsub(",+$", "", .)} %>%
{gsub("^,+", "", .)} %>%
{gsub(",+", ",", .)}
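For readers who would rather not depend on magrittr, the same removal idea can be sketched in base R. This is only a sketch under a couple of assumptions: the helper name `drop_prefix` is hypothetical (it is not from the answer above), and the result is written to a new `required_val` column instead of overwriting `A`, which is a guess at what the question is after.

```r
# Example data from the question
df <- data.frame(
  A = c("(0_22),(0_25),(1_29)", "(1_34),(1_38),(0_40)",
        "(0_07),(0_09),(0_10),(0_13)", "(1_47),(1_49),(1_53),(1_57)"),
  zero = c(5, NA, 6, NA),
  one  = c(NA, 4, NA, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: where flag > 0, remove every "(<digit>_...)" token from x
drop_prefix <- function(x, flag, digit) {
  hit <- !is.na(flag) & flag > 0
  x[hit] <- gsub(paste0("\\(", digit, "_[^)]*\\)"), "", x[hit])
  x
}

res <- drop_prefix(df$A, df$zero, 0)
res <- drop_prefix(res, df$one, 1)

# Tidy up the commas left behind by the removals
res <- gsub(",+", ",", res)    # collapse runs of commas into one
res <- gsub("^,|,$", "", res)  # drop a leading or trailing comma

df$required_val <- res
```

Keeping the removal in a small function means the `zero` and `one` cases share one regular expression template, so a change to the token format only has to be made in one place.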
Here's an alternative solution:
required_val <- character(length(A)) # every element starts as ""
for (i in seq_along(A)) {
  if (!is.na(zero[i]) && grepl("1_", A[i], fixed = TRUE)) {
    # "zero" is set: keep the first "(1_xx)" item, parentheses included
    pos <- regexpr("1_", A[i], fixed = TRUE)
    required_val[i] <- substr(A[i], pos - 1, pos + 4)
  } else if (!is.na(one[i]) && grepl("0_", A[i], fixed = TRUE)) {
    # "one" is set: keep the first "(0_xx)" item, parentheses included
    pos <- regexpr("0_", A[i], fixed = TRUE)
    required_val[i] <- substr(A[i], pos - 1, pos + 4)
  }
}
df = data.frame(A,zero,one,required_val)
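A loop built on `substr()` picks up a single item per row and relies on each item having a fixed width. If a row could hold several items to keep, one option is to extract all of them with `gregexpr()` and `regmatches()`. The following is a minimal sketch, assuming every item looks like `(d_nn)`; the name `keep_digit` is introduced here purely for illustration.

```r
A    <- c("(0_22),(0_25),(1_29)", "(1_34),(1_38),(0_40)",
          "(0_07),(0_09),(0_10),(0_13)", "(1_47),(1_49),(1_53),(1_57)")
zero <- c(5, NA, 6, NA)
one  <- c(NA, 4, NA, 10)

# Leading digit of the items to keep: "1" when zero is set, "0" when one is set
keep_digit <- ifelse(!is.na(zero) & zero > 0, "1",
              ifelse(!is.na(one) & one > 0, "0", NA_character_))

required_val <- vapply(seq_along(A), function(i) {
  if (is.na(keep_digit[i])) return("")
  pattern <- paste0("\\(", keep_digit[i], "_[^)]*\\)")
  # all items in row i that match the pattern (character(0) if none)
  hits <- regmatches(A[i], gregexpr(pattern, A[i]))[[1]]
  paste(hits, collapse = ",")
}, character(1))

df <- data.frame(A, zero, one, required_val)
```

Extracting what should stay, rather than deleting what should go, also avoids the separate comma clean-up step.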
Using apply:
#example data
df1 <- data.frame(A = c("(0_22),(0_25),(1_29)","(1_34),(1_38),(0_40)","(0_07),(0_09),(0_10),(0_13)","(1_47),(1_49),(1_53),(1_57)"),
zero = c(5, NA, 6, NA),
one = c(NA, 4, NA, 10),
stringsAsFactors = FALSE)
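cbind(df1,
      required_val = apply(df1, 1,
        function(i){
          # 0 if "zero" is set (greater than 0), 1 if "one" is set
          ix <- which.max(as.numeric(i[2:3]) > 0) - 1
          x <- unlist(strsplit(i[1], ","))
          # drop the items that start with "(0" or "(1" respectively
          x <- x[ !grepl(paste0("^\\(", ix), x) ]
          # return one string per row; this is "" when nothing is left
          paste(x, collapse = ",")
        }))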
#                             A zero one required_val
# 1        (0_22),(0_25),(1_29)    5  NA       (1_29)
# 2        (1_34),(1_38),(0_40)   NA   4       (0_40)
# 3 (0_07),(0_09),(0_10),(0_13)    6  NA
# 4 (1_47),(1_49),(1_53),(1_57)   NA  10