Here is my input data frame:
test <- data.frame(Col1=c("A","BCCC","DE"), Col2=c("Z","BC", "DEEEE"))
test
  Col1  Col2
1    A     Z
2 BCCC    BC
3   DE DEEEE
I am trying to create 2 more columns in my data frame so that, if the string in Col1 is contained in the string in Col2 (or the other way round), I trim the common characters from these 2 strings and output the 2 trimmed strings in separate columns called Col1_short and Col2_short (or dots if there is no match). In the expected output below, this means dropping the last character of each matching string:
  Col1  Col2 Col1_short Col2_short
1    A     Z          .          .
2 BCCC    BC        BCC          B
3   DE DEEEE          D       DEEE
I am trying an approach where I create a list of lists that I could then unlist and append to the original data frame, but it does not work. Can anyone see a fix or a simpler way to do it?
My code:
out <- apply(
  test,
  1,
  function(x){
    ifelse(
      grepl(test$Col1, test$Col2) || grepl(test$Col2, test$Col1),
      {
        common <- ifelse(
          nchar(test$Col1) < nchar(test$Col2),
          nchar(test$Col1) - 1,
          nchar(test$Col2) - 1
        )
        pattern = paste0(".{", common, "}$")
        return(
          list(
            Col1_short=gsub(pattern, "", x[1]),
            Col2_short=gsub(pattern, "", x[2])
          )
        )
      },
      return(
        list(
          Col1_short=".",
          Col2_short="."
        )
      )
    )
  }
)
Output:
out
[[1]]
[[1]]$Col1_short
[1] "."
[[1]]$Col2_short
[1] "."
[[2]]
[[2]]$Col1_short
[1] "."
[[2]]$Col2_short
[1] "."
[[3]]
[[3]]$Col1_short
[1] "."
[[3]]$Col2_short
[1] "."
I then planned to append the 2 new columns to the data frame with:
test$Col1_short <- unlist(out)[attr(unlist(out), "names") == "Col1_short"]
test$Col2_short <- unlist(out)[attr(unlist(out), "names") == "Col2_short"]
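For reference, here is a minimal sketch of the question's own apply() idea with its bugs removed. Inside the function it uses this row's values from `x` instead of the whole `test$Col1`/`test$Col2` columns (the original compares whole columns, not the current row), a scalar `if` instead of `ifelse()` (which returns a value shaped like its test, not a list), and `fixed = TRUE` so the strings are matched literally rather than as regular expressions. It assumes the trimming rule encoded in the question's code (drop "length of the shorter string minus 1" characters from the end of both strings) and R >= 4.0; the helper names `a`, `b` are illustrative only.

```r
# Sketch: the question's apply() approach, fixed.
# Assumption: rule taken from the question's code, i.e. remove
# (nchar of the shorter string - 1) characters from the end of both strings.
test <- data.frame(Col1 = c("A", "BCCC", "DE"), Col2 = c("Z", "BC", "DEEEE"),
                   stringsAsFactors = FALSE)

out <- apply(test, 1, function(x) {
  a <- x[["Col1"]]  # this row's values, not the whole test$Col1 column
  b <- x[["Col2"]]
  # scalar `if` (not ifelse); fixed = TRUE matches literally, not as a regex
  if (grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE)) {
    common <- min(nchar(a), nchar(b)) - 1
    pattern <- paste0(".{", common, "}$")
    list(Col1_short = sub(pattern, "", a), Col2_short = sub(pattern, "", b))
  } else {
    list(Col1_short = ".", Col2_short = ".")
  }
})

# Turn each per-row list into a one-row data frame, stack them,
# and append the result as new columns instead of unlisting by name
test <- cbind(test, do.call(rbind, lapply(out, as.data.frame)))
test
```

Stacking with `do.call(rbind, ...)` keeps the two values of each row together, so there is no need to filter `unlist(out)` by name.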
You can try the code below, which drops the last character of every value in each column:
dfout <- cbind(test, `colnames<-`(sapply(test, function(x) {
  z <- as.character(x)
  substring(z, 1, nchar(z) - 1)
}), paste0(names(test), "_short")))
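which gives the following (this version does not test for containment, so row 1 gets empty strings rather than dots):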
> dfout
  Col1  Col2 Col1_short Col2_short
1    A     Z
2 BCCC    BC        BCC          B
3   DE DEEEE          D       DEEE
We can use mapply to test each Col1 value against the corresponding Col2 value.
test[c("Col1_short", "Col2_short")] <- t(mapply(function(x, y) {
  # literal (fixed = TRUE) containment test, in either direction
  if (grepl(x, y, fixed = TRUE) || grepl(y, x, fixed = TRUE))
    c(substr(x, 1, nchar(x) - 1), substr(y, 1, nchar(y) - 1))
  else
    c(".", ".")
}, test$Col1, test$Col2))
test
#   Col1  Col2 Col1_short Col2_short
# 1    A     Z          .          .
# 2 BCCC    BC        BCC          B
# 3   DE DEEEE          D       DEEE
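A possible variant of the mapply() answer, sketched here under the same rule (drop the last character of both strings when one contains the other), computes the match once as a logical vector and then fills both columns with vectorised `substr()` and `ifelse()`, avoiding the `t()` transpose. `grepl()` only uses the first element of `pattern`, so the pairing still goes through `mapply()`; the helper `contains()` and the variable `hit` are illustrative names, not part of any package.

```r
# Sketch: vectorised variant of the mapply() answer (same trimming rule).
test <- data.frame(Col1 = c("A", "BCCC", "DE"), Col2 = c("Z", "BC", "DEEEE"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: literal (non-regex) "pattern occurs in x" test
contains <- function(pattern, x) grepl(pattern, x, fixed = TRUE)

# TRUE where Col1 is inside Col2 or Col2 is inside Col1, row by row
hit <- mapply(contains, test$Col1, test$Col2, USE.NAMES = FALSE) |
       mapply(contains, test$Col2, test$Col1, USE.NAMES = FALSE)

# Drop the last character where there is a match, otherwise use "."
test$Col1_short <- ifelse(hit, substr(test$Col1, 1, nchar(test$Col1) - 1), ".")
test$Col2_short <- ifelse(hit, substr(test$Col2, 1, nchar(test$Col2) - 1), ".")
test
```

Keeping the match in one logical vector also makes it easy to swap in a different trimming rule later without touching the containment test.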
data
Using stringsAsFactors = FALSE to get character columns instead of factors (FALSE has been the default since R 4.0.0):
test <- data.frame(Col1=c("A","BCCC","DE"), Col2=c("Z","BC", "DEEEE"),
stringsAsFactors = FALSE)
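If the data frame already exists with factor columns (as it would have been created by default before R 4.0.0), a common base R idiom is to convert every column to character in place. This is a small sketch; the factor data below is constructed only to illustrate the conversion.

```r
# Sketch: convert existing factor columns to character in place.
test <- data.frame(Col1 = factor(c("A", "BCCC", "DE")),
                   Col2 = factor(c("Z", "BC", "DEEEE")))

# Assigning into test[] keeps the data.frame structure (names, row names)
# while lapply() replaces each column with its character version
test[] <- lapply(test, as.character)
str(test)
```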