I have two columns of strings in a dataframe, and for each row I want to see the characters which differ.
E.g., given
Lines <- "
a b
cat car
dog ding
cow haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
I'd like to return:
a b diff
cat car t
dog ding o
cow haw co
I have seen two related questions where a number of neat solutions are given; these either work for an individual row (first reference) or act row-wise but don't do exactly what I want (second reference).
Ideally I'd like to use something like this:
Reduce(setdiff, strsplit(c(a, b), split = ""))
I tried:
apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))
but to no avail.
How can this be done?
P.S. I'm particularly keen to do this using dplyr if possible, but only for stylistic reasons.
Assuming the data frame df shown reproducibly in the Note at the end, define a function Diff which accepts two vectors of single characters, runs setdiff on them and pastes the result together. Then use mapply to run it over the two columns after splitting each string into individual characters.
Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))
giving:
a b diff
1 cat car t
2 dog ding o
3 cow haw co
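To make the intermediate steps visible, here is a minimal base R sketch of the same approach, broken into pieces. The names a_chars, b_chars and res are only illustrative and do not appear in the answer above. The last two calls illustrate that setdiff is one-directional (characters of the first argument that are absent from the second) and that it returns unique values, so repeated characters are reported once.

```r
# Same toy data as in the question
df <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# Step 1: split every string into a vector of single characters
a_chars <- strsplit(df$a, "")
b_chars <- strsplit(df$b, "")

# Step 2: for a single row, setdiff keeps the characters of the
# first vector that do not occur in the second
setdiff(a_chars[[3]], b_chars[[3]])   # "c" "o"

# Step 3: collapse the result and run it over all row pairs
Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
res <- mapply(Diff, a_chars, b_chars)
res                                   # "t" "o" "co"

# setdiff is not symmetric: swapping the arguments changes the answer
Diff(strsplit("ding", "")[[1]], strsplit("dog", "")[[1]])   # "in"

# Duplicates in the first argument are reported only once
Diff(c("b", "o", "o", "k"), c("b", "a", "k"))               # "o"
```

If characters present only in b should be reported as well, Diff would have to be called a second time with the arguments swapped.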
Note: The input df used above is:
Lines <- "
a b
cat car
dog ding
cow haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
A solution using tidyverse and stringr.
library(tidyverse)
library(stringr)
dt2 <- dt %>%
mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
mutate(diff = map2(a_list, b_list, setdiff)) %>%
mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
select_if(~!is.list(.))
dt2
# A tibble: 3 x 3
a b diff
<chr> <chr> <chr>
1 cat car t
2 dog ding o
3 cow haw co
DATA
dt <- read.table(text = "a b
cat car
dog ding
cow haw",
header = TRUE, stringsAsFactors = FALSE)
Using dplyr
library(dplyr)
ff = data.frame(a = c("dog", "chair", "love"), b = c("dot", "liar", "over"), stringsAsFactors = FALSE)
st = ff %>% mutate(diff = sapply(Map(setdiff,strsplit(a,""),strsplit(b,"")),paste,collapse = ""))
> st
a b diff
1 dog dot g
2 chair liar ch
3 love over l
Here is another base R method using Map.
diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"
[[2]]
[1] "o"
[[3]]
[1] "c" "o"
You can wrap this in sapply to return a character vector for your data.frame:
dat$charDiffs <- sapply(diffList, paste, collapse = "")
which returns
dat
a b charDiffs
1 cat car t
2 dog ding o
3 cow haw co
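For comparison, the apply() attempt from the question can probably be made to work with two changes: apply() needs a MARGIN argument (1 for rows), and the function it calls receives each row as one character vector rather than as separate a and b arguments. The sketch below is only one way to do that; row_diff is a hypothetical helper name, and the data are re-created so the block runs on its own.

```r
# Same toy data as above, re-created so the example is self-contained
dat <- data.frame(a = c("cat", "dog", "cow"),
                  b = c("car", "ding", "haw"),
                  stringsAsFactors = FALSE)

# row_diff receives one row as a character vector, e.g. c(a = "cat", b = "car")
row_diff <- function(row) {
  chars <- strsplit(row, split = "")            # list of two character vectors
  paste(Reduce(setdiff, chars), collapse = "")  # characters of a not found in b
}

# MARGIN = 1 iterates over rows; only columns a and b are passed in
dat$diff <- apply(dat[c("a", "b")], 1, row_diff)
dat
```

Because apply() converts the data frame to a character matrix first, this form is mainly suitable when all the selected columns are strings; the Map and mapply versions avoid that conversion.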
data (from dput)
dat <-
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding",
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")