
Row-wise extract characters that differ between two strings

I have two columns of strings in a dataframe, and for each row I want to see the characters which differ.

For example, given

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

I would like to return

a     b     diff
cat   car   t
dog   ding  o
cow   haw   co

I have seen

Extract characters that differ between two strings

as well as

Split comma-separated column into separate rows

Both offer a number of neat solutions, but they either work on a single pair of strings (first reference) or operate row-wise without doing exactly what I want (second reference).

Ideally I'd like to use something like this:

Reduce(setdiff, strsplit(c(a, b), split = ""))

I tried:

apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))

but to no avail.

How can this be done?

P.S. I'm particularly keen to do this using dplyr if possible, but only for stylistic reasons.

Assuming df is as shown reproducibly in the Note at the end, define a function Diff that accepts two character vectors, runs setdiff on them and pastes the result together. Then use mapply to run it on the two columns after splitting each string into individual characters.

Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))

giving:

    a    b diff
1 cat  car    t
2 dog ding    o
3 cow  haw   co

Note: The input df used above is:

Lines <- "
a     b
cat   car
dog   ding
cow   haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
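
As a side note, the apply call attempted in the question fails because apply needs a MARGIN argument and passes each row to the function as one character vector rather than as two separate arguments. A minimal sketch of a row-wise variant, assuming both columns are character (row_diff is a name introduced here for illustration only):

```r
# Same data as in the question, built directly for a self-contained example.
df <- data.frame(a = c("cat", "dog", "cow"),
                 b = c("car", "ding", "haw"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: `row` is a character vector of length 2, so
# strsplit() returns a list of two character vectors for Reduce() to fold.
row_diff <- function(row) paste(Reduce(setdiff, strsplit(row, "")), collapse = "")

# MARGIN = 1 applies row_diff to each row of the two string columns.
df$diff <- unname(apply(df[c("a", "b")], 1, row_diff))
df
```

This should produce the same diff column as the mapply version above; the mapply version is likely the safer choice when the data frame also contains non-character columns, since apply converts its input to a matrix first.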

A solution using the tidyverse and stringr (dt is defined under DATA below).

library(tidyverse)
library(stringr)

dt2 <- dt %>%
  mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
  mutate(diff = map2(a_list, b_list, setdiff)) %>%
  mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
  select_if(~!is.list(.))
dt2
# A tibble: 3 x 3
      a     b  diff
  <chr> <chr> <chr>
1   cat   car     t
2   dog  ding     o
3   cow   haw    co

DATA

dt <- read.table(text = "a     b
                 cat   car
                 dog   ding
                 cow   haw",
                 header = TRUE, stringsAsFactors = FALSE)

Using dplyr

library(dplyr)
ff <- data.frame(a = c("dog", "chair", "love"), b = c("dot", "liar", "over"), stringsAsFactors = FALSE)
st <- ff %>% mutate(diff = sapply(Map(setdiff, strsplit(a, ""), strsplit(b, "")), paste, collapse = ""))

> st
      a    b diff
1   dog  dot    g
2 chair liar   ch
3  love over    l

Here is another base R method using Map (dat is given at the end).

diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"

[[2]]
[1] "o"

[[3]]
[1] "c" "o"

You can wrap this in sapply to return a character vector for your data.frame:

dat$charDiffs <- sapply(diffList, paste, collapse = "")

which returns

dat
    a    b charDiffs
1 cat  car         t
2 dog ding         o
3 cow  haw        co

Data (from dput):

dat <- 
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding", 
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
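
One caveat applies to all of the setdiff-based answers above: setdiff treats each string as a set of characters, so the result is one-directional, ignores position, and collapses repeated characters. A small sketch to illustrate, where char_diff is a hypothetical helper written for this example and not part of any answer:

```r
# Hypothetical helper: characters of x that do not occur anywhere in y.
char_diff <- function(x, y) {
  paste(setdiff(strsplit(x, "")[[1]], strsplit(y, "")[[1]]), collapse = "")
}

char_diff("cat", "car")   # "t": characters found only in the first string
char_diff("car", "cat")   # "r": the result depends on argument order
char_diff("book", "bok")  # "":  the extra "o" is not detected
char_diff("aab", "b")     # "a": repeated characters are collapsed
char_diff("abc", "cba")   # "":  position is ignored
```

If position or repeated characters matter for your data, a set difference is probably not the right tool and a position-by-position comparison would be needed instead.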
