
R apply() function returning list of multiple elements

Here is my input data frame:

test <- data.frame(Col1=c("A","BCCC","DE"), Col2=c("Z","BC", "DEEEE"))
test
  Col1  Col2
1    A     Z
2 BCCC    BC
3   DE DEEEE

I am trying to create two more columns in my data frame. If the string in Col1 is contained in the string in Col2 (or vice versa), I want to trim all the common characters from both strings and output the two trimmed strings in separate columns called Col1_short and Col2_short (or dots if there is no match):

  Col1  Col2 Col1_short Col2_short
1    A     Z          .          .
2 BCCC    BC        BCC          B
3   DE DEEEE          D       DEEE
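One reading of the rule fits both the expected output above and the `common` computation in the code below. If either string contains the other, drop (length of the shorter string minus 1) characters from the end of both strings. Otherwise, return dots. Here is a minimal sketch for a single pair. `shorten_pair` is a hypothetical helper name, and `fixed = TRUE` (literal matching instead of regex) is my assumption:

```r
# Hypothetical helper: shorten one (a, b) pair according to the rule above
shorten_pair <- function(a, b) {
  # Literal substring test in both directions (fixed = TRUE: no regex)
  if (grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE)) {
    # Trailing characters to drop: length of the shorter string minus 1
    common <- min(nchar(a), nchar(b)) - 1
    c(Col1_short = substr(a, 1, nchar(a) - common),
      Col2_short = substr(b, 1, nchar(b) - common))
  } else {
    c(Col1_short = ".", Col2_short = ".")
  }
}

shorten_pair("A", "Z")       # returns c(Col1_short = ".", Col2_short = ".")
shorten_pair("BCCC", "BC")   # returns c(Col1_short = "BCC", Col2_short = "B")
shorten_pair("DE", "DEEEE")  # returns c(Col1_short = "D", Col2_short = "DEEE")
```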

I am trying an approach where I create a list of lists that I could then unlist and append to the original data frame, but it does not work. Can anyone see a fix or a simpler way to do it?

My code:

out <- apply(
    test,
    1,
    function(x){
        ifelse(
            grepl(test$Col1, test$Col2) || grepl(test$Col2, test$Col1),
            {
                common <- ifelse(
                    nchar(test$Col1) < nchar(test$Col2),
                    nchar(test$Col1) - 1,
                    nchar(test$Col2) - 1
                )

                pattern = paste0(".{", common, "}$")

                return(
                    list(
                        Col1_short=gsub(pattern, "", x[1]),
                        Col2_short=gsub(pattern, "", x[2])
                    )
                )
            },
            return(
                list(
                    Col1_short=".",
                    Col2_short="."
                )
            )
        )
        }
    )

Output:

out
[[1]]
[[1]]$Col1_short
[1] "."

[[1]]$Col2_short
[1] "."


[[2]]
[[2]]$Col1_short
[1] "."

[[2]]$Col2_short
[1] "."


[[3]]
[[3]]$Col1_short
[1] "."

[[3]]$Col2_short
[1] "."
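Every row probably comes out as dots because of how the condition is written. Inside the function, it uses `test$Col1` and `test$Col2` (the whole columns) instead of the current row `x`, so every row gets the same answer. When the pattern has more than one element, `grepl()` warns and uses only the first one. The test therefore becomes "does "A" occur in each Col2 value?" (and "Z" in each Col1 value), which is FALSE for every row, so the "no" branch is always taken. As far as I know, in R 4.3.0 and later the `||` call on these length-3 vectors fails with an error instead. The sketch below reproduces the first problem and shows a row-wise check with `mapply()`. `fixed = TRUE` is my addition:

```r
test <- data.frame(Col1 = c("A", "BCCC", "DE"), Col2 = c("Z", "BC", "DEEEE"),
                   stringsAsFactors = FALSE)

# Whole-column pattern: grepl() warns and uses only the first pattern, "A"
whole_col <- suppressWarnings(grepl(test$Col1, test$Col2))
whole_col  # FALSE FALSE FALSE: "A" never occurs in Col2

# Row-wise: compare each Col1 value with the Col2 value in the same row, both ways
in_col2 <- mapply(grepl, test$Col1, test$Col2, MoreArgs = list(fixed = TRUE))
in_col1 <- mapply(grepl, test$Col2, test$Col1, MoreArgs = list(fixed = TRUE))
row_wise <- unname(in_col2 | in_col1)
row_wise   # FALSE TRUE TRUE
```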

I then planned to append the two new columns to the data frame with:

test$Col1_short <- unlist(out)[attr(unlist(out), "names") == "Col1_short"]
test$Col2_short <- unlist(out)[attr(unlist(out), "names") == "Col2_short"]
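Here is a sketch of the same `apply()` idea with those problems fixed. It assumes character (not factor) columns and the `common`-based rule from the question's code. The function reads the row from `x` and uses a plain `if`/`else`, since the condition is a single TRUE/FALSE and `ifelse()` is not needed. It also returns a named character vector instead of a list. `apply()` then simplifies the results into a 2 x n matrix, so a transpose replaces the `unlist()` step:

```r
test <- data.frame(Col1 = c("A", "BCCC", "DE"), Col2 = c("Z", "BC", "DEEEE"),
                   stringsAsFactors = FALSE)

out <- apply(test, 1, function(x) {
  # apply() passes each row as a named character vector
  a <- x[["Col1"]]
  b <- x[["Col2"]]
  if (grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE)) {
    # Drop (shorter length - 1) characters from the end of both strings
    common <- min(nchar(a), nchar(b)) - 1
    c(Col1_short = substr(a, 1, nchar(a) - common),
      Col2_short = substr(b, 1, nchar(b) - common))
  } else {
    c(Col1_short = ".", Col2_short = ".")
  }
})

# out is a 2 x 3 matrix (one column per row of test); transpose and add as columns
test[c("Col1_short", "Col2_short")] <- t(out)
test
```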

You can try the code below

dfout <-
  cbind(test, `colnames<-`(sapply(test, function(x) {
    # Drop the last character of every value in each column
    z <- as.character(x)
    substring(z, 1, nchar(z) - 1)
  }), paste0(names(test), "_short")))

such that

> dfout
  Col1  Col2 Col1_short Col2_short
1    A     Z                      
2 BCCC    BC        BCC          B
3   DE DEEEE          D       DEEE
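The code above drops the last character of every value without checking whether one string contains the other. That is why row 1 shows empty strings instead of dots. The vectorized sketch below adds the containment check. It assumes character columns and the same drop-the-last-character rule. For these values, `sub(".$", "", ...)` gives the same result as `substring(z, 1, nchar(z) - 1)`:

```r
test <- data.frame(Col1 = c("A", "BCCC", "DE"), Col2 = c("Z", "BC", "DEEEE"),
                   stringsAsFactors = FALSE)

# TRUE where either string contains the other (literal match, row by row)
hit <- mapply(function(a, b) grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE),
              test$Col1, test$Col2, USE.NAMES = FALSE)

# Drop the last character where there is a match, otherwise use "."
test$Col1_short <- ifelse(hit, sub(".$", "", test$Col1), ".")
test$Col2_short <- ifelse(hit, sub(".$", "", test$Col2), ".")
test
```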

We can use mapply to test each Col1 value against the corresponding Col2 value.

test[c("Col1_short", "Col2_short")] <- t(mapply(function(x, y) {
    # If either string contains the other, drop the last character of both
    if (grepl(x, y) || grepl(y, x))
        c(substr(x, 1, nchar(x) - 1), substr(y, 1, nchar(y) - 1))
    else
        c('.', '.')
}, test$Col1, test$Col2))

test
#  Col1  Col2 Col1_short Col2_short
#1    A     Z          .          .
#2 BCCC    BC        BCC          B
#3   DE DEEEE          D       DEEE
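`mapply()` returns one column per pair (a 2 x n matrix), which is why the result needs `t()`. If you want the shape and type checked, one alternative is `vapply()` over row indices. This is my substitution, not part of the answer above. The `character(2)` argument declares that each call must return a character vector of length 2, and `vapply()` stops with an error otherwise:

```r
test <- data.frame(Col1 = c("A", "BCCC", "DE"), Col2 = c("Z", "BC", "DEEEE"),
                   stringsAsFactors = FALSE)

short <- vapply(seq_len(nrow(test)), function(i) {
  x <- test$Col1[i]
  y <- test$Col2[i]
  # Same rule as the mapply answer: drop the last character of both on a match
  if (grepl(x, y) || grepl(y, x))
    c(substr(x, 1, nchar(x) - 1), substr(y, 1, nchar(y) - 1))
  else
    c(".", ".")
}, character(2))

# short is 2 x nrow(test); transpose so each row of test gets one row of results
test[c("Col1_short", "Col2_short")] <- t(short)
test
```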

data

Use stringsAsFactors = FALSE so that the columns are character vectors instead of factors.

test <- data.frame(Col1=c("A","BCCC","DE"), Col2=c("Z","BC", "DEEEE"),
                   stringsAsFactors = FALSE)
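Factors matter for the `nchar()` calls above: R's documentation for `nchar()` states that passing a factor is an error. Here is a small check that forces factors with `stringsAsFactors = TRUE`. Since R 4.0.0, `FALSE` is already the default for `data.frame()`:

```r
f <- data.frame(Col1 = c("A", "BCCC", "DE"), stringsAsFactors = TRUE)
is.factor(f$Col1)  # TRUE

# nchar() refuses factors; tryCatch() turns the error into a value for inspection
res <- tryCatch(nchar(f$Col1), error = function(e) "error")
res  # "error"

# Converting to character first works
nchar(as.character(f$Col1))  # 1 4 2
```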

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM