
How to split the strings in one data frame and add each word as a row in a column of another data frame in R

dim <- data.frame(Max = c(1,2,3), Fax = c(4,5,6))
> dim
   Max Fax
 1   1   4
 2   2   5
 3   3   6

min <- data.frame(Num=c(1,2,3), Words = c("ab bc de","ma pa","ka da sa ba"))
> min
   Num       Words
 1   1    ab bc de
 2   2       ma pa
 3   3 ka da sa ba

I have two data frames, dim and min. Both have the same number of rows. Now I want to add another column (Words) to dim, with one row per word, so that dim looks like this:

> dim

      Max   Fax   Words
 1     1     4     ab
 2     1     4     bc
 3     1     4     de
 4     2     5     ma
 5     2     5     pa
 6     3     6     ka
 7     3     6     da
 8     3     6     sa
 9     3     6     ba

Do you mean Fax = 6 for the last 4 rows? If so, this may not be the most elegant solution, but it should do the job:

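# For each row of min, split Words on spaces and repeat Num once per word.
# Note: apply() passes each row to the function as a character vector.
tmp2 <- apply(min, 1, function(x) {
  tmp <- unlist(strsplit(as.character(x[2]), " "))
  data.frame(Num = rep(x[1], length(tmp)), Words = tmp)})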

min <- do.call(rbind, tmp2)

dim <- merge(dim, min, by.x = "Max", by.y = "Num", all = TRUE)

dim

  Max Fax Words
1   1   4    ab
2   1   4    bc
3   1   4    de
4   2   5    ma
5   2   5    pa
6   3   6    ka
7   3   6    da
8   3   6    sa
9   3   6    ba

What I did was apply strsplit to Words and rebuild your min data.frame with one word per row. After that, merge puts the two data frames together.
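
The same split-then-join idea can also be sketched in base R without apply() or merge(). This is only a sketch under a few assumptions: Num in min matches Max in dim (as in the merge above), words are separated by single spaces, and the result name res is just for illustration.

```r
dim <- data.frame(Max = c(1, 2, 3), Fax = c(4, 5, 6))
min <- data.frame(Num = c(1, 2, 3),
                  Words = c("ab bc de", "ma pa", "ka da sa ba"),
                  stringsAsFactors = FALSE)

# One character vector of words per row of min.
words <- strsplit(as.character(min$Words), " ", fixed = TRUE)

# For each row of min, find the matching row of dim (Max == Num)
# and repeat that row index once per word.
idx <- rep(match(min$Num, dim$Max), lengths(words))

# Repeat the rows of dim and attach the flattened words.
res <- data.frame(dim[idx, ], Words = unlist(words), stringsAsFactors = FALSE)
rownames(res) <- NULL
```

Because the rows are repeated by index, the words stay in the order in which they appear in each string.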

I would think that it would be more efficient to merge first and then split. Here are two options to consider:

data.table

library(data.table)
DT <- data.table(merge(dim, min, by.x = "Max", by.y = "Num"), key = "Max,Fax")
DT[, list(unlist(strsplit(as.character(Words), " "))), by = key(DT)]
#    Max Fax V1
# 1:   1   4 ab
# 2:   1   4 bc
# 3:   1   4 de
# 4:   2   5 ma
# 5:   2   5 pa
# 6:   3   6 ka
# 7:   3   6 da
# 8:   3   6 sa
# 9:   3   6 ba

splitstackshape

concat.split.multiple from my "splitstackshape" package handles this kind of thing easily (though it is not always the quickest solution).

library(splitstackshape)
concat.split.multiple(merge(dim, min, by.x = "Max", by.y = "Num"), 
                      "Words", " ", "long")
#    Max Fax time Words
# 1    1   4    1    ab
# 2    2   5    1    ma
# 3    3   6    1    ka
# 4    1   4    2    bc
# 5    2   5    2    pa
# 6    3   6    2    da
# 7    1   4    3    de
# 8    2   5    3  <NA>
# 9    3   6    3    sa
# 10   1   4    4  <NA>
# 11   2   5    4  <NA>
# 12   3   6    4    ba

You can use complete.cases if you want to get rid of the NA values in the output from concat.split.multiple.
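
As a small illustration of the complete.cases step, the sketch below rebuilds rows 7 to 12 of the long output by hand, so it runs without splitstackshape. The names out and cleaned are hypothetical, and the final reordering is optional.

```r
# Hand-built stand-in for rows 7 to 12 of the long output shown above.
out <- data.frame(
  Max   = c(1, 2, 3, 1, 2, 3),
  Fax   = c(4, 5, 6, 4, 5, 6),
  time  = c(3, 3, 3, 4, 4, 4),
  Words = c("de", NA, "sa", NA, NA, "ba"),
  stringsAsFactors = FALSE
)

# Keep only the rows that have no missing value in any column.
cleaned <- out[complete.cases(out), ]

# Optionally sort back into the grouped order used in the question.
cleaned <- cleaned[order(cleaned$Max, cleaned$time), ]
rownames(cleaned) <- NULL
```

Applied to the full result, the same two steps should leave the nine word rows from the question, plus the extra time column.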
