Create columns of data frame based on rows from another data frame

As the title explains, I would like to create a data frame from the matrix below:

structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11", 
            "11", "Frank", "Mark", "Greg", "Mati", "Paul", 
            "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"
), .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names"
)))

I would like to create one column per distinct value in column i. When several rows share the same value of i, the names found in the next column (vec_names) for those rows should be stored together in one column of the new data frame.

The columns will therefore have different lengths, so the missing entries can be filled with NAs.

Desired output:

2     3    8    10     11
Frank Mark Greg Paul   Marcus
           Mati Cyntha Pablo 
                       Maggy
                       Trist

You can use reshape2's dcast to reshape to wide:

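library(reshape2)

# m is the character matrix from the question
DF <- data.frame(m, stringsAsFactors = FALSE)

# s numbers the rows within each group of i (1, 2, ... per group)
DF$s <- ave(seq_along(DF$i), DF$i, FUN = seq_along)
res  <- dcast(DF, s ~ i, value.var = "vec_names")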

  s     10     11     2    3    8
1 1   Paul Marcus Frank Mark Greg
2 2 Cyntha  Pablo  <NA> <NA> Mati
3 3   <NA>  Maggy  <NA> <NA> <NA>
4 4   <NA>  Trist  <NA> <NA> <NA>
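
To see what the helper column s contributes, here is a minimal base R sketch that computes only the within-group row index. The vector name i_vals is an assumption made for illustration; it simply holds the values of column i from the question.

# values of column i from the question (i_vals is a hypothetical name)
i_vals <- c("2", "3", "8", "8", "10", "10", "11", "11", "11", "11")

# for each element, its position within its own group of i
s <- ave(seq_along(i_vals), i_vals, FUN = seq_along)
s
# 1 1 1 2 1 2 1 2 3 4

Because every (s, i) pair is unique, dcast can place each name in exactly one cell of the wide table.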

Unfortunately, this leaves a column you don't need, s, and the other columns are ordered lexicographically (as strings) rather than numerically. If you want to fix that:

res$s <- NULL
res[order(as.integer(names(res)))]

      2    3    8     10     11
1 Frank Mark Greg   Paul Marcus
2  <NA> <NA> Mati Cyntha  Pablo
3  <NA> <NA> <NA>   <NA>  Maggy
4  <NA> <NA> <NA>   <NA>  Trist

In base R, after first converting your matrix (mymat) to a data.frame, you can try the following:

df <- as.data.frame(mymat, stringsAsFactors = FALSE) # convert the matrix to a data.frame
sp_df <- split(df, df$i)                             # split it by "i"
nb_row <- sapply(sp_df, nrow)                        # rows per group, needed to pad with NAs
# keep only vec_names from each group, pad with NAs up to the longest group,
# then reorder the columns numerically
mapply(function(x, y) c(x$vec_names, rep(NA, max(nb_row) - y)),
       x = sp_df,
       y = nb_row)[, order(as.numeric(names(sp_df)))]

EDIT

Thanks to @Frank, here is a simpler way to go, splitting only the vector of names (after converting to a data.frame):

sp_nm = split(df$vec_names, df$i)
do.call(cbind, lapply(sp_nm, `length<-`, max(lengths(sp_nm))))[, order(as.numeric(names(sp_nm)))]

Both ways give the following output (a character matrix):

#    2       3      8      10       11      
#[1,] "Frank" "Mark" "Greg" "Paul"   "Marcus"
#[2,] NA      NA     "Mati" "Cyntha" "Pablo" 
#[3,] NA      NA     NA     NA       "Maggy" 
#[4,] NA      NA     NA     NA       "Trist"
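
Since the question asks for a data frame while the result above is a matrix, the same split-and-pad idea can be wrapped in a small function that returns a data.frame. This is only a sketch: the function name pad_groups and the variable names are assumptions, not part of the original answers.

# the matrix from the question
m <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                 "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                 "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
               .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names")))

# pad_groups is a hypothetical helper: one column per group, padded with NAs
pad_groups <- function(values, groups) {
  sp <- split(values, groups)                  # one character vector per group
  sp <- sp[order(as.numeric(names(sp)))]       # numeric, not lexicographic, order
  padded <- lapply(sp, `length<-`, max(lengths(sp)))  # extend short groups with NA
  # check.names = FALSE keeps column names such as "2" instead of "X2"
  data.frame(padded, check.names = FALSE, stringsAsFactors = FALSE)
}

res_df <- pad_groups(m[, "vec_names"], m[, "i"])
res_df

Passing check.names = FALSE matters here because the group labels are not syntactically valid R names and would otherwise be rewritten.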

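Try the spread function from the tidyr package. This comes close to what you expect, although each name ends up on its own row rather than being stacked at the top of its column.

library(tidyr)

spread(data.frame(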
  structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11", 
                              "11", "Frank", "Mark", "Greg", "Mati", "Paul", 
                              "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"), 
                            .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names")))), 
  "i", "vec_names")

               10     11     2    3    8
        1    <NA>   <NA> Frank <NA> <NA>
        2    <NA>   <NA>  <NA> Mark <NA>
        3    <NA>   <NA>  <NA> <NA> Greg
        4    <NA>   <NA>  <NA> <NA> Mati
        5    Paul   <NA>  <NA> <NA> <NA>
        6  Cyntha   <NA>  <NA> <NA> <NA>
        7    <NA> Marcus  <NA> <NA> <NA>
        8    <NA>  Pablo  <NA> <NA> <NA>
        9    <NA>  Maggy  <NA> <NA> <NA>
        10   <NA>  Trist  <NA> <NA> <NA>
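
The one-name-per-row result above can be compacted by giving spread a within-group index to use as the row identifier. The following is a sketch under that assumption, reusing the ave idea from the reshape2 answer; the names df, s, and wide are chosen for illustration only.

library(tidyr)

# the matrix from the question
m <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                 "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                 "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
               .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names")))

df <- data.frame(m, stringsAsFactors = FALSE)

# s is the position of each row within its group of i
df$s <- ave(seq_along(df$i), df$i, FUN = seq_along)

# s now identifies the output rows, so names in the same group stack up
wide <- spread(df, i, vec_names)

# drop the helper column and put the remaining columns in numeric order
wide$s <- NULL
wide <- wide[order(as.integer(names(wide)))]
wide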
