As the title explains, I would like to create a data frame. Take a look at the data below, which is stored as a matrix:
structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
"11", "Frank", "Mark", "Greg", "Mati", "Paul",
"Cyntha", "Marcus", "Pablo", "Maggy", "Trist"
), .Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names"
)))
I would like to create columns based on the values in column i. If several rows share the same number in column i, the names found in the next column for those rows should be stored together in one column of the new data frame.
Of course, this means the columns will have different lengths, so the missing entries can be filled with NAs.
Desired output:
2     3    8    10     11
Frank Mark Greg Paul   Marcus
           Mati Cyntha Pablo
                       Maggy
                       Trist
You can use reshape2's dcast to reshape to wide:
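library(reshape2)
DF <- data.frame(m, stringsAsFactors = FALSE)        # m is the matrix from the question
DF$s <- ave(seq_along(DF$i), DF$i, FUN = seq_along)  # running index within each value of i
res <- dcast(DF, s ~ i, value.var = "vec_names")     # one column per value of i
res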
  s     10     11     2    3    8
1 1   Paul Marcus Frank Mark Greg
2 2 Cyntha  Pablo  <NA> <NA> Mati
3 3   <NA>  Maggy  <NA> <NA> <NA>
4 4   <NA>  Trist  <NA> <NA> <NA>
Unfortunately, you have a column you don't need, s, and the other columns are ordered lexicographically. If you want to fix that:
res$s <- NULL
res[order(as.integer(names(res)))]
      2    3    8     10     11
1 Frank Mark Greg   Paul Marcus
2  <NA> <NA> Mati Cyntha  Pablo
3  <NA> <NA> <NA>   <NA>  Maggy
4  <NA> <NA> <NA>   <NA>  Trist
In base R, after first converting your matrix (mymat) to a data.frame, you can try the following:
df <- as.data.frame(mymat, stringsAsFactors = FALSE)  # convert your matrix to a data.frame
sp_df <- split(df, df$i)       # split it according to "i"
nb_row <- sapply(sp_df, nrow)  # number of rows in each piece, so you can pad with NAs
# keep only the names of each piece, pad with NA where needed, then reorder the columns numerically
mapply(function(x, y) c(x$vec_names, rep(NA, max(nb_row) - y)),
       x = sp_df,
       y = nb_row)[, order(as.numeric(names(sp_df)))]
EDIT
Thanks to @Frank, here is a simpler way to go, splitting only the vector of names (after converting to a data.frame):
sp_nm = split(df$vec_names, df$i)
do.call(cbind, lapply(sp_nm, `length<-`, max(lengths(sp_nm))))[, order(as.numeric(names(sp_nm)))]
Both ways give the following output:
#     2       3      8      10       11
#[1,] "Frank" "Mark" "Greg" "Paul"   "Marcus"
#[2,] NA      NA     "Mati" "Cyntha" "Pablo"
#[3,] NA      NA     NA     NA       "Maggy"
#[4,] NA      NA     NA     NA       "Trist"
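Note that the result above is a character matrix, while the question asks for a data frame. A minimal, self-contained sketch of the last step, assuming the matrix is called mymat as in this answer (res and res_df are names introduced here only for illustration):

```r
# The matrix from the question, named mymat as in the answer above
mymat <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                     "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                     "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
                   .Dim = c(10L, 2L),
                   .Dimnames = list(NULL, c("i", "vec_names")))

# Split the names by i and pad every group with NA up to the longest group
sp_nm <- split(mymat[, "vec_names"], mymat[, "i"])
padded <- lapply(sp_nm, `length<-`, max(lengths(sp_nm)))

# Bind the groups as columns and put them in numeric order
res <- do.call(cbind, padded)[, order(as.numeric(names(sp_nm)))]

# as.data.frame() on a matrix keeps the column names "2", "3", ... as they are
res_df <- as.data.frame(res, stringsAsFactors = FALSE)
res_df
```

Because the column names are numbers, they have to be accessed with backticks or double brackets, for example res_df$`8` or res_df[["8"]].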
Try the spread function from the tidyr package. This comes close to what you expect.
library(tidyr)
spread(data.frame(
structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
"11", "Frank", "Mark", "Greg", "Mati", "Paul",
"Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
.Dim = c(10L, 2L), .Dimnames = list(NULL, c("i", "vec_names")))),
"i", "vec_names")
       10     11     2    3    8
1    <NA>   <NA> Frank <NA> <NA>
2    <NA>   <NA>  <NA> Mark <NA>
3    <NA>   <NA>  <NA> <NA> Greg
4    <NA>   <NA>  <NA> <NA> Mati
5    Paul   <NA>  <NA> <NA> <NA>
6  Cyntha   <NA>  <NA> <NA> <NA>
7    <NA> Marcus  <NA> <NA> <NA>
8    <NA>  Pablo  <NA> <NA> <NA>
9    <NA>  Maggy  <NA> <NA> <NA>
10   <NA>  Trist  <NA> <NA> <NA>
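The output keeps ten rows because nothing tells tidyr which names belong on the same row. One possible way to close the gap is to add a running index within each value of i before reshaping. The sketch below is not part of the original answer: it assumes tidyr 1.0.0 or later and swaps in pivot_wider, the newer interface that supersedes spread. The names df, s and wide are introduced here for illustration.

```r
library(tidyr)

# The matrix from the question
m <- structure(c("2", "3", "8", "8", "10", "10", "11", "11", "11",
                 "11", "Frank", "Mark", "Greg", "Mati", "Paul",
                 "Cyntha", "Marcus", "Pablo", "Maggy", "Trist"),
               .Dim = c(10L, 2L),
               .Dimnames = list(NULL, c("i", "vec_names")))

df <- as.data.frame(m, stringsAsFactors = FALSE)

# Running index within each value of i: 1 for the first name, 2 for the second, ...
df$s <- ave(seq_along(df$i), df$i, FUN = seq_along)

# s identifies the rows, i supplies the column names, vec_names the cell values
wide <- pivot_wider(df, names_from = i, values_from = vec_names)

# Drop the helper column once the reshape is done
wide$s <- NULL
wide
```

With a character i, pivot_wider creates the columns in order of first appearance, so here they come out as 2, 3, 8, 10, 11 without an extra sorting step.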