I am relatively new to R and I am stuck trying to get my data into a suitable format. It seems like the reshape package might be useful for this, but I haven't gotten any further than that.
I have a data frame in which one of the columns (V4) contains both strings and numbers. I would like to split V4 by the grouping given in V1 and V2 and attach the results to the data frame as three separate columns.
Edit: As my original example data frame did not quite capture the complexity of the problem, here is a more accurate example:
df <- data.frame(V1=c(rep("SN", 8),rep("JK", 4)),
V2=c(1,1,2,2,2,3,3,3,1,1,2,2),
V3=c("Picture", "Response", "Sound", "Sound", "Response", "Sound", "Sound", "Response", "Sound", "Response", "Sound", "Sound"),
V4=c("Photo", "100", "XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100, "XYZc02i40", 200, "XYZc02i03", "XYZq02i03" ),
stringsAsFactors=FALSE)
V1 V2 V3 V4
SN 1 Picture Photo
SN 1 Response 100
SN 2 Sound XYZc02i03
SN 2 Sound XYZq02i03
SN 2 Response 200
SN 3 Sound ZYXc01i30
SN 3 Sound ZYXq01i30
SN 3 Response 100
JK 1 Sound XYZc02i40
JK 1 Response 200
JK 2 Sound XYZc02i03
JK 2 Sound XYZq02i03
And I want to get something like this:
V1 V2 V3 V4 V5 V6
SN 1 Picture Photo NA 100
SN 2 Sound XYZc02i03 XYZq02i03 200
SN 3 Sound ZYXc01i30 ZYXq01i30 100
JK 1 Sound XYZc02i40 NA 200
JK 2 Sound XYZc02i03 XYZq02i03 NA
EDIT: I don't always have the same number of observations for each value of V2, so V4, V5, or V6 may contain missing values in the data frame I want to get.
Edit2: V6 should map onto the "Response" values from V3; V4 and V5 should ideally map onto the "Sound" values from V3 in consecutive order.
I would be very appreciative of any advice on how to go about this. Or, if this problem has been solved elsewhere and I missed it, a link would also be great.
You don't need a cbind in your definition of df. You'd use something like this:
df <- data.frame(V1=rep("SN", 6),
V2=rep(2:3, each=3),
V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Response"),
V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100),
stringsAsFactors=FALSE)
But given a dataframe like the one you describe, you can get the desired results with:
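# Number of value columns needed: the size of the largest (V1, V2) group.
max.subset.len <- 3 # or maybe max(sapply(split(df, list(df$V1, df$V2)), FUN=nrow))
# Pad each group's V4 values with NA up to max.subset.len.
fun <- function(v4) {length(v4) <- max.subset.len; v4}
# aggregate() returns the padded vectors as a matrix in its third column.
agg <- aggregate(df$V4, by=list(df$V1, df$V2), FUN=fun)
results <- cbind(agg[1:2], agg[[3]])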
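Note that the aggregate() approach fills the new columns in row order, so for a group such as SN 1 (one Picture row, one Response row) the response value lands in the second column rather than the last. If the response should always go to V6, as Edit2 of the question asks, something along these lines might work. This is only a sketch in base R using the example data from the question; collapse_group is a hypothetical helper name, and it assumes at most two non-response rows and one response row per group.

```r
# Example data from the question
df <- data.frame(
  V1 = c(rep("SN", 8), rep("JK", 4)),
  V2 = c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
  V3 = c("Picture", "Response", "Sound", "Sound", "Response", "Sound",
         "Sound", "Response", "Sound", "Response", "Sound", "Sound"),
  V4 = c("Photo", "100", "XYZc02i03", "XYZq02i03", "200", "ZYXc01i30",
         "ZYXq01i30", "100", "XYZc02i40", "200", "XYZc02i03", "XYZq02i03"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one (V1, V2) group into a single row.
# Assumes at most two non-response rows and one response row per group.
collapse_group <- function(g) {
  is_resp <- g$V3 == "Response"
  stim <- g$V4[!is_resp]  # Sound/Picture values, in their original order
  resp <- g$V4[is_resp]   # Response value, if the group has one
  data.frame(
    V1 = g$V1[1],
    V2 = g$V2[1],
    V3 = g$V3[!is_resp][1],
    V4 = stim[1],  # indexing past the end of a vector yields NA
    V5 = stim[2],
    V6 = resp[1],
    stringsAsFactors = FALSE
  )
}

# Build the grouping key with levels in order of first appearance,
# so the output keeps the row order of the input (SN before JK).
key <- paste(df$V1, df$V2)
groups <- split(df, factor(key, levels = unique(key)))

result <- do.call(rbind, lapply(groups, collapse_group))
rownames(result) <- NULL
print(result)
```

For the example data this gives one row per (V1, V2) group, with NA in V5 or V6 wherever a group has only one stimulus or no response.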