I've tried using many variations of the reshape() function (reshape2 package) to turn a data frame of different factors into a two column matrix, but I've been unsuccessful. I also need to ignore blanks. Here is a simplified example of what I'm trying to accomplish:
Code  Bucket1  Bucket2
1     Green    Blue
2     Green    (Blank)
3     (Blank)  (Blank)
4     (Blank)  Blue
INTO:
Code  Bucket
1     Green
1     Blue
2     Green
4     Blue
Can anybody help with reshape()?
The reshape2 package contains a melt function, which converts datasets from wide to long format. There is also a reshape function, part of the stats package, which is useful for reshaping data too, but it is not a function I know well.
To reshape data like you've described, you can use melt as follows. I'm guessing your blanks are NA, so I use the na.rm argument to remove them. I use the value.name argument to name the new column that is created.
melt(dat, id.vars = "Code", na.rm = TRUE, value.name = "Bucket")
Result:
Code variable Bucket
1 1 Bucket1 Green
2 2 Bucket1 Green
5 1 Bucket2 Blue
8 4 Bucket2 Blue
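For anyone who wants to run this, here is a minimal reproducible sketch. The data frame below is my own reconstruction of the example in the question, and it assumes the blanks are stored as NA and the columns are character rather than factor; the original data may differ.

```r
library(reshape2)

# Hypothetical reconstruction of the question's data; blanks assumed to be NA
dat <- data.frame(
  Code    = 1:4,
  Bucket1 = c("Green", "Green", NA, NA),
  Bucket2 = c("Blue", NA, NA, "Blue"),
  stringsAsFactors = FALSE
)

# Stack Bucket1 and Bucket2 into one column, dropping the NA rows
long <- melt(dat, id.vars = "Code", na.rm = TRUE, value.name = "Bucket")
long
```

The surviving rows keep their original row names, which is why the result above is numbered 1, 2, 5, 8 rather than 1 to 4.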
This doesn't give the exact output you requested, as you want your final dataset in a specific order and without the new variable column. You can use some of the handy functions from the dplyr package to remove the extra column (using select) and order by Code (using arrange), although there are certainly other ways to manipulate the result after melting.
library(reshape2)
library(dplyr)

dat %>%
  melt(id.vars = "Code", na.rm = TRUE, value.name = "Bucket") %>%
  select(-variable) %>%
  arrange(Code)
Now the result looks like:
  Code Bucket
1    1  Green
2    1   Blue
3    2  Green
4    4   Blue
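The approach above relies on my guess that the blanks are NA. If they turn out to be empty strings instead, one option is to recode them as NA first so that na.rm can still drop them. This is only a sketch under that assumption, and the data frame is again a hypothetical reconstruction of the question's example.

```r
library(reshape2)
library(dplyr)

# Hypothetical data where the blanks are empty strings rather than NA
dat <- data.frame(
  Code    = 1:4,
  Bucket1 = c("Green", "Green", "", ""),
  Bucket2 = c("Blue", "", "", "Blue"),
  stringsAsFactors = FALSE
)

# Recode empty strings as NA so that na.rm = TRUE can remove them
dat[dat == ""] <- NA

result <- dat %>%
  melt(id.vars = "Code", na.rm = TRUE, value.name = "Bucket") %>%
  select(-variable) %>%
  arrange(Code)
result
```

If the blanks are some other placeholder, such as the literal text "(Blank)", the same recoding line should work with that string in place of "".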
library(data.table)
dat <- as.data.table(your_original_data.frame)
dat[, c(Bucket1, Bucket2), by=Code]
Code V1
1: 1 Green
2: 1 Blue
3: 2 Green
4: 2 NA
5: 3 NA
6: 3 NA
7: 4 NA
8: 4 Blue
## To drop the NA's
dat[, c(Bucket1, Bucket2), by=Code][!is.na(V1)]
Code V1
1: 1 Green
2: 1 Blue
3: 2 Green
4: 4 Blue
## if they are actually called "(Blank)" use
dat[, c(Bucket1, Bucket2), by=Code][V1 != "(Blank)"]
Update: to convert your factors to characters:
colsToConvert <- setdiff(names(dat), "Code") # or manually type them
dat[, c(colsToConvert) := lapply(.SD, as.character), .SDcols = colsToConvert]
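Putting the data.table pieces together, a self-contained sketch might look like the following. The example table is a hypothetical stand-in for your data, assuming factor columns with NA for the blanks; the final setnames call is my addition to rename the default V1 column to Bucket.

```r
library(data.table)

# Hypothetical example data: factor columns, blanks assumed to be NA
dat <- data.table(
  Code    = 1:4,
  Bucket1 = factor(c("Green", "Green", NA, NA)),
  Bucket2 = factor(c("Blue", NA, NA, "Blue"))
)

# Convert the factor columns to character before combining them
colsToConvert <- setdiff(names(dat), "Code")
dat[, (colsToConvert) := lapply(.SD, as.character), .SDcols = colsToConvert]

# Stack the two bucket columns within each Code, then drop the NA rows
stacked <- dat[, c(Bucket1, Bucket2), by = Code][!is.na(V1)]

# Rename the default V1 column to match the requested output
setnames(stacked, "V1", "Bucket")
stacked
```

Converting to character first matters because combining factor columns with c() can behave differently across R versions, while character vectors combine predictably.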