
R - Using reshape() to turn data frame into a two column matrix

I've tried using many variations of the reshape() function (reshape2 package) to turn a data frame of different factors into a two column matrix, but I've been unsuccessful. I also need to ignore blanks. Here is a simplified example of what I'm trying to accomplish:

Code  Bucket1  Bucket2
1     Green    Blue
2     Green    (Blank)
3     (Blank)  (Blank)
4     (Blank)  Blue

INTO:

Code  Bucket
1     Green
1     Blue
2     Green
4     Blue

Can anybody help with reshape()?

The reshape2 package contains a melt function, which converts datasets from wide to long format. There is also a reshape function in the stats package that is useful for reshaping data, but it is not a function I know well.

To reshape the data you've described, you can use melt as follows. I'm guessing your blanks are NA, so I use the na.rm argument to remove them, and the value.name argument to name the new column that is created.

library(reshape2)
melt(dat, id.vars = "Code", na.rm = TRUE, value.name = "Bucket")

Result:

  Code variable Bucket
1    1  Bucket1  Green
2    2  Bucket1  Green
5    1  Bucket2   Blue
8    4  Bucket2   Blue
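The melt call above assumes a data frame named dat whose blanks are already NA. Here is a minimal base R sketch of how such a data frame might be prepared. The data below is a hypothetical reconstruction of the question's example, and storing blanks as the literal string "(Blank)" or as an empty string is an assumption:

```r
# Hypothetical reconstruction of the question's data; the real column
# types (e.g. factor vs. character) may differ.
dat <- data.frame(
  Code    = 1:4,
  Bucket1 = c("Green", "Green", "(Blank)", "(Blank)"),
  Bucket2 = c("Blue", "(Blank)", "(Blank)", "Blue"),
  stringsAsFactors = FALSE
)

# Assumption: blanks are the literal string "(Blank)" or an empty string.
# Convert every non-id column to character and recode blanks as NA,
# so that na.rm = TRUE has something to drop.
bucket_cols <- setdiff(names(dat), "Code")
dat[bucket_cols] <- lapply(dat[bucket_cols], function(x) {
  x <- as.character(x)
  x[x %in% c("(Blank)", "")] <- NA
  x
})
```

After this step, dat can be passed to melt with na.rm = TRUE as shown in the answer.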

This doesn't give the exact output you requested, as you want your final dataset in a specific order and without the new variable column. You can use some of the handy functions from the dplyr package to remove the extra column (using select) and order by Code (using arrange), although there are certainly other ways to manipulate the result after melting.

library(dplyr)
dat %>% 
    melt(id.vars = "Code", na.rm = TRUE, value.name = "Bucket") %>%
    select(-variable) %>%
    arrange(Code)

Now the result looks like:

  Code Bucket
1    1  Green
2    1   Blue
3    2  Green
4    4   Blue

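Since the question asked about reshape() specifically, here is a rough sketch of the same wide-to-long conversion using the stats reshape function, with no extra packages. The data frame is a hypothetical reconstruction of the question's example, and it assumes the blanks are NA and the bucket columns are character rather than factor:

```r
# Hypothetical reconstruction of the question's data, with blanks as NA.
dat <- data.frame(
  Code    = 1:4,
  Bucket1 = c("Green", "Green", NA, NA),
  Bucket2 = c("Blue", NA, NA, "Blue"),
  stringsAsFactors = FALSE
)

# Stack Bucket1 and Bucket2 into a single "Bucket" column.
# "variable" records which wide column each value came from.
long <- reshape(
  dat,
  direction = "long",
  varying   = c("Bucket1", "Bucket2"),
  v.names   = "Bucket",
  timevar   = "variable",
  times     = c("Bucket1", "Bucket2"),
  idvar     = "Code"
)

# Drop the NA rows and the helper column, then order by Code.
long <- long[!is.na(long$Bucket), c("Code", "Bucket")]
long <- long[order(long$Code), ]
rownames(long) <- NULL
```

With this input, long should hold the four Code/Bucket rows requested in the question. The argument list is more verbose than melt, which is one reason melt is often preferred for this kind of task.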
  library(data.table)

  dat <- as.data.table(your_original_data.frame)

  dat[, c(Bucket1, Bucket2), by=Code]
     Code    V1
  1:    1 Green
  2:    1  Blue
  3:    2 Green
  4:    2    NA
  5:    3    NA
  6:    3    NA
  7:    4    NA
  8:    4  Blue

  ## To drop the NA's 
  dat[, c(Bucket1, Bucket2), by=Code][!is.na(V1)]
     Code    V1
  1:    1 Green
  2:    1  Blue
  3:    2 Green
  4:    4  Blue

  ## if they are actually called "(Blank)" use 
  dat[, c(Bucket1, Bucket2), by=Code][V1 != "(Blank)"]

  Update: to convert your factor columns to character:

  colsToConvert <- setdiff(names(dat), "Code") # or manually type them
  dat[, c(colsToConvert) := lapply(.SD, as.character), .SDcols = colsToConvert]

