
R: Having trouble with reshape() function in stats package

When there are multiple variables in a data.frame that need to be melted, I'm confused about how to make that work. Here's an example:

Data <- data.frame(SampleID = rep(1:10, each = 3), 
               TimePoint = rep(LETTERS[1:3], 10))
Data$File.ESIpos <- paste("20141031 Subject", Data$SampleID, "Point",
                     Data$TimePoint, "ESIpos")

Data$Date.ESIpos <- "20141031"

Data$File.ESIneg <- paste("20141030 Subject", Data$SampleID, "Point", 
                     Data$TimePoint, "ESIneg")
Data$Date.ESIneg <- "20141030"

Data$File.APCIpos <- paste("20141029 Subject", Data$SampleID, "Point", 
                     Data$TimePoint, "APCIpos")
Data$Date.APCIpos <- "20141029"

I would like that to be melted by both Date and File so that the new data.frame has the columns "SampleID", "TimePoint", a new column "Mode" (whose values are ESIpos, ESIneg, and APCIpos), "File", and "Date". Here's the closest I've gotten with the reshape() function:

Data.long <- reshape(Data, 
                     varying = c("File.ESIpos", "Date.ESIpos",
                                 "File.ESIneg", "Date.ESIneg", 
                                 "File.APCIpos", "Date.APCIpos"),
                     idvar = c("SampleID", "TimePoint"),
                     ids = c("ESIpos", "ESIneg", "APCIpos"),
                     v.names = c("Date", "File"),
                     sep = ".",
                     direction = "long")

The output is a data.frame with the columns "SampleID", "TimePoint", "time" (which is "1", "2", or "3" for "ESIpos", "ESIneg", or "APCIpos"), "Date" and "File".

The first problem is that I don't see how to define a new "Mode" column. I can rename the "time" column to "Mode", of course, but isn't there some way to tell reshape() that the levels should be "ESIpos", "ESIneg", and "APCIpos" rather than 1, 2, 3? I thought I was doing that with ids = c("ESIpos", ...), but clearly I'm not. In fact, I get the same output whether or not I include the ids = c("ESIpos", ...) line.

A second, smaller issue is that regardless of whether I say v.names = c("Date", "File") or v.names = c("File", "Date"), the columns are always swapped, i.e. I get file names in the Date column and vice versa.

I think this is the reshape() command you're after

reshaped <- reshape(Data, direction = "long", varying = 3:8, 
                 times = c("ESIpos", "ESIneg", "APCIpos"))
head(reshaped)
#          SampleID TimePoint   time                              File     Date id
# 1.ESIpos        1         A ESIpos 20141031 Subject 1 Point A ESIpos 20141031  1
# 2.ESIpos        1         B ESIpos 20141031 Subject 1 Point B ESIpos 20141031  2
# 3.ESIpos        1         C ESIpos 20141031 Subject 1 Point C ESIpos 20141031  3
# 4.ESIpos        2         A ESIpos 20141031 Subject 2 Point A ESIpos 20141031  4
# 5.ESIpos        2         B ESIpos 20141031 Subject 2 Point B ESIpos 20141031  5
# 6.ESIpos        2         C ESIpos 20141031 Subject 2 Point C ESIpos 20141031  6
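
On the two specific questions: as far as I can tell from ?reshape, ids only supplies values for a newly created id column when idvar is not already in the data, so it never affects the time column. The labels come from times, and timevar sets that column's name. The swapped columns seem to come from how an atomic varying vector gets split up alongside v.names; passing varying as a list, with one element per output column in the same order as v.names, sidesteps that. A minimal sketch under those assumptions, rebuilding the question's Data so it runs on its own:

```r
# Rebuild the example data from the question
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                              "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- rep(dates[i], nrow(Data))
}

Data.long <- reshape(Data,
                     direction = "long",
                     # one list element per output column, in v.names order
                     varying = list(c("File.ESIpos", "File.ESIneg", "File.APCIpos"),
                                    c("Date.ESIpos", "Date.ESIneg", "Date.APCIpos")),
                     v.names = c("File", "Date"),
                     timevar = "Mode",   # name of the new column
                     times = modes,      # its values, in the same order as varying
                     idvar = c("SampleID", "TimePoint"))
rownames(Data.long) <- NULL  # drop the generated "1.A.ESIpos"-style row names
head(Data.long)
```

Because both id columns already exist in Data, no extra id column is added here, unlike in the output above.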

I always give up on reshape due to migraines, but I am always amazed when someone uses it and it works, so I'd like to see a solution using it. So that said, you could use reshape2::melt twice and combine the results:

library(reshape2)
vars <- c('SampleID','TimePoint','Mode')
m1 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('File', names(Data))]),
           variable.name = 'Mode', value.name = 'Date')[c(vars, 'Date')]
m2 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('Date', names(Data))]),
           variable.name = 'Mode', value.name = 'File')[c(vars, 'File')]

# Strip the literal "Date." / "File." prefix (anchored, with the dot escaped)
m1$Mode <- sub('^Date\\.', '', m1$Mode)
m2$Mode <- sub('^File\\.', '', m2$Mode)

identical(m1[1:3], m2[1:3])
# [1] TRUE

Data.long <- cbind(m1, m2['File'])

head(Data.long[with(Data.long, order(SampleID, TimePoint)), ])

#    SampleID TimePoint    Mode     Date                               File
# 1         1         A  ESIpos 20141031  20141031 Subject 1 Point A ESIpos
# 31        1         A  ESIneg 20141030  20141030 Subject 1 Point A ESIneg
# 61        1         A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
# 2         1         B  ESIpos 20141031  20141031 Subject 1 Point B ESIpos
# 32        1         B  ESIneg 20141030  20141030 Subject 1 Point B ESIneg
# 62        1         B APCIpos 20141029 20141029 Subject 1 Point B APCIpos

Or do something similar with stats::reshape

Here's how I'd tackle the problem with tidyr:

library(tidyr)

Data %>%
  # Gather all columns except SampleID and TimePoint 
  # (since they're already variables)
  gather(key, value, -SampleID, -TimePoint) %>% 
  # Separate the key into components type and mode
  separate(key, c("type", "mode"), "\\.") %>%
  # Spread the type back into the columns
  spread(type, value)
#>    SampleID TimePoint    mode     Date                                File
#> 1         1         A APCIpos 20141029  20141029 Subject 1 Point A APCIpos
#> 2         1         A  ESIneg 20141030   20141030 Subject 1 Point A ESIneg
#> 3         1         A  ESIpos 20141031   20141031 Subject 1 Point A ESIpos
#> 4         1         B APCIpos 20141029  20141029 Subject 1 Point B APCIpos
#> 5         1         B  ESIneg 20141030   20141030 Subject 1 Point B ESIneg
#> 6         1         B  ESIpos 20141031   20141031 Subject 1 Point B ESIpos
#> 7         1         C APCIpos 20141029  20141029 Subject 1 Point C APCIpos
#...

To figure out how to come up with these steps yourself, I'd recommend reading Tidy Data, which lays out a framework that should help you understand the problem better.
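
gather() and spread() still work, but tidyr's documentation now marks them as superseded. Assuming tidyr 1.0.0 or later, the same three steps can probably be collapsed into a single pivot_longer() call, where the special ".value" entry in names_to says that the part of each column name before the dot becomes an output column. A sketch, rebuilding the question's Data so it runs on its own:

```r
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

# Rebuild the example data from the question
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                              "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- rep(dates[i], nrow(Data))
}

Data.long <- pivot_longer(
  Data,
  cols = -c(SampleID, TimePoint),    # everything except the id columns
  names_to = c(".value", "Mode"),    # "File"/"Date" become columns, the rest goes to Mode
  names_sep = "\\."                  # split column names at the literal dot
)
head(Data.long)
```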

melt.data.table in v1.9.5 can now melt into multiple columns. With that, we can do:

require(data.table) ## v1.9.5
ans = melt(setDT(Data), id=c("SampleID", "TimePoint"), 
      measure=list(c(3,5,7), c(4,6,8)), value.name=c("File", "Date"))
setattr(ans$variable, 'levels', 
        unique(gsub(".*[.]", "", names(Data)[-(1:2)])))
#   SampleID TimePoint variable                                File     Date
# 1:        1         A   ESIpos   20141031 Subject 1 Point A ESIpos 20141031
# 2:        1         B   ESIpos   20141031 Subject 1 Point B ESIpos 20141031
# 3:        1         C   ESIpos   20141031 Subject 1 Point C ESIpos 20141031
# 4:        2         A   ESIpos   20141031 Subject 2 Point A ESIpos 20141031
# 5:        2         B   ESIpos   20141031 Subject 2 Point B ESIpos 20141031
# 6:        2         C   ESIpos   20141031 Subject 2 Point C ESIpos 20141031
# ...

You can get the development version from here.
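
In released versions of data.table (1.9.6 and later, as far as I know), the measure columns can also be selected by name with patterns() rather than by position, which is a bit less fragile if the column order changes. A sketch along those lines, rebuilding the question's Data so it runs on its own; mode_labels is just an illustrative helper name:

```r
library(data.table)  # assumes a release that provides patterns() in melt()

# Rebuild the example data from the question
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                              "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- rep(dates[i], nrow(Data))
}

DT <- as.data.table(Data)  # work on a copy instead of converting Data in place
ans <- melt(DT,
            id.vars = c("SampleID", "TimePoint"),
            measure.vars = patterns("^File\\.", "^Date\\."),
            variable.name = "Mode",
            value.name = c("File", "Date"))

# Mode holds the group index (1, 2, 3); map it back to the suffixes,
# taken in the order the File.* columns appear in the data
mode_labels <- sub("^File\\.", "", grep("^File\\.", names(DT), value = TRUE))
ans[, Mode := mode_labels[as.integer(Mode)]]
head(ans)
```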

