
Date variable incorrectly converted into a numeric variable following the conversion of the data frame from a long format to a wide format

I am working with a data frame that holds participants' data across multiple timepoints, and I am trying to convert it from long format to wide format. The data frame contains variables of different types, such as dates and numerics.

library(data.table)

SN <- c("AAA", "BBB", "BBB", "CCC", "DDD", "EEE", "DDD")
Timepoint <- c(1, 1, 2, 1, 1, 1, 2)
date <- c("31-Mar-17", "08-Mar-17", "31-Mar-18", "28-Mar-18", "17-Mar-17", "26-Feb-18", "07-Apr-18")
score <- c(13, 16, 17, 9, 14, 15, 15)
age <- c(12, 15, 16, 9, 14, 14, 15)
df <- data.frame(SN, Timepoint, date, score, age)
df$date <- as.Date(df$date, format = "%d-%B-%y")

I used the following code to convert the data from a long format to a wide format:

df2 <- dcast(melt(df, id.vars = c("SN", "Timepoint")),
         SN ~ Timepoint + variable, value.var = "value")

Because melt stacks all the measured variables into a single value column, R coerces them to one common type (numeric), so the date variable has been incorrectly converted to a numeric variable.

The following is the incorrect output I have obtained:

[Image: incorrect output]

The correct output which I am trying to achieve is as follows:

[Image: correct output]

Thanks! Help much appreciated!

The 'value' column produced by melt can hold only one class, so when 'date' is melted together with the numeric columns everything is coerced to a single class, here numeric. One fix is to keep 'date' in id.vars so that it stays a separate column, and then pass both columns to value.var, since data.table::dcast accepts more than one value variable:

dcast(melt(setDT(df), id.vars = c("SN", "Timepoint", "date")),
      SN ~ Timepoint + variable, value.var = c("date", "value"))
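The numbers in the incorrect output are not arbitrary: R stores a Date as a count of days since 1970-01-01, and that count is what remains once the class is dropped. A minimal base-R sketch of this, with variable names chosen only for illustration:

```r
d <- as.Date("2017-03-31")

# Dropping the Date class leaves the underlying day count
n <- as.numeric(d)

# The day count can be turned back into a Date by supplying the origin
back <- as.Date(n, origin = "1970-01-01")

print(n)
print(back)
print(back == d)
```

If a reshaped column has already become numeric, this is one way to recover it afterwards, although keeping the date out of the melted value column avoids the problem in the first place.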

Based on the expected output, dcast alone may be enough, with no melt step:

dcast(setDT(df)[], SN ~ Timepoint, value.var = c('date', 'score', 'age'))
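To confirm that the date columns keep their class with this approach, the column classes of the result can be inspected. The sketch below rebuilds the question's data in a self-contained way; the dates are written in ISO form so that parsing does not depend on the locale, and the name `wide` is only illustrative.

```r
library(data.table)

# Same values as in the question, with the dates given in ISO format
df <- data.frame(
  SN        = c("AAA", "BBB", "BBB", "CCC", "DDD", "EEE", "DDD"),
  Timepoint = c(1, 1, 2, 1, 1, 1, 2),
  date      = as.Date(c("2017-03-31", "2017-03-08", "2018-03-31", "2018-03-28",
                        "2017-03-17", "2018-02-26", "2018-04-07")),
  score     = c(13, 16, 17, 9, 14, 15, 15),
  age       = c(12, 15, 16, 9, 14, 14, 15)
)

# One row per SN; each value variable is spread across the timepoints
wide <- dcast(setDT(df), SN ~ Timepoint, value.var = c("date", "score", "age"))

# Show the class of every column in the wide table
print(sapply(wide, class))
```

Participants with no second timepoint should simply get NA in the corresponding columns.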

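Another way to do this is to convert the date variables to Date after the data is restructured, instead of before. The dplyr package makes it easy to change all columns selected by name, such as the columns that end with "date". Note that melting the character dates together with the numeric columns makes every reshaped column character, so the score and age columns need to be converted back to numeric as well.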

library(data.table)
library(dplyr)

SN <- c("AAA", "BBB", "BBB", "CCC", "DDD", "EEE", "DDD")
Timepoint <- c(1, 1, 2, 1, 1, 1, 2)
date <- c("31-Mar-17", "08-Mar-17", "31-Mar-18", "28-Mar-18", "17-Mar-17", "26-Feb-18", "07-Apr-18")
score <- c(13, 16, 17, 9, 14, 15, 15)
age <- c(12, 15, 16, 9, 14, 14, 15)
df <- data.frame(SN, Timepoint, date, score, age)
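# setDT() makes sure data.table's own melt()/dcast() methods are used.
# melt() warns that the measure variables differ in type and stores
# everything as character, so each wide column is converted back below.
df2 <- dcast(melt(setDT(df), id.vars = c("SN", "Timepoint")),
             SN ~ Timepoint + variable, value.var = "value") %>% 
  mutate_at(vars(ends_with("date")), as.Date, format = "%d-%B-%y") %>% 
  mutate_at(vars(ends_with("score"), ends_with("age")), as.numeric)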
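In recent dplyr versions (1.0.0 and later) `mutate_at()` is superseded by `across()`. A minimal sketch of the same convert-after-reshaping idea written that way, on a reduced version of the question's data; it assumes an English locale for parsing the month abbreviations, and the names `dt`, `long` and `wide` are only illustrative.

```r
library(data.table)
library(dplyr)

# Reduced example: dates are kept as character strings before reshaping
dt <- data.table(
  SN        = c("AAA", "BBB", "BBB"),
  Timepoint = c(1, 1, 2),
  date      = c("31-Mar-17", "08-Mar-17", "31-Mar-18"),
  score     = c(13, 16, 17)
)

# melt() warns here because character and numeric columns are stacked;
# the resulting 'value' column is character
long <- melt(dt, id.vars = c("SN", "Timepoint"))
wide <- dcast(long, SN ~ Timepoint + variable, value.var = "value")

# Convert each group of columns back to its intended type
wide <- as.data.frame(wide) %>%
  mutate(across(ends_with("date"), ~ as.Date(.x, format = "%d-%b-%y")),
         across(ends_with("score"), as.numeric))

str(wide)
```

Selecting by suffix works because dcast names the columns after the timepoint and the variable, for example `1_date` and `2_score`.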

