I have a dataframe (I'll call it 'df') with a fair number of variables (numeric and character). One of the columns holds the amount of water consumed by a specific person at a given time of day; some of the other columns are unrelated. Each row represents an observation.
Assuming this is my dataframe (I've oversimplified it and invented the problem in terms of water consumption - stay hydrated - in order to make my question more clear):
df <- structure(list(Name = structure(c(1L, 1L, 1L, 3L, 3L, 2L, 3L,
2L, 2L), .Label = c("Ana", "David", "Roger"), class = "factor"),
Time = structure(c(3L, 1L, 2L, 2L, 3L, 2L, 1L, 1L, 3L), .Label = c("afternoon",
"evening", "morning"), class = "factor"), Water_consumed = c(1,
0.75, 0.5, 0.7, 0.7, 0.2, 1.2, 1, 0.6)), class = "data.frame", row.names = c(NA,
-9L))
###  Name  Time      Water_consumed
###  Ana   morning   1.00
###  Ana   afternoon 0.75
###  Ana   evening   0.50
###  Roger evening   0.70
###  Roger morning   0.70
###  David evening   0.20
###  Roger afternoon 1.20
###  David afternoon 1.00
###  David morning   0.60
I want to create n new columns (n being the number of unique values in the 'Time' column), with their names based on the values of the 'Time' column and their contents taken from the 'Water_consumed' column. Once that is done, I'd like the redundant rows and columns to be dropped.
So I expect something like this as output, a dataframe that has been collapsed by 'Name', and where the old columns 'Time' and 'Water_consumed' have been deleted (as they are now redundant, since three new columns have been created that hold the same information).
###  Name  Consumed_morning Consumed_afternoon Consumed_evening
###  Ana   1.00             0.75               0.50
###  Roger 0.70             1.20               0.70
###  David 0.60             1.00               0.20
Thanks in advance. Really appreciate any help.
Using data.table:
library(data.table)
setDT(df)  # convert df to a data.table by reference
dcast(df, Name ~ paste0("Consumed_", Time), value.var = "Water_consumed")
    Name Consumed_afternoon Consumed_evening Consumed_morning
1:   Ana               0.75              0.5              1.0
2: David               1.00              0.2              0.6
3: Roger               1.20              0.7              0.7
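The result above lists the new columns alphabetically, while the question asks for morning, afternoon, evening. One possible way to get that order is to reorder the columns afterwards with setcolorder. This is only a sketch: the object name wide and the hard-coded column order are assumptions made for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  Name = c("Ana", "Ana", "Ana", "Roger", "Roger", "David", "Roger", "David", "David"),
  Time = c("morning", "afternoon", "evening", "evening", "morning",
           "evening", "afternoon", "afternoon", "morning"),
  Water_consumed = c(1, 0.75, 0.5, 0.7, 0.7, 0.2, 1.2, 1, 0.6)
)
setDT(df)

# Long -> wide, one column per value of Time
wide <- dcast(df, Name ~ paste0("Consumed_", Time), value.var = "Water_consumed")

# Reorder the columns by reference (the order is hard-coded here as an assumption)
setcolorder(wide, c("Name", "Consumed_morning", "Consumed_afternoon", "Consumed_evening"))
print(wide)
```

If the set of Time values is not known in advance, the vector passed to setcolorder would have to be built from the data rather than typed out.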
You want to use the spread function (from the tidyr package) to do this.
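library(tidyr)
df <- spread(df, Time, Water_consumed)  # one column per value of Time
columns <- colnames(df)
n <- length(columns)
columns[2:n] <- paste("Consumed_", columns[2:n], sep = "")  # prefix every column except Name
colnames(df) <- columns  # write the new names back to the data frame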
Check the cheatsheets, especially the tidyverse ones; they will save you a ton of time digging through Stack Overflow. I think this approach is easier to understand than data.table.
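As a side note, spread has been superseded in newer tidyr releases (1.0.0 and later) by pivot_wider, which can add the prefix in the same call. A minimal sketch, assuming a recent tidyr is installed; the object name wide is made up for illustration, and the data frame is rebuilt so the snippet runs on its own:

```r
library(tidyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  Name = c("Ana", "Ana", "Ana", "Roger", "Roger", "David", "Roger", "David", "David"),
  Time = c("morning", "afternoon", "evening", "evening", "morning",
           "evening", "afternoon", "afternoon", "morning"),
  Water_consumed = c(1, 0.75, 0.5, 0.7, 0.7, 0.2, 1.2, 1, 0.6)
)

# Long -> wide: column names come from Time, values from Water_consumed,
# and names_prefix adds "Consumed_" to each new column name
wide <- pivot_wider(
  df,
  names_from = Time,
  values_from = Water_consumed,
  names_prefix = "Consumed_"
)
print(wide)
```

This avoids the separate renaming step, since the Time and Water_consumed columns are consumed by the reshape and the prefix is applied as the columns are created.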