Collapse a dataframe in R, creating new columns named after the unique values of one column, with values taken from another column?

I have a dataframe (I'll call it 'df') with a fair number of variables (numeric and character). One of the columns holds the amount of water consumed by a specific person at a given time of day; some of the other columns are unrelated. Each row represents an observation.

Assume this is my dataframe (I've oversimplified it and invented the water consumption framing - stay hydrated - to make my question clearer):

df <- structure(list(Name = structure(c(1L, 1L, 1L, 3L, 3L, 2L, 3L, 
2L, 2L), .Label = c("Ana", "David", "Roger"), class = "factor"), 
    Time = structure(c(3L, 1L, 2L, 2L, 3L, 2L, 1L, 1L, 3L), .Label = c("afternoon", 
    "evening", "morning"), class = "factor"), Water_consumed = c(1, 
    0.75, 0.5, 0.7, 0.7, 0.2, 1.2, 1, 0.6)), class = "data.frame", row.names = c(NA, 
-9L))
### Name   Time      Water_consumed
### Ana    morning   1.00
### Ana    afternoon 0.75
### Ana    evening   0.50
### Roger  evening   0.70
### Roger  morning   0.70
### David  evening   0.20
### Roger  afternoon 1.20
### David  afternoon 1.00
### David  morning   0.60

I want to create n new columns (n being the number of unique values in the 'Time' column), named after the values of 'Time' and filled with the corresponding values of 'Water_consumed'. Once that is done, I'd like the redundant rows and columns to be dropped.

So I expect output like the following: a dataframe collapsed by 'Name', in which the old columns 'Time' and 'Water_consumed' have been removed (they are redundant now that three new columns hold the same information).

### Name     Consumed_morning Consumed_afternoon Consumed_evening
### Ana      1.00             0.75               0.50
### Roger    0.70             1.20               0.70
### David    0.60             1.00               0.20

Thanks in advance. Really appreciate any help.

Using data.table:

library(data.table)

setDT(df)  # converts df to a data.table by reference
dcast(df, Name ~ paste0("Consumed_", Time), value.var = "Water_consumed")

    Name Consumed_afternoon Consumed_evening Consumed_morning
1:   Ana               0.75              0.5              1.0
2: David               1.00              0.2              0.6
3: Roger               1.20              0.7              0.7
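
The columns above come out in alphabetical order, whereas the question lists them as morning, afternoon, evening. If that order matters, one possible way to get it is to reorder afterwards with setcolorder(). This is only a sketch: the name `wide` is made up for illustration, and the data is rebuilt with plain character columns so the snippet runs on its own.

```r
library(data.table)

# Same toy data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Name = c("Ana", "Ana", "Ana", "Roger", "Roger", "David", "Roger", "David", "David"),
  Time = c("morning", "afternoon", "evening", "evening", "morning",
           "evening", "afternoon", "afternoon", "morning"),
  Water_consumed = c(1, 0.75, 0.5, 0.7, 0.7, 0.2, 1.2, 1, 0.6)
)

setDT(df)  # converts df to a data.table by reference

# `wide` is a hypothetical name for the reshaped result
wide <- dcast(df, Name ~ paste0("Consumed_", Time), value.var = "Water_consumed")

# Put the new columns in the order asked for in the question
setcolorder(wide, c("Name", "Consumed_morning", "Consumed_afternoon", "Consumed_evening"))
wide
```

setcolorder() changes the column order in place, so there is no need to reassign the result.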

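You can use the spread function from tidyr to do this.

library(tidyr)

# One column per value of Time, filled with Water_consumed
df <- spread(df, Time, Water_consumed)

# Prefix every column except Name with "Consumed_"
columns <- colnames(df)
n <- length(columns)
columns[2:n] <- paste0("Consumed_", columns[2:n])

# Write the new names back to the dataframe
colnames(df) <- columns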

Check the cheat sheets, especially the tidyverse-related ones; they will save you a ton of time digging through Stack Overflow. I think this is easier to understand than data.table.
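
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which can add the prefix in the same call through its names_prefix argument. A minimal sketch, assuming a recent tidyr is installed; the name `wide` is made up for illustration, and the data is rebuilt here so the snippet runs on its own:

```r
library(tidyr)

# Same toy data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Name = c("Ana", "Ana", "Ana", "Roger", "Roger", "David", "Roger", "David", "David"),
  Time = c("morning", "afternoon", "evening", "evening", "morning",
           "evening", "afternoon", "afternoon", "morning"),
  Water_consumed = c(1, 0.75, 0.5, 0.7, 0.7, 0.2, 1.2, 1, 0.6)
)

# One row per Name, one "Consumed_<Time>" column per value of Time
# (`wide` is a hypothetical name for the reshaped result)
wide <- pivot_wider(
  df,
  names_from   = Time,
  values_from  = Water_consumed,
  names_prefix = "Consumed_"
)
wide
```

The result is a tibble; wrap it in as.data.frame() if a plain data.frame is needed.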

