
Reshape dataframe in R, different dates

I have data that looks like this:

ID  Name    Role  Status    Date
1   John    GM    Current   12.04.2021
1   Ann     GM    Previous  10.07.2020
1   Mary    GM    Previous  24.01.2017
2   Ann     GM    Current   12.04.2021
2   Josef   GM    Previous  02.07.2015
3   Sophie  GM    Current   12.04.2021
4   Ben     GM    Current   12.04.2021
4   Lucas   GM    Previous  30.07.2018
4   Peter   GM    Previous  18.04.2017
4   Susan   GM    Previous  16.09.2015

Each ID identifies one "business". I want one row per ID, with each date in its own column: the first date for an ID goes in "Date1", the second in "Date2", and so on. Note that the number of rows per ID varies.

I will use the result to analyse changes of General Manager (GM) for each business, so I am only interested in the ID and Date columns.

The final dataset should look like the table below:

ID  Date1       Date2       Date3       Date4
1   12.04.2021  10.07.2020  24.01.2017  NA
2   12.04.2021  02.07.2015  NA          NA
3   12.04.2021  NA          NA          NA
4   12.04.2021  30.07.2018  18.04.2017  16.09.2015

I have searched for earlier questions about reshaping data in R, but have not found one similar to mine. Can someone help? Many thanks in advance!

Here is a data.table approach:

library(data.table)

DT <- fread("ID     Name    Role    Status  Date
1   John    GM  Current     12.04.2021
1   Ann     GM  Previous    10.07.2020
1   Mary    GM  Previous    24.01.2017
2   Ann     GM  Current     12.04.2021
2   Josef   GM  Previous    02.07.2015
3   Sophie  GM  Current     12.04.2021
4   Ben     GM  Current     12.04.2021
4   Lucas   GM  Previous    30.07.2018
4   Peter   GM  Previous    18.04.2017
4   Susan   GM  Previous    16.09.2015")

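# collapse all dates of each ID into one "#"-separated string
ans <- DT[, .(dates = paste0(Date, collapse = "#")), by = ID]
# split that string back into one column per date (Date1, Date2, ...);
# IDs with fewer dates are padded with NA
ans[, paste0("Date", 1:length(tstrsplit(ans$dates, "#"))) :=
      tstrsplit( dates, "#")][, dates := NULL][]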

#    ID      Date1      Date2      Date3      Date4
# 1:  1 12.04.2021 10.07.2020 24.01.2017       <NA>
# 2:  2 12.04.2021 02.07.2015       <NA>       <NA>
# 3:  3 12.04.2021       <NA>       <NA>       <NA>
# 4:  4 12.04.2021 30.07.2018 18.04.2017 16.09.2015
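
The paste-and-split step above can probably be avoided. As a minimal sketch, assuming the rows are already in the order the dates should appear, `dcast()` combined with `rowid()` numbers the rows within each ID and spreads them into columns in one call. The object name `wide` and the trimmed two-column table are only for illustration:

```r
library(data.table)

# Same IDs and dates as in the question; Name, Role and Status are left out
# because only ID and Date are needed.
DT <- data.table(
  ID   = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4),
  Date = c("12.04.2021", "10.07.2020", "24.01.2017",
           "12.04.2021", "02.07.2015",
           "12.04.2021",
           "12.04.2021", "30.07.2018", "18.04.2017", "16.09.2015")
)

# rowid(ID, prefix = "Date") labels the rows within each ID as
# "Date1", "Date2", ...; dcast() turns those labels into column names.
wide <- dcast(DT, ID ~ rowid(ID, prefix = "Date"), value.var = "Date")
wide
```

IDs with fewer dates get NA in the remaining columns. One caveat: the new columns are ordered by their label, so with ten or more dates per ID "Date10" would sort before "Date2" and the columns may need reordering afterwards.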

Here's a tidyverse solution:

library(tidyverse)

df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  Name = c("John", "Ann", "Mary", "Ann", "Joseph"),
  Role = rep("GM", 5),
  Status = c("Current", "Previous", "Previous", "Current", "Previous"),
  Date = c("12.04.2021", "10.07.2020", "24.01.2017", "12.04.2021", "02.07.2015")
)

df

  ID   Name Role   Status       Date
1  1   John   GM  Current 12.04.2021
2  1    Ann   GM Previous 10.07.2020
3  1   Mary   GM Previous 24.01.2017
4  2    Ann   GM  Current 12.04.2021
5  2 Joseph   GM Previous 02.07.2015


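dfnew <- df %>% 
  dplyr::group_by(ID) %>% 
  # number the rows within each ID: 1, 2, 3, ...
  dplyr::mutate(rownum = row_number()) %>% 
  # keep only the columns needed for the reshape
  dplyr::select(ID, rownum, Date) %>% 
  # one column per row number, named Date1, Date2, ...
  tidyr::pivot_wider(names_from = rownum, values_from = Date, names_glue = "Date{rownum}")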

dfnew

# A tibble: 2 x 4
# Groups:   ID [2]
     ID Date1      Date2      Date3     
  <dbl> <chr>      <chr>      <chr>     
1     1 12.04.2021 10.07.2020 24.01.2017
2     2 12.04.2021 02.07.2015 NA   
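
Since the question mentions reshape, here is a rough base R sketch of the same idea that needs no packages. It also parses the dates with `as.Date()`, on the assumption that they are always in day.month.year form, so that they can later be compared or subtracted. The helper column `n` and the object name `wide` are made up for this example:

```r
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2),
  Date = c("12.04.2021", "10.07.2020", "24.01.2017",
           "12.04.2021", "02.07.2015")
)

# Parse the text into Date objects (assumes the dd.mm.yyyy format).
df$Date <- as.Date(df$Date, format = "%d.%m.%Y")

# Number the rows within each ID: 1, 2, 3, ...
df$n <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# Spread Date into Date1, Date2, ... with one row per ID.
wide <- reshape(df, idvar = "ID", timevar = "n",
                direction = "wide", sep = "")
wide
```

With real Date columns, something like `wide$Date1 - wide$Date2` should give the time between the current and the previous GM for each business.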
