I have data that looks like this:
ID | Name | Role | Status | Date |
---|---|---|---|---|
1 | John | GM | Current | 12.04.2021 |
1 | Ann | GM | Previous | 10.07.2020 |
1 | Mary | GM | Previous | 24.01.2017 |
2 | Ann | GM | Current | 12.04.2021 |
2 | Josef | GM | Previous | 02.07.2015 |
3 | Sophie | GM | Current | 12.04.2021 |
4 | Ben | GM | Current | 12.04.2021 |
4 | Lucas | GM | Previous | 30.07.2018 |
4 | Peter | GM | Previous | 18.04.2017 |
4 | Susan | GM | Previous | 16.09.2015 |
The ID is unique for each "business". I want one row per ID, with each date in its own column: the first date for an ID goes in "Date1", the second in "Date2", and so on. Note that the number of rows per ID varies.
I will use this analysis to look at changes in General Managers (GM) for each business, so I am only interested in ID and Date.
My final dataset should look like the table below:
ID | Date1 | Date2 | Date3 | Date4 |
---|---|---|---|---|
1 | 12.04.2021 | 10.07.2020 | 24.01.2017 | NA |
2 | 12.04.2021 | 02.07.2015 | NA | NA |
3 | 12.04.2021 | NA | NA | NA |
4 | 12.04.2021 | 30.07.2018 | 18.04.2017 | 16.09.2015 |
I have searched for previous questions about reshape in RStudio, but have not found one similar to mine. Can someone help me? Many thanks in advance!
Here is a data.table approach:
library(data.table)
DT <- fread("ID Name Role Status Date
1 John GM Current 12.04.2021
1 Ann GM Previous 10.07.2020
1 Mary GM Previous 24.01.2017
2 Ann GM Current 12.04.2021
2 Josef GM Previous 02.07.2015
3 Sophie GM Current 12.04.2021
4 Ben GM Current 12.04.2021
4 Lucas GM Previous 30.07.2018
4 Peter GM Previous 18.04.2017
4 Susan GM Previous 16.09.2015")
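# collapse all dates of each ID into a single "#"-separated string
ans <- DT[, .(dates = paste0(Date, collapse = "#")), by = ID]
# split that string into one column per date (IDs with fewer dates are padded with NA),
# then drop the helper column
ans[, paste0("Date", seq_along(tstrsplit(ans$dates, "#"))) :=
      tstrsplit(dates, "#")][, dates := NULL][]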
#    ID      Date1      Date2      Date3      Date4
# 1:  1 12.04.2021 10.07.2020 24.01.2017       <NA>
# 2:  2 12.04.2021 02.07.2015       <NA>       <NA>
# 3:  3 12.04.2021       <NA>       <NA>       <NA>
# 4:  4 12.04.2021 30.07.2018 18.04.2017 16.09.2015
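The same result can probably be reached without the paste-and-split round trip. The sketch below is an alternative, not part of the answer above: it assumes `dcast()` and `rowid()` from data.table, numbers the rows within each ID, and casts to wide format in one step. Only the `ID` and `Date` columns are rebuilt here, since those are the ones the question cares about.

```r
library(data.table)

# Only ID and Date are needed for the reshape
DT <- data.table(
  ID   = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4),
  Date = c("12.04.2021", "10.07.2020", "24.01.2017",
           "12.04.2021", "02.07.2015",
           "12.04.2021",
           "12.04.2021", "30.07.2018", "18.04.2017", "16.09.2015")
)

# rowid(ID, prefix = "Date") labels the rows within each ID as Date1, Date2, ...
# dcast() then spreads those labels into columns; missing cells become NA
wide <- dcast(DT, ID ~ rowid(ID, prefix = "Date"), value.var = "Date")
wide
```

Because the column labels are character strings, an ID with ten or more dates may end up with columns ordered as text (Date10 before Date2), so it is worth checking the column order on larger data.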
Here's a tidyverse solution:
library(tidyverse)
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  Name = c("John", "Ann", "Mary", "Ann", "Joseph"),
  Role = rep("GM", 5),
  Status = c("Current", "Previous", "Previous", "Current", "Previous"),
  Date = c("12.04.2021", "10.07.2020", "24.01.2017", "12.04.2021", "02.07.2015")
)
df
ID Name Role Status Date
1 1 John GM Current 12.04.2021
2 1 Ann GM Previous 10.07.2020
3 1 Mary GM Previous 24.01.2017
4 2 Ann GM Current 12.04.2021
5 2 Joseph GM Previous 02.07.2015
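dfnew <- df %>%
  dplyr::group_by(ID) %>%
  # number the rows within each ID: 1, 2, 3, ...
  dplyr::mutate(rownum = row_number()) %>%
  # keep only the columns needed for the reshape
  dplyr::select(ID, rownum, Date) %>%
  # one column per row number, named Date1, Date2, ...
  tidyr::pivot_wider(names_from = rownum, values_from = Date, names_glue = "Date{rownum}")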
dfnew
# A tibble: 2 x 4
# Groups: ID [2]
ID Date1 Date2 Date3
<dbl> <chr> <chr> <chr>
1 1 12.04.2021 10.07.2020 24.01.2017
2 2 12.04.2021 02.07.2015 NA
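Since the question mentions `reshape`, here is a rough base R sketch of the same idea with no extra packages. It is an alternative to the answers above, not taken from them, and the helper column `n` is a name made up for this example. It numbers the rows within each ID with `ave()` and then calls `stats::reshape()` with `direction = "wide"`.

```r
# Only ID and Date are needed for the reshape
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4),
  Date = c("12.04.2021", "10.07.2020", "24.01.2017",
           "12.04.2021", "02.07.2015",
           "12.04.2021",
           "12.04.2021", "30.07.2018", "18.04.2017", "16.09.2015"),
  stringsAsFactors = FALSE
)

# Number the rows within each ID (1, 2, 3, ...) in their existing order;
# "n" is a hypothetical helper column used only for the reshape
df$n <- ave(seq_along(df$ID), df$ID, FUN = seq_along)

# Spread Date into Date1, Date2, ...; sep = "" avoids the default "Date.1" naming
wide <- reshape(df, idvar = "ID", timevar = "n", direction = "wide", sep = "")
rownames(wide) <- NULL
wide
```

IDs with fewer dates get `NA` in the remaining columns, as in the desired output.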