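I have a database export of user IDs and the dates each user logged in. All of a user's dates are stored in one comma-separated string. Here is a sample (dput output), followed by a simplified illustration: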
structure(list(User.Id = c(2542573L, 2571394L, 2770912L, 2683246L,
                           2832110L, 2773277L),
               Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                               "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                               "2020-09-04", "2020-06-23")),
          row.names = c(NA, 6L), class = "data.frame")
| id | logged_in                                              |
|----|--------------------------------------------------------|
| a  | 2019-11-21,2019-11-22,2019-11-23,2019-11-24,2019-11-25 |
| b  |                                                        |
| c  | 2019-11-21,2019-11-22,                                 |
What I am trying to do is split the date column on "," so that each date is in its own column.
I want the result to look like the table below, with columns logged_in.a, logged_in.b, and so on up to logged_in.zz or beyond, as many as the longest date string in the database needs. This could be 1000 columns or more.
| id | logged_in.a | logged_in.b | ... |
|----|-------------|-------------|-----|
| a  | 2019-11-21  | 2019-11-22  | ... |
| b  |             |             |     |
| c  | 2019-11-21  | 2019-11-22  |     |
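The lettered suffixes described above (a, b, ..., zz) can be generated for any number of columns with spreadsheet-style naming. Note that a through zz only covers 26 + 26 * 26 = 702 names, so 1000+ columns would continue with aaa, aab, and so on. A minimal base-R sketch; `col_letters()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: spreadsheet-style names a, b, ..., z, aa, ab, ..., zz, aaa, ...
col_letters <- function(n) {
  vapply(seq_len(n), function(i) {
    s <- character(0)
    while (i > 0) {
      r <- (i - 1) %% 26          # 0-based position of the current letter
      s <- c(letters[r + 1], s)   # prepend, since we build from the right
      i <- (i - 1) %/% 26
    }
    paste(s, collapse = "")
  }, character(1))
}

# Column names for, e.g., 4 dates
paste0("logged_in.", col_letters(4))
```

Numeric suffixes (paste0("logged_in.", seq_len(n))) avoid this helper entirely and sort more predictably, which is why the answers below use them.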
I then plan to gather the dataset into a tall (long) format. The code I used is below, but it requires me to define the column names up front, and I don't know how many there will be.
library(tidyr)     # provides %>%
library(reshape2)  # colsplit() comes from reshape2, not tidyr
test %>% transform(., Days.Played = colsplit(Days.Played, pattern = ",", names = c('a', 'b')))
Does anyone know how to get around this issue or have any suggestions?
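The number of columns does not have to be known in advance; it can be computed from the data by splitting every string and taking the longest result. A minimal base-R sketch, assuming the dput data above is stored in `df` (the question's code calls it `test`):

```r
# Sample data from the question (dput output)
df <- data.frame(
  User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
  Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                  "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                  "2020-09-04", "2020-06-23"),
  stringsAsFactors = FALSE
)

# Split every string on "," and count the pieces per user
parts <- strsplit(df$Days.Played, ",")
n_dates <- lengths(parts)   # an empty string splits into character(0), i.e. 0 dates

# The longest list determines how many columns are needed
n_cols <- max(n_dates)
n_cols
#> [1] 4

# Build the column names from that count instead of hard-coding them
col_names <- paste0("logged_in.", seq_len(n_cols))
```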
I think this is what you are looking for:
library(tidyr)
df %>% separate_rows(Days.Played, sep = ",")
#> # A tibble: 11 x 2
#> User.Id Days.Played
#> <int> <chr>
#> 1 2542573 ""
#> 2 2571394 "2020-01-15"
#> 3 2571394 "2020-01-16"
#> 4 2571394 "2020-01-21"
#> 5 2571394 "2020-01-22"
#> 6 2770912 "2020-06-29"
#> 7 2683246 "2020-04-19"
#> 8 2683246 "2020-04-24"
#> 9 2683246 "2020-04-29"
#> 10 2832110 "2020-09-04"
#> 11 2773277 "2020-06-23"
where df is:
df <- structure(list(User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22", "2020-06-29", "2020-04-19,2020-04-24,2020-04-29", "2020-09-04", "2020-06-23")),
row.names = c(NA, 6L), class = "data.frame")
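Since the end goal is a tall file, the same long table can also be built in base R, parsing the strings into Date values in the same step. A minimal sketch, assuming the dates are ISO yyyy-mm-dd strings as in the sample; unlike separate_rows(), it drops users with no dates (2542573 here) instead of keeping an empty row:

```r
df <- data.frame(
  User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
  Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                  "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                  "2020-09-04", "2020-06-23"),
  stringsAsFactors = FALSE
)

parts <- strsplit(df$Days.Played, ",")

# Repeat each id once per date string, then flatten the split strings
ids   <- rep(df$User.Id, lengths(parts))
dates <- unlist(parts)

# Drop any empty strings before parsing, then convert to Date
keep <- nzchar(dates)
long <- data.frame(User.Id = ids[keep],
                   Days.Played = as.Date(dates[keep], format = "%Y-%m-%d"))
```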
You can also try:
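library(tidyverse)

# Data
df <- data.frame(id = c('a', 'b', 'c'),
                 logged_in = c('2019-11-21,2019-11-22,2019-11-23,2019-11-24,2019-11-25',
                               '',
                               '2019-11-21,2019-11-22,'),
                 stringsAsFactors = FALSE)

# Code
newdf <- df %>%
  # one row per id/column pair
  pivot_longer(-id) %>%
  # one row per date
  separate_rows(value, sep = ',') %>%
  group_by(id) %>%
  # number the dates within each id to build the new column names
  mutate(Var = paste0('logged.in.', row_number())) %>%
  select(-name) %>%
  # spread back out; ids with fewer dates get '' in the extra columns
  pivot_wider(names_from = Var, values_from = value, values_fill = '')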
Output:
# A tibble: 3 x 6
# Groups: id [3]
id logged.in.1 logged.in.2 logged.in.3 logged.in.4 logged.in.5
<chr> <chr> <chr> <chr> <chr> <chr>
1 a "2019-11-21" "2019-11-22" "2019-11-23" "2019-11-24" "2019-11-25"
2 b "" "" "" "" ""
3 c "2019-11-21" "2019-11-22" "" "" ""
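For comparison, the same wide layout can be sketched in base R without tidyverse: split the strings, take the longest result as the column count, pad the shorter vectors with "", and bind them into a data frame. A minimal sketch using the same example data; the "" padding mirrors values_fill = '' above:

```r
df <- data.frame(
  id = c("a", "b", "c"),
  logged_in = c("2019-11-21,2019-11-22,2019-11-23,2019-11-24,2019-11-25", "",
                "2019-11-21,2019-11-22,"),
  stringsAsFactors = FALSE
)

parts  <- strsplit(df$logged_in, ",")
n_cols <- max(lengths(parts))   # column count comes from the data, not hard-coded

# Pad every vector with "" up to the widest row, then stack them as rows
padded <- lapply(parts, function(x) c(x, rep("", n_cols - length(x))))
wide   <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("logged.in.", seq_len(n_cols))

# Put the id column back in front
wide <- data.frame(id = df$id, wide, stringsAsFactors = FALSE)
wide
```

strsplit() drops a trailing empty field, so the trailing comma in c's string yields two dates here rather than a third "" value; after padding, the printed result is the same.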
In base R, we can use strsplit with stack:
out <- stack(setNames(strsplit(df$Days.Played, ","), df$User.Id))[2:1]
colnames(out) <- names(df)
Output:
out
# User.Id Days.Played
#1 2571394 2020-01-15
#2 2571394 2020-01-16
#3 2571394 2020-01-21
#4 2571394 2020-01-22
#5 2770912 2020-06-29
#6 2683246 2020-04-19
#7 2683246 2020-04-24
#8 2683246 2020-04-29
#9 2832110 2020-09-04
#10 2773277 2020-06-23
Data:
df <- structure(list(User.Id = c(2542573L, 2571394L, 2770912L,
                                 2683246L, 2832110L, 2773277L),
                     Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                                     "2020-06-29",
                                     "2020-04-19,2020-04-24,2020-04-29", "2020-09-04", "2020-06-23")),
                row.names = c(NA, 6L), class = "data.frame")
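Two details of the stack() result may matter for the tall file: User.Id comes back as a factor (stack() turns the list names into a factor column), and users whose Days.Played is empty (2542573 here) produce no rows, as the output above shows. A small sketch that restores integer ids and lists the dropped users:

```r
df <- data.frame(
  User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
  Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                  "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                  "2020-09-04", "2020-06-23"),
  stringsAsFactors = FALSE
)

out <- stack(setNames(strsplit(df$Days.Played, ","), df$User.Id))[2:1]
colnames(out) <- names(df)

# stack() stores the list names as a factor; convert back to integer ids
out$User.Id <- as.integer(as.character(out$User.Id))

# An empty string splits into character(0), so those users have no rows
missing_ids <- setdiff(df$User.Id, out$User.Id)
missing_ids
```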
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.