
Split a string by a delimiter into an unknown number of columns

I have a database export of user ids and dates logged in.

test <- structure(list(User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 
2832110L, 2773277L),  Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22", 
"2020-06-29", "2020-04-19,2020-04-24,2020-04-29", "2020-09-04", 
"2020-06-23")), row.names = c(NA, 
6L), class = "data.frame")
|---------------------|------------------|
|        id           |    logged_in     |
|---------------------|------------------| 
|         a           |     2019-11-21,  |
|                     |      2019-11-22, |
|                     |       2019-11-23,|
|                     |       2019-11-24,|
|                     |       2019-11-25 |
|---------------------|------------------|
|         b           |                  |
|---------------------|------------------|
|         c           | 2019-11-21,      |
|                     |   2019-11-22,    |
|---------------------|------------------|

What I am trying to do is split the date column on "," so that each date is in its own column.

I want it to look like the table below, with logged_in.[a:zz] columns stretching as wide as the longest string in the database. This could go to 1000 columns or more.


|---------------------|------------------|------------------|
|        id           |    logged_in.a   |    logged_in.b   |
|---------------------|------------------|------------------|
|         a           |    2019-11-21    |    2019-11-22    |
|---------------------|------------------|------------------|
|         b           |                  |                  |
|---------------------|------------------|------------------|
|         c           |    2019-11-21    |    2019-11-22    |
|---------------------|------------------|------------------|

I then plan on gathering the dataset into a tall file. The code I used is below, but it requires me to define the column names up front, and I don't know how many there will be.

library(tidyr)     # re-exports the %>% pipe
library(reshape2)  # provides colsplit()

test %>% transform(., Days.Played = colsplit(Days.Played, pattern = ",", names = c('a', 'b')))

Does anyone know how to get around this issue or have any suggestions?

I think this is what you are looking for:

library(tidyr)
df %>% separate_rows(Days.Played, sep = ",") 
#> # A tibble: 11 x 2
#>    User.Id Days.Played 
#>      <int> <chr>       
#>  1 2542573 ""          
#>  2 2571394 "2020-01-15"
#>  3 2571394 "2020-01-16"
#>  4 2571394 "2020-01-21"
#>  5 2571394 "2020-01-22"
#>  6 2770912 "2020-06-29"
#>  7 2683246 "2020-04-19"
#>  8 2683246 "2020-04-24"
#>  9 2683246 "2020-04-29"
#> 10 2832110 "2020-09-04"
#> 11 2773277 "2020-06-23"

where df is:

df <- structure(list(User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
                     Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22", "2020-06-29", "2020-04-19,2020-04-24,2020-04-29", "2020-09-04", "2020-06-23")), 
                row.names = c(NA, 6L), class = "data.frame")
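
If the long result is the end goal, a small cleanup step may be useful. The sketch below is only an illustration built on the same df: it drops the empty entry for the user with no dates and converts the remaining strings to Date, assuming every non-empty value is in yyyy-mm-dd form. The name long is made up for this example.

```r
library(tidyr)

# Same example data as above
df <- data.frame(
  User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
  Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                  "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                  "2020-09-04", "2020-06-23"),
  stringsAsFactors = FALSE
)

# One row per user/date pair
long <- separate_rows(df, Days.Played, sep = ",")

# Users with no dates come through as "" (row 1 in the output above); drop them
long <- long[!is.na(long$Days.Played) & nzchar(long$Days.Played), ]

# Assumption: all remaining values are ISO yyyy-mm-dd strings,
# which as.Date() parses with its default format
long$Days.Played <- as.Date(long$Days.Played)
```

Whether to drop users without dates depends on the analysis; keeping them as NA is equally reasonable.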

You can also try:

library(tidyverse)
# Data
df <- data.frame(id = c('a', 'b', 'c'),
                 logged_in = c('2019-11-21,2019-11-22,2019-11-23,2019-11-24,2019-11-25', '', '2019-11-21,2019-11-22,'),
                 stringsAsFactors = FALSE)
# Code: one row per date, number the dates within each id, then spread them into columns
newdf <- df %>%
  pivot_longer(-c(id)) %>%
  separate_rows(value, sep = ',') %>%
  group_by(id) %>%
  mutate(Var = paste0('logged.in.', row_number())) %>%
  select(-name) %>%
  pivot_wider(names_from = Var, values_from = value, values_fill = '')

Output:

# A tibble: 3 x 6
# Groups:   id [3]
  id    logged.in.1  logged.in.2  logged.in.3  logged.in.4  logged.in.5 
  <chr> <chr>        <chr>        <chr>        <chr>        <chr>       
1 a     "2019-11-21" "2019-11-22" "2019-11-23" "2019-11-24" "2019-11-25"
2 b     ""           ""           ""           ""           ""          
3 c     "2019-11-21" "2019-11-22" ""           ""           ""          
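
Since the question mentions gathering the wide table back into a tall one afterwards, here is a rough sketch of that step. It uses a hand-built two-column stand-in for the wide result (the names wide, tall and slot are made up for illustration) and assumes empty strings mark missing dates, as in the output above.

```r
library(tidyr)

# Hypothetical stand-in for a wide result with two date columns
wide <- data.frame(
  id = c("a", "b", "c"),
  logged.in.1 = c("2019-11-21", "", "2019-11-21"),
  logged.in.2 = c("2019-11-22", "", "2019-11-22"),
  stringsAsFactors = FALSE
)

# Gather every column except id, however many there are
tall <- pivot_longer(wide, cols = -id,
                     names_to = "slot", values_to = "logged_in")

# Drop the padding cells that were filled with ""
tall <- tall[nzchar(tall$logged_in), ]
```

If the tall layout is all that is needed, separate_rows() on the original column reaches it directly without the wide detour.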

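In base R, we can use strsplit() with stack(). Note that strsplit() returns a zero-length vector for an empty string, so users with no dates (here 2542573) are dropped from the result: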

out <- stack(setNames(strsplit(df$Days.Played, ","), df$User.Id))[2:1]
colnames(out) <- names(df)

Output:

out
#   User.Id Days.Played
#1  2571394  2020-01-15
#2  2571394  2020-01-16
#3  2571394  2020-01-21
#4  2571394  2020-01-22
#5  2770912  2020-06-29
#6  2683246  2020-04-19
#7  2683246  2020-04-24
#8  2683246  2020-04-29
#9  2832110  2020-09-04
#10 2773277  2020-06-23

Data:

df <- structure(list(User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
                     Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                                     "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                                     "2020-09-04", "2020-06-23")),
                row.names = c(NA, 6L), class = "data.frame")
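
The same strsplit() result can also be padded into the wide layout the question asks for, without knowing the column count in advance. This is a minimal sketch on the same data, not a definitive solution: the Days.Played.N column names are made up for illustration, and missing dates come out as NA rather than empty strings.

```r
# Same example data as above
df <- data.frame(
  User.Id = c(2542573L, 2571394L, 2770912L, 2683246L, 2832110L, 2773277L),
  Days.Played = c("", "2020-01-15,2020-01-16,2020-01-21,2020-01-22",
                  "2020-06-29", "2020-04-19,2020-04-24,2020-04-29",
                  "2020-09-04", "2020-06-23"),
  stringsAsFactors = FALSE
)

# Split each string on commas; "" yields a zero-length vector
parts <- strsplit(df$Days.Played, ",", fixed = TRUE)

# The row with the most dates decides how many columns are needed
n <- max(lengths(parts))

# Pad every vector to length n with NA, then bind the rows into a matrix
mat <- do.call(rbind, lapply(parts, `length<-`, n))
colnames(mat) <- paste0("Days.Played.", seq_len(n))  # hypothetical names

wide <- data.frame(User.Id = df$User.Id, mat, stringsAsFactors = FALSE)
```

Because n is computed from the data, the same code applies whether the longest row holds four dates or a thousand. It does assume at least one row contains a date.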
