I am trying to split the column below using an earlier answer. For now I am creating the new column in the df by using the letter. I would like to use the letter before each name as the new column name; in the case below that is G, D, W, C and UTIL. Since there are only spaces between the category (e.g. G) and the names (e.g. First Person), I am scratching my head as to how I could separate the category from the first and last name, and then join the two names under the appropriate column.
library(stringr)
test <- data.frame(Lineup = c("G First Person D Another Last W Fake Name C Test Another UTIL Another Test", "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))
1 G First Person D Another Last W Fake Name C Test Another UTIL Another Test
2 G Fake Name W Another Fake D Third person UTIL Another Name C Name Another
test$G <- str_split_fixed(test$Lineup, " ", 2)
result:
G
G
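For reference, here is a small sketch of what that call returns, assuming the `test` data frame defined above. With `n = 2`, `str_split_fixed()` splits only at the first space, so the first piece is just the position label and everything else stays in the second piece. The name `pieces` is only illustrative.

```r
library(stringr)

test <- data.frame(Lineup = c(
  "G First Person D Another Last W Fake Name C Test Another UTIL Another Test",
  "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "
))

# n = 2 means "split at the first space only", giving a 2-column character matrix
pieces <- str_split_fixed(test$Lineup, " ", 2)

dim(pieces)    # one row per lineup, two columns
pieces[, 1]    # the label before the first space
pieces[, 2]    # the rest of the string, still unsplit
```

Assigning the whole matrix to `test$G` therefore stores both pieces in one matrix column instead of one name per position.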
Hopeful Result:
G             D             W             C             UTIL
First Person  Another Last  Fake Name     Test Another  Another Test
Fake Name     Third Person  Another Fake  Name Another  Another Name
Here's one approach using tidyverse:
# example data
test <- data.frame(Lineup = c("G First Person D Another Last W Fake Name C Test Another UTIL Another Test",
                              "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))
library(tidyverse)
# create a dataset of words, with info about:
#   id     - their initial row id
#   is_col - whether they should be a column in our new dataset
#   group  - the group to join on
dt_words = test %>%
  mutate(id = row_number()) %>%
  separate_rows(Lineup) %>%
  mutate(is_col = Lineup %in% c(LETTERS, "UTIL"),
         group = cumsum(is_col))
# get the corresponding values of your new dataset
dt_values = dt_words %>%
  filter(is_col == FALSE) %>%
  group_by(group, id) %>%
  summarise(values = paste0(Lineup, collapse = " "))
# get the columns of your new dataset
# join corresponding values
# reshape data
dt_words %>%
  filter(is_col == TRUE) %>%
  select(-is_col) %>%
  inner_join(dt_values, by = c("group", "id")) %>%
  select(-group) %>%
  spread(Lineup, values) %>%
  select(-id)
#              C            D            G         UTIL            W
# 1 Test Another Another Last First Person Another Test    Fake Name
# 2 Name Another Third person    Fake Name Another Name Another Fake
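The output above comes back in alphabetical column order rather than the G, D, W, C, UTIL order asked for. A possible variant, sketched below, uses `pivot_wider()` (which supersedes `spread()` in tidyr 1.0.0 and later) and reorders the columns at the end. It assumes dplyr >= 1.0 for the `.groups` argument, and that every lineup starts with a position label; the `labels` vector and the `position`/`name` column names are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

test <- data.frame(Lineup = c(
  "G First Person D Another Last W Fake Name C Test Another UTIL Another Test",
  "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "
))

# assumed set of position labels, in the desired column order
labels <- c("G", "D", "W", "C", "UTIL")

result <- test %>%
  mutate(id = row_number(),
         Lineup = trimws(Lineup)) %>%          # drop the trailing space first
  separate_rows(Lineup, sep = "\\s+") %>%      # one word per row
  group_by(id) %>%
  mutate(group = cumsum(Lineup %in% labels)) %>%  # new group at each label
  group_by(id, group) %>%
  summarise(position = first(Lineup),          # the label opens each group
            name = paste(Lineup[-1], collapse = " "),
            .groups = "drop") %>%
  select(-group) %>%
  pivot_wider(names_from = position, values_from = name) %>%
  select(all_of(labels))                       # requested column order

result
```

Restarting the `cumsum()` within each `id` keeps the grouping local to a row, so a lineup with fewer labels would not shift the groups of the rows after it.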
Note that this assumes the values are always split by a single capital letter (or UTIL), and that those markers become the columns of the new dataset. A name that is itself a single capital letter, such as a middle initial, would be treated as a column.
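If the set of positions is known in advance, one way to loosen that assumption is to match against an explicit list of labels instead of every capital letter. The following is a minimal base R sketch of that idea, not the method used above; the helper name `split_lineup` and the `positions` vector are hypothetical, and it still assumes each lineup starts with a label and that no name word equals a label.

```r
test <- data.frame(Lineup = c(
  "G First Person D Another Last W Fake Name C Test Another UTIL Another Test",
  "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "
))

# assumed list of valid position labels, in the desired column order
positions <- c("G", "D", "W", "C", "UTIL")

# hypothetical helper: split one lineup string into one name per label
split_lineup <- function(lineup, labels) {
  words    <- strsplit(trimws(lineup), "\\s+")[[1]]
  is_label <- words %in% labels
  group    <- cumsum(is_label)                # group index grows at each label
  parts    <- split(words[!is_label], group[!is_label])
  values   <- vapply(parts, paste, character(1), collapse = " ")
  # map each group index back to the label that opened it
  names(values) <- words[is_label][as.integer(names(values))]
  unname(values[labels])                      # reorder; NA if a label is absent
}

rows   <- lapply(test$Lineup, split_lineup, labels = positions)
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(result) <- positions

result
```

With this version a single-letter word such as a middle initial "J" stays part of the name, because only the words listed in `positions` start a new column.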