
Split a column by a variable and create new columns in R

I am trying to split the column below using the answer to First question. For now I am creating the new column in the df by using the letter. I would like to use the letter before each name as the new column name: in the case below G, D, W, C, UTIL. Since there are only spaces between the category (G) and the names (First Person, etc.), I am scratching my head as to how I could separate the category from the first and last name and join the names under the appropriate column.

library(stringr)

test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

1 G First Person D Another Last W Fake Name C Test Another UTIL Another Test
2 G Fake Name W Another Fake D Third person UTIL Another Name C Name Another

test$G <- str_split_fixed(test$Lineup, " ", 2)

result:

G
G
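
To see why the attempt yields only the letter, it may help to look at what `str_split_fixed()` returns. The snippet below is a minimal sketch on a shortened, made-up vector (`lineup` is an illustrative name, not part of the original data frame): with `n = 2` the string is cut at the first space only, giving a two-column matrix.

```r
library(stringr)

# Shortened illustrative input (assumption: same shape as the Lineup column)
lineup <- c("G First Person D Another Last", "G Fake Name W Another Fake")

# n = 2 means "at most two pieces", so only the first space is a split point
pieces <- str_split_fixed(lineup, " ", 2)

pieces[, 1]  # the position code before the first space
pieces[, 2]  # everything after it, still unsplit
```

So the first column holds just the code, and the names remain packed together in the second column.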

Hopeful Result:

G             D             W             C             UTIL
First Person  Another Last  Fake Name     Test Another  Another Test
Fake Name     Third Person  Another Fake  Name Another  Another Name

Here's one approach using tidyverse:

# example data
test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", 
                              "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

library(tidyverse)

# create a dataset of words and info about
# their initial row id
# whether they should be a column in our new dataset
# group to join on
dt_words = test %>%
  mutate(id = row_number()) %>%
  separate_rows(Lineup) %>%
  mutate(is_col = Lineup %in% c(LETTERS, "UTIL"),
         group = cumsum(is_col))

# get the corresponding values of your new dataset
dt_values = dt_words %>%
  filter(is_col == FALSE) %>%
  group_by(group, id) %>%
  summarise(values = paste0(Lineup, collapse = " "))

# get the columns of your new dataset
# join corresponding values
# reshape data
dt_words %>%
  filter(is_col == TRUE) %>%
  select(-is_col) %>%
  inner_join(dt_values, by=c("group","id")) %>%
  select(-group) %>%
  spread(Lineup, values) %>%
  select(-id)

#   C            D            G            UTIL         W
# 1 Test Another Another Last First Person Another Test Fake Name
# 2 Name Another Third person Fake Name    Another Name Another Fake

Note that this assumes each value is always introduced by a single capital letter (or UTIL), and that those markers become the columns of your new dataset. A name that is itself a single capital letter would be mistaken for a column.
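
If the set of position codes is known in advance, one way to loosen that assumption is to match against an explicit list instead of every capital letter. The following is a rough base R sketch, not the approach above: `split_lineup()` and its `markers` argument are hypothetical names introduced here, and it assumes every lineup starts with a marker and every marker is followed by at least one name word.

```r
lineup <- c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test",
            "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another ")

# Hypothetical helper: `markers` is an assumed, explicit list of position codes
split_lineup <- function(x, markers = c("G", "D", "W", "C", "UTIL")) {
  # trim the ends, then split on runs of whitespace (handles double spaces)
  words <- strsplit(trimws(x), "\\s+")[[1]]
  is_marker <- words %in% markers
  # each marker opens a new group; the words after it belong to that group
  group <- cumsum(is_marker)
  # paste the name words of each group back together
  values <- vapply(split(words[!is_marker], group[!is_marker]),
                   paste, character(1), collapse = " ")
  # label each group with the marker that opened it
  names(values) <- words[is_marker][as.integer(names(values))]
  # return the values in a fixed column order, whatever order they appeared in
  values[markers]
}

result <- as.data.frame(do.call(rbind, lapply(lineup, split_lineup)),
                        stringsAsFactors = FALSE)
result
```

The columns come out in the order given in `markers`, so the two rows line up even though the codes appear in a different order in each string. A name word that equals one of the markers would still be misread as a position code.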
