
Split column by variable and create new columns in R

I am trying to split the column below using the approach from an earlier answer ("First question"). For now I am creating the new column in the df by hard-coding the letter. I would like to use the letter before each name as the new column name: in the case below, G, D, W, C and UTIL. Since there are only spaces between the category (e.g. G) and the names (First Person, etc.), I am scratching my head as to how I could separate the category from the first and last name and then join the two name parts under the appropriate column.

library(stringr)

test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

1 G First Person D Another Last W Fake Name C Test Another UTIL Another Test
2 G Fake Name W Another Fake D Third person UTIL Another Name C Name Another

test$G <- str_split_fixed(test$Lineup, " ", 2)

Result:

G
G

Desired result:

     G             D            W              C             UTIL    
First Person  Another Last  Fake Name      Test Another  Another Test
Fake Name     Third Person  Another Fake   Name Another  Another Name

Here's one approach using tidyverse:

# example data
test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", 
                              "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

library(tidyverse)

# split each lineup into one word per row and record:
#   id     - the original row number
#   is_col - whether the word is a column label (a capital letter or "UTIL")
#   group  - a running counter that ties each label to the words after it
dt_words = test %>%
  mutate(id = row_number()) %>%
  separate_rows(Lineup) %>%
  mutate(is_col = Lineup %in% c(LETTERS, "UTIL"),
         group = cumsum(is_col))

# get the corresponding values of your new dataset
dt_values = dt_words %>%
  filter(is_col == FALSE) %>%
  group_by(group, id) %>%
  summarise(values = paste0(Lineup, collapse = " "))

# get the columns of your new dataset
# join corresponding values
# reshape data
dt_words %>%
  filter(is_col == TRUE) %>%
  select(-is_col) %>%
  inner_join(dt_values, by=c("group","id")) %>%
  select(-group) %>%
  spread(Lineup, values) %>%
  select(-id)

#              C            D            G         UTIL            W
# 1 Test Another Another Last First Person Another Test    Fake Name
# 2 Name Another Third person    Fake Name Another Name Another Fake

Note that the assumption here is that each value is always preceded by a label that is either a single capital letter or "UTIL", and that those labels become the columns of the new dataset. Any other word that happens to be a single capital letter (a middle initial, for example) would also be treated as a label.
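
If the set of position labels is known in advance, matching against that explicit set rather than every capital letter avoids treating a stray initial as a column. Below is a minimal base R sketch of that idea, not part of the original answer; the `labels` vector and the `split_lineup` helper are hypothetical names introduced for illustration, and the labels are assumed to be exactly G, D, W, C and UTIL.

```r
# Assumed set of labels; adjust to whatever categories appear in your data
labels <- c("G", "D", "W", "C", "UTIL")

test <- data.frame(Lineup = c(
  "G First Person D Another Last W Fake  Name C Test Another UTIL Another Test",
  "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "),
  stringsAsFactors = FALSE)

# Hypothetical helper: turn one lineup string into a named character vector
split_lineup <- function(x, labels) {
  words    <- strsplit(trimws(x), "\\s+")[[1]]  # split on runs of whitespace
  is_label <- words %in% labels
  group    <- cumsum(is_label)                  # each label starts a new group
  # collect the words that follow each label, keeping empty groups in place
  parts  <- split(words[!is_label],
                  factor(group[!is_label], levels = seq_len(sum(is_label))))
  values <- vapply(parts, paste, character(1), collapse = " ")
  names(values) <- words[is_label]
  values[labels]                                # return in a fixed column order
}

# One row per lineup, one column per label
result <- as.data.frame(
  do.call(rbind, lapply(test$Lineup, split_lineup, labels = labels)),
  stringsAsFactors = FALSE)

print(result)
```

Because the columns are selected with `values[labels]`, they come out in the order given in `labels` rather than alphabetically, and a label that is missing from a row yields `NA` in that row.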
