Split Column by variable and create new column R
I am trying to split the column below using the answer to the first question. For now I am creating the new column in the df by using the letter. I would like to use the letter before each name as the new column name: in the case below, G, D, W, C and UTIL. Since there are only spaces between the category (G) and the names (First Person, etc.), I am scratching my head as to how I could separate the category G from the first and last name and join the names under the appropriate column.
library(stringr)
test <- data.frame(Lineup = c("G First Person D Another Last W Fake Name C Test Another UTIL Another Test", "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))
1 G First Person D Another Last W Fake Name C Test Another UTIL Another Test
2 G Fake Name W Another Fake D Third person UTIL Another Name C Name Another
test$G <- str_split_fixed(test$Lineup, " ", 2)
Result:
G
G
Hoped-for result:
G             D             W             C             UTIL
First Person  Another Last  Fake Name     Test Another  Another Test
Fake Name     Third Person  Another Fake  Name Another  Another Name
Here's one approach using tidyverse:
# example data
test <- data.frame(Lineup = c("G First Person D Another Last W Fake Name C Test Another UTIL Another Test",
"G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))
library(tidyverse)
# create a dataset of words and info about
# their initial row id
# whether they should be a column in our new dataset
# group to join on
dt_words = test %>%
  mutate(id = row_number()) %>%
  separate_rows(Lineup) %>%
  mutate(is_col = Lineup %in% c(LETTERS, "UTIL"),
         group = cumsum(is_col))
# get the corresponding values of your new dataset
dt_values = dt_words %>%
  filter(is_col == FALSE) %>%
  group_by(group, id) %>%
  summarise(values = paste0(Lineup, collapse = " "))
# get the columns of your new dataset
# join corresponding values
# reshape data
dt_words %>%
  filter(is_col == TRUE) %>%
  select(-is_col) %>%
  inner_join(dt_values, by=c("group","id")) %>%
  select(-group) %>%
  spread(Lineup, values) %>%
  select(-id)
#              C            D            G         UTIL            W
# 1 Test Another Another Last First Person Another Test    Fake Name
# 2 Name Another Third person    Fake Name Another Name Another Fake
Note that the assumption here is that a single capital letter (or "UTIL") always separates your values, and that those markers become the columns of the new dataset.
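One consequence of that assumption is that a name containing a one-letter word (an initial such as "A") would also be treated as a column. A minimal base-R sketch of the same cumsum-based grouping is shown below, assuming the set of markers is known in advance. The names `positions` and `split_lineup` are hypothetical and introduced only for illustration; they are not part of the answer above.

```r
# Example data from the question
test <- data.frame(Lineup = c(
  "G First Person D Another Last W Fake Name C Test Another UTIL Another Test",
  "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "
), stringsAsFactors = FALSE)

# Assumption: the markers are known up front (hypothetical vector)
positions <- c("G", "D", "W", "C", "UTIL")

# Hypothetical helper: split one lineup string into a named character vector
split_lineup <- function(lineup, positions) {
  # trim stray whitespace, then split on runs of spaces
  words  <- strsplit(trimws(lineup), " +")[[1]]
  is_col <- words %in% positions
  # every marker starts a new group; the words after it share its group number
  group  <- cumsum(is_col)
  parts  <- split(words[!is_col], group[!is_col])
  values <- vapply(parts, paste, character(1), collapse = " ")
  # label each group with the marker that opened it
  names(values) <- words[is_col][as.integer(names(values))]
  # return the values in a fixed column order
  values[positions]
}

rows   <- lapply(test$Lineup, split_lineup, positions = positions)
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
print(result)
```

Because the columns are reordered by `positions`, the result follows the G, D, W, C, UTIL order from the question rather than alphabetical order.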