Create list from characters in R tibble
I have a tibble with a character column. The string in each row is a set of words like this: "type:mytype,variable:myvariable,variable:myothervariable:asubvariableofthisothervariable".
I want to either convert this into columns in my tibble (a column "type", a column "variable", and so on; but then I don't really know what to do with my 3rd-level words), or convert it to a list column x, so that x has a structure of sublists: x$type, x$variable, x$variable$myothervariable.
I'm not sure which approach is best, and I also don't know how to implement either of the two approaches I suggest here. I should add that I have at most 3 levels, and more 1st-level words than just "type" and "variable".
Small Reproducible Example:
library(tibble)

df <- tibble(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
)
# expected with idea 1:
colnames(df)
# id, keywords, type, genre, year,
# language, otherlang, topic
# with idea 2:
colnames(df)
# id, keywords, keywords.as.list
We can use separate_rows from tidyr to split the 'keywords' column at ',', then use cSplit (from splitstackshape) to split 'keywords' into multiple columns at ':', reshape to 'long' format with pivot_longer, and then reshape back to 'wide' with pivot_wider.
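library(dplyr)
library(tidyr)
library(data.table)       # for rowid()
library(splitstackshape)  # for cSplit()

df %>%
  # one row per comma-separated keyword
  separate_rows(keywords, sep = ",") %>%
  # split each keyword at ":" into keywords_1, keywords_2, keywords_3
  cSplit("keywords", ":") %>%
  # stack the 2nd- and 3rd-level words into a single 'value' column
  pivot_longer(cols = keywords_2:keywords_3, values_drop_na = TRUE) %>%
  select(-name) %>%
  # number repeated keys within each id so pivot_wider gets unique rows
  mutate(rn = rowid(id, keywords_1)) %>%
  pivot_wider(names_from = keywords_1, values_from = value) %>%
  select(-rn) %>%
  # convert columns such as 'year' to their natural type
  type.convert(as.is = TRUE)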
-output
# A tibble: 6 x 7
#     id type       genre  year language otherlang topic
#  <int> <chr>      <chr> <int> <chr>    <chr>     <chr>
#1     1 novel      humor  2010 <NA>     <NA>      <NA>
#2     1 <NA>       black    NA <NA>     <NA>      <NA>
#3     2 dictionary <NA>     NA english  french    <NA>
#4     2 bilingual  <NA>     NA <NA>     <NA>      <NA>
#5     3 essay      <NA>   2005 <NA>     <NA>      philosophy
#6     3 <NA>       <NA>     NA <NA>     <NA>      purposeoflife
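The output above has two rows for each id, because a third-level word or a repeated key spills onto a second row. If one row per id is preferred, one possible variation (not part of the answer above) is to keep everything after the first ':' as a single value and paste repeated keys together. The sketch below assumes tidyr 1.1.0 or later for a single-function values_fn; the names key, value and wide and the ';' separator are arbitrary choices for illustration.

```r
library(tibble)
library(tidyr)

df <- tibble(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
)

# One row per keyword, as in the first step of the answer.
long <- separate_rows(df, keywords, sep = ",")

# Split only at the first ":"; extra = "merge" keeps "humor:black" together.
long <- separate(long, keywords, into = c("key", "value"),
                 sep = ":", extra = "merge")

# One column per key; repeated keys within an id are pasted with ";".
wide <- pivot_wider(long, names_from = key, values_from = value,
                    values_fn = function(v) paste(v, collapse = ";"))
wide
```

With this choice every column stays character, so a column such as year would still need converting if numeric values are wanted.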
-data
df <- structure(list(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
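For the second idea in the question (a list column with nested sublists), a rough base R sketch could look like the following. The helper parse_keywords and the column name keywords_list are hypothetical, and the layout is only one of several reasonable choices: every first-level key maps to a list, two-level values are appended as unnamed elements, and three-level values are stored under the name of their second-level word.

```r
library(tibble)

df <- tibble(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
)

# Hypothetical helper: turn one keyword string into a nested list.
parse_keywords <- function(x) {
  out <- list()
  for (item in strsplit(x, ",", fixed = TRUE)[[1]]) {
    parts <- strsplit(item, ":", fixed = TRUE)[[1]]
    key <- parts[1]
    if (length(parts) == 2) {
      # two levels: append the value as an unnamed element under its key
      out[[key]] <- c(out[[key]], list(parts[2]))
    } else if (length(parts) == 3) {
      # three levels: store the third word under the second word's name
      out[[key]] <- c(out[[key]], setNames(list(parts[3]), parts[2]))
    }
  }
  out
}

# One nested list per row, stored as a list column.
df$keywords_list <- lapply(df$keywords, parse_keywords)

df$keywords_list[[1]]$genre$humor   # third-level word for row 1
df$keywords_list[[2]]$type          # both 'type' values for row 2
```

Because repeated keys are appended rather than overwritten, a key that appears at both two and three levels in the same row (as "variable" does in the question's first example) ends up with both kinds of entries in the same sublist.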