Create a list from characters in an R tibble

Question

I have a tibble with a character column. Each row holds a comma-separated set of colon-delimited words like this: "type:mytype,variable:myvariable,variable:myothervariable:asubvariableofthisothervariable". I want to do one of two things. The first is to convert this into columns of my tibble (a column "type", a column "variable", and so on), although then I don't really know what to do with my third-level words. The second is to convert it into a list column x, so that x has a structure of sublists: x$type, x$variable, x$variable$myothervariable.

I'm not sure which approach is best, and I also don't know how to implement either of the two approaches I suggest here. Note that there are at most 3 levels, and there are more first-level words than just "type" and "variable".

Small reproducible example:

library(tibble)

df <- tibble(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
)

# expected result with idea 1:
colnames(df)
# id, keywords, type, genre, year,
# language, otherlang, topic

# expected result with idea 2:
colnames(df)
# id, keywords, keywords.as.list
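To make the structure wanted in idea 2 concrete, here is a minimal base-R sketch (tibble is only used to build the data). It assumes, as stated above, that there are at most three levels, and that a second-level word which has a third-level word can serve as a sublist name. parse_keywords() is a hypothetical helper written for this illustration, not an existing function. All values stay character, so "2010" is not converted to a number.

```r
library(tibble)

df <- tibble(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
)

# Hypothetical helper: parse one keyword string into a nested list.
# - level 1 ("type", "genre", ...) becomes a top-level name
# - a level-2 word without level 3 is appended as an unnamed element
# - a level-2 word with level 3 becomes a named element holding the level-3 word
# Repeated level-1 keys (e.g. two "type" entries) accumulate in the same sublist.
# Assumes at most 3 levels; any 4th level would be ignored.
parse_keywords <- function(s) {
  out <- list()
  for (entry in strsplit(s, ",", fixed = TRUE)[[1]]) {
    parts <- strsplit(entry, ":", fixed = TRUE)[[1]]
    key <- parts[1]
    new_elem <- if (length(parts) >= 3) {
      setNames(list(parts[3]), parts[2])
    } else {
      list(parts[2])
    }
    # out[[key]] is NULL the first time a key is seen, so c() starts a new sublist
    out[[key]] <- c(out[[key]], new_elem)
  }
  out
}

# one nested list per row, stored as a list column
df$keywords.as.list <- lapply(df$keywords, parse_keywords)

colnames(df)
# [1] "id"               "keywords"         "keywords.as.list"
df$keywords.as.list[[1]]$genre$humor
# [1] "black"
unlist(df$keywords.as.list[[2]]$type)
# [1] "dictionary" "bilingual"
```

Applied to the first string in the question, x$variable would hold "myvariable" as an unnamed element, and x$variable$myothervariable would return "asubvariableofthisothervariable".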

Answer 1

We can use separate_rows from tidyr to split the 'keywords' column at ',', then use cSplit (from splitstackshape) to split 'keywords' into multiple columns at ':', reshape to 'long' format with pivot_longer, and then reshape back to 'wide' format with pivot_wider.

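library(dplyr)
library(tidyr)
library(data.table)        # for rowid()
library(splitstackshape)   # for cSplit()

df %>% 
   # one row per "key:value[:subvalue]" entry
   separate_rows(keywords, sep = ",") %>%
   # split each entry at ":" into keywords_1 (key), keywords_2 and keywords_3
   cSplit("keywords", ":") %>% 
   # stack the level-2 and level-3 words; drop entries that have no level 3
   pivot_longer(cols = keywords_2:keywords_3, values_drop_na = TRUE) %>% 
   select(-name) %>%
   # number values within each id/key, so several values under one key
   # (a repeated "type", or a value plus its subvalue) land on separate rows
   mutate(rn = rowid(id, keywords_1)) %>%
   pivot_wider(names_from = keywords_1, values_from = value) %>%
   select(-rn) %>%
   # convert columns such as "year" from character to integer
   type.convert(as.is = TRUE)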

-output

# A tibble: 6 x 7
#     id type       genre  year language otherlang topic        
#  <int> <chr>      <chr> <int> <chr>    <chr>     <chr>        
#1     1 novel      humor  2010 <NA>     <NA>      <NA>         
#2     1 <NA>       black    NA <NA>     <NA>      <NA>         
#3     2 dictionary <NA>     NA english  french    <NA>         
#4     2 bilingual  <NA>     NA <NA>     <NA>      <NA>         
#5     3 essay      <NA>   2005 <NA>     <NA>      philosophy   
#6     3 <NA>       <NA>     NA <NA>     <NA>      purposeoflife

data

df <- structure(list(id = 1:3, keywords = c("type:novel,genre:humor:black,year:2010", 
"type:dictionary,language:english,type:bilingual,otherlang:french", 
"type:essay,topic:philosophy:purposeoflife,year:2005")), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))
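If one row per id is wanted (the column layout the question sketches as idea 1), one possible variation keeps all three levels in a long table first and then pivots. This is a sketch under stated assumptions, not the method above: it swaps cSplit() and rowid() for tidyr's separate() and a values_fn, and it assumes it is acceptable to keep third-level words inside the cell (e.g. "humor:black") and to join repeated keys with ";". separate() is superseded in recent tidyr releases but still available.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  id = 1:3,
  keywords = c(
    "type:novel,genre:humor:black,year:2010",
    "type:dictionary,language:english,type:bilingual,otherlang:french",
    "type:essay,topic:philosophy:purposeoflife,year:2005"
  )
)

# Long format: one row per entry, with the three levels in separate columns.
# fill = "right" puts NA in subvalue when an entry has only two levels.
long <- df %>%
  separate_rows(keywords, sep = ",") %>%
  separate(keywords, into = c("key", "value", "subvalue"), sep = ":", fill = "right")

# Wide format, one row per id (assumption: level 3 is kept in the cell as
# "value:subvalue", and repeated keys such as "type" are joined with ";").
wide <- long %>%
  mutate(value = ifelse(is.na(subvalue), value, paste(value, subvalue, sep = ":"))) %>%
  select(-subvalue) %>%
  pivot_wider(names_from = key, values_from = value,
              values_fn = function(v) paste(v, collapse = ";"))

long
wide
```

The long table on its own may be the simplest way to handle the third-level words, since key, value, and subvalue each get their own column. The wide table has the columns id, type, genre, year, language, otherlang, topic; the original keywords column is consumed by separate_rows(), and year stays character unless converted afterwards.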

Answer 1 (score 0), 2020-12-20 19:43:20
