How to categorize according to a list of named vectors (~ontology)

Question

Simply put, I have a data frame in which each row contains an item type:

df <- data.frame(
  item = 1:5,
  type = c("apple", "orange", "onion", "lettuce", "chicken")
)

I want to assign each item to a higher-level category, determined by its type, according to a list of the possible types for each category. I know all the possible types (or can extract them with df$type %>% levels()).

1) How should I structure the "ontology"/"dictionary" that lists all possible values for each category? I thought about a list of named lists, but I am not sure what the best way to do that would be.

ontology = c(
  "fruit" = c("apple", "orange", "banana"),
  "vegetable" = c("onion", "lettuce", "tomato"),
  "meat" = c("chicken", "beef")
)
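For reference, the two obvious base R structures behave differently: c() splices the named vectors into one flat character vector and gives each element an auto-numbered name (fruit1, fruit2, ...), whereas list() keeps one character vector per category, and unlist() recovers the flat form when needed. A small sketch with a trimmed-down ontology (the names ontology_c and ontology_l are only for illustration):

```r
# c(): one flat named character vector; names get a running numeric suffix
ontology_c <- c(
  fruit = c("apple", "orange", "banana"),
  meat = c("chicken", "beef")
)
print(names(ontology_c))  # "fruit1" "fruit2" "fruit3" "meat1" "meat2"

# list(): one character vector per category; names are the bare categories
ontology_l <- list(
  fruit = c("apple", "orange", "banana"),
  meat = c("chicken", "beef")
)
print(names(ontology_l))  # "fruit" "meat"

# unlist() flattens the list into exactly the vector that c() produced
print(identical(unlist(ontology_l), ontology_c))  # TRUE
```

The list form keeps the grouping explicit, which is what both answers below end up working from.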

2) How should I create a variable category in my data frame that categorizes each type?

# Basic attempt...
df %>%
  mutate(category = str_match(type %in% ontology))

Expected result:

df
# item    type  category
#    1   apple     fruit
#    2  orange     fruit
#    3   onion vegetable
#    4 lettuce vegetable
#    5 chicken      meat

Answer 1

Here is a base R method using match, unlist and sub.

# flatten ontology list to named atomic vector where name is category with added digit
flat <- unlist(ontology)
# match position of df$type in flat ontology, pull out name, and remove numeric digit
df$category <- sub("\\d+$", "", names(flat)[match(df$type, flat)])
df
#>   item    type  category
#> 1    1   apple     fruit
#> 2    2  orange     fruit
#> 3    3   onion vegetable
#> 4    4 lettuce vegetable
#> 5    5 chicken      meat
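The sub() call works because unlist() appends a running number to each category name, but it would also strip digits that belong to a category name itself (for example, a hypothetical category called "grade2"). Below is a sketch of the same match() idea that avoids the regex, assuming ontology is a named list of character vectors: base R's stack() turns it into a two-column lookup table (columns values and ind), and match() looks each type up in it. The helper name categorize is hypothetical.

```r
ontology <- list(
  fruit = c("apple", "orange", "banana"),
  vegetable = c("onion", "lettuce", "tomato"),
  meat = c("chicken", "beef")
)

df <- data.frame(
  item = 1:5,
  type = c("apple", "orange", "onion", "lettuce", "chicken"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map each type to its category via a long lookup table
categorize <- function(type, ontology) {
  # stack() gives one row per (value, category) pair: columns 'values' and 'ind'
  lookup <- stack(ontology)
  # match() returns NA for types that do not appear in the ontology
  as.character(lookup$ind)[match(as.character(type), lookup$values)]
}

df$category <- categorize(df$type, ontology)
print(df)
```

Because the category comes from the ind column rather than from parsed names, a category such as "grade2" survives intact, and unknown types come back as NA.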

Answer 2

You could turn ontology into a lookup table:

library(tidyverse)

df <- data.frame(
  item = 1:5,
  type = c("apple", "orange", "onion", "lettuce", "chicken")
)

lookup <- list(    # use list to avoid suffixes on names
    "fruit" = c("apple", "orange", "banana"),
    "vegetable" = c("onion", "lettuce", "tomato"),
    "meat" = c("chicken", "beef")
) %>% 
    imap(~set_names(rep_along(.x, .y), .x)) %>%    # swap: each type becomes a name, its category the value
    flatten_chr()    # simplify to character vector

lookup
#>       apple      orange      banana       onion     lettuce      tomato 
#>     "fruit"     "fruit"     "fruit" "vegetable" "vegetable" "vegetable" 
#>     chicken        beef 
#>      "meat"      "meat"
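If you would rather not depend on purrr, the same named lookup vector can be built in base R: the types become the names, and each category is repeated once per member using rep() with lengths(). A sketch, assuming ontology is a named list of character vectors like the one above:

```r
ontology <- list(
  fruit = c("apple", "orange", "banana"),
  vegetable = c("onion", "lettuce", "tomato"),
  meat = c("chicken", "beef")
)

# Names: every type; values: the category each type belongs to
lookup <- setNames(
  rep(names(ontology), lengths(ontology)),  # "fruit" x3, "vegetable" x3, "meat" x2
  unlist(ontology, use.names = FALSE)       # "apple", "orange", ..., "beef"
)

print(lookup)
```

Subsetting then works the same way, e.g. lookup[c("onion", "beef")].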

which makes categorizing just a matter of subsetting:

df %>% mutate(category = lookup[as.character(type)])    # as.character() in case type is a factor
#>   item    type  category
#> 1    1   apple     fruit
#> 2    2  orange     fruit
#> 3    3   onion vegetable
#> 4    4 lettuce vegetable
#> 5    5 chicken      meat
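One caveat with subsetting by name: before R 4.0.0, data.frame() converted strings to factors by default, and indexing a vector with a factor uses the factor's integer codes rather than its labels. The toy example below (a shortened, illustrative lookup) shows the silent mismatch and the as.character() fix:

```r
lookup <- c(apple = "fruit", orange = "fruit", banana = "fruit", onion = "vegetable")

# Levels are sorted alphabetically: apple (1), onion (2), orange (3)
type <- factor(c("apple", "orange", "onion"))

# Indexing with a factor uses the codes 1, 3, 2, i.e. positions in lookup
by_code <- unname(lookup[type])
# Converting to character indexes by name, which is what we want
by_label <- unname(lookup[as.character(type)])

print(by_code)   # "fruit" "fruit" "fruit"
print(by_label)  # "fruit" "fruit" "vegetable"
```

No error or warning is raised in the factor case, so a wrong category can go unnoticed unless the output is checked.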


2 answers

Answer 1: 2 votes, accepted, 2017-05-05 20:22:52
Answer 2: 1 vote, 2017-05-05 20:37:41