Pattern matching in R
ca.df
id Category
1 Noun
2 Negative
3 Positive
4 adj
5 word
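Each term is assigned to several categories, so it corresponds to more than one id. In terms.df, all of a term's ids are stored in a single column.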
terms.df
Terms id
Love 1 4 5 3
Hate 2 4 5
ice 1 5
The ids correspond to the categories in ca.df. I want an output like this:
x.df
Category terms
Noun ice Love
Negative Hate
Positive Love
adj Hate Love
word ice Hate Love
How can I do this?
Here is a possible data.table / splitstackshape package solution:
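library(splitstackshape) ## loads `data.table` package too
# Split the space-separated ids into long format: one row per (term, id) pair
terms.df <- cSplit(terms.df, "id", sep = " ", direction = "long")
# Key on id, join ca.df, and collapse the matching Terms once per joined row
setkey(terms.df, id)[ca.df, .(Category, Terms = toString(Terms)), by = .EACHI]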
# id Category Terms
# 1: 1 Noun Love, ice
# 2: 2 Negative Hate
# 3: 3 Positive Love
# 4: 4 adj Love, Hate
# 5: 5 word Love, Hate, ice
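To see what the keyed join does in isolation, here is a minimal sketch using data.table alone. It assumes the long table that cSplit(..., direction = "long") would produce from the question's data (Love with ids 1 4 5 3), typed in by hand; the names long, ca.dt, per_join and plain are illustrative only. It contrasts the join with by = .EACHI against a plain join.

```r
library(data.table)

# Assumed input: the long format cSplit would produce, one row per (term, id) pair
long <- data.table(Terms = c("Love", "Love", "Love", "Love",
                             "Hate", "Hate", "Hate", "ice", "ice"),
                   id    = c(1L, 4L, 5L, 3L, 2L, 4L, 5L, 1L, 5L))
ca.dt <- data.table(id = 1:5,
                    Category = c("Noun", "Negative", "Positive", "adj", "word"))

# Key the lookup table on id so that long[ca.dt] joins ca.dt's first column to it
setkey(long, id)

# by = .EACHI: j is evaluated once per row of ca.dt, over the rows of long it matches,
# so each category gets a single comma-separated string of terms
per_join <- long[ca.dt, .(Category, Terms = toString(Terms)), by = .EACHI]
print(per_join)

# Without j and by = .EACHI: an ordinary join, one row per matched (term, id) pair
plain <- long[ca.dt]
print(plain)
```

Because j is evaluated inside the join, no intermediate joined table has to be built and then grouped a second time.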
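Some explanation:
- cSplit splits the id column on spaces into long format (one row per term and id).
- setkey(terms.df, id) keys the table on id, and we then perform a binary left join between the two datasets on the id column, keeping every row of ca.df.
- by = .EACHI evaluates j once for each joined row, concatenating the matching Terms back with toString(); this lets us perform operations while joining instead of afterwards.

A solution using tidyr and dplyr: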
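library(tidyr)
library(dplyr)
# separate() yields character ids, so make the lookup key character as well
ca.df$id <- as.character(ca.df$id)
terms.df %>%
  # split into at most 3 id columns; extra = "merge" keeps any overflow in V3,
  # fill = "right" pads shorter id lists with NA without warning
  separate(id, into = paste0("V", 1:3), sep = " ", extra = "merge", fill = "right") %>%
  gather(var, id, -Terms) %>%   # back to long: one row per (term, id)
  filter(!is.na(id)) %>%        # drop the NA padding
  left_join(ca.df, by = "id") %>%
  select(-var, -id) %>%
  group_by(Category) %>%
  summarize(Terms = paste(Terms, collapse = " "))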
Output:
Source: local data frame [4 x 2]
Category Terms
1 Negative Hate
2 Noun Love ice
3 adj Love Hate
4 word ice Love Hate
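The separate() step above fixes the number of id columns at three, so a term with four or more ids would keep the extra ids merged in V3 and they would not match any category. A minimal sketch that avoids this, assuming tidyr's separate_rows() (available since tidyr 0.5.0) and the question's data with Love having ids 1 4 5 3:

```r
library(dplyr)
library(tidyr)

# Assumed input: the question's data, ids kept as character to match separate_rows()
ca.df <- data.frame(id = as.character(1:5),
                    Category = c("Noun", "Negative", "Positive", "adj", "word"),
                    stringsAsFactors = FALSE)
terms.df <- data.frame(Terms = c("Love", "Hate", "ice"),
                       id = c("1 4 5 3", "2 4 5", "1 5"),
                       stringsAsFactors = FALSE)

x.df <- terms.df %>%
  separate_rows(id, sep = " ") %>%    # one row per (term, id), any number of ids
  left_join(ca.df, by = "id") %>%     # attach the category name for each id
  group_by(Category) %>%
  summarize(Terms = paste(Terms, collapse = " "))
x.df
```

The terms within each category keep the order in which they appear in terms.df.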
Data:
ca.df <- read.table(text =
"id Category
1 Noun
2 Negative
3 Positive
4 adj
5 word", header = TRUE, stringsAsFactors = FALSE)
terms.df <- read.table(text =
"Terms id
Love '1 4 5'
Hate '2 4 5'
ice '1 5'
", header = TRUE, stringsAsFactors = FALSE)
You can use merge to combine the two data frames based on id:
ca.df <- data.frame(id = 1:5, Category = c("Noun", "Negative", "Positive", "adj", "word"))
# terms.df is entered directly in long format: one row per (term, id) pair
terms.df <- data.frame(Terms = c(rep("Love", 3), rep("Hate", 3), rep("ice", 2)),
                       id = c(1, 4, 5, 2, 4, 5, 1, 5))
x.df <- merge(ca.df, terms.df, by="id")
x.df
id Category Terms
1 1 Noun Love
2 1 Noun ice
3 2 Negative Hate
4 4 adj Love
5 4 adj Hate
6 5 word Love
7 5 word Hate
8 5 word ice
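merge() returns one row per (category, term) pair. To collapse that into one row per category as in the desired x.df, one option (a base R addition, not part of the answer above) is aggregate() with toString(), sketched here with the same long-format data; x.agg is an illustrative name.

```r
ca.df <- data.frame(id = 1:5,
                    Category = c("Noun", "Negative", "Positive", "adj", "word"))
terms.df <- data.frame(Terms = c(rep("Love", 3), rep("Hate", 3), rep("ice", 2)),
                       id = c(1, 4, 5, 2, 4, 5, 1, 5))

x.df <- merge(ca.df, terms.df, by = "id")

# One row per Category; toString() joins the terms with ", "
x.agg <- aggregate(Terms ~ Category, data = x.df, FUN = toString)
x.agg
```

Since this terms.df has no row with id 3, Positive does not appear in the result; a category with no terms is dropped by the inner join that merge() performs by default.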