R: A way to filter based on values in another table?
I have two tables with the following structures. Table 1, which I will call the Summary Table, is a list of category-value pairs with a count:
Category | Value | Count |
---|---|---|
Cat1 | Val1 | |
Cat1 | Val2 | |
Cat1 | Val3 | |
Cat2 | Val1 | |
Cat2 | Val2 | |
Cat3 | Val1 | |
Cat3 | Val2 | |
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = c(NA,NA,NA,NA,NA,NA,NA))
I want to populate this table with counts gathered from Table 2, which I will call the Raw Data Table. It has the following structure:
Entity | Cat1 | Cat2 | Cat3 |
---|---|---|---|
Ent1 | Val1 | Val1 | Val2 |
Ent2 | Val1 | Val1 | Val2 |
Ent3 | Val2 | Val2 | Val1 |
Ent4 | Val2 | Val1 | Val2 |
Ent5 | Val3 | Val1 | Val2 |
Ent6 | Val3 | Val1 | Val1 |
Ent7 | Val3 | Val2 | Val2 |
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))
I want to populate the "Count" column of the summary table with the appropriate count for each category and value pair. Programmatically, I would keep a counter, go through the Raw Data Table, and update the count for each value, but I think this would be very inefficient in R. What I thought I would do instead is filter for the values, but because column names are not evaluated as variables, I am at a loss as to how to do this.
What I have tried (and what I think I want is something like this):
library(dplyr)
summary$Count <- nrow(rawdata %>% filter(get(summary$Category) == get(summary$Value)))
This isn't working, however. How do I get the filter to take values from another table?
We can reshape to 'long' format with `pivot_longer` and use `count` to get the frequency count:
library(dplyr)
library(tidyr)
rawdata %>%
pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>%
count(Category, Value)
Output:
# A tibble: 7 x 3
# Category Value n
# <chr> <chr> <int>
#1 Cat1 Val1 2
#2 Cat1 Val2 2
#3 Cat1 Val3 3
#4 Cat2 Val1 5
#5 Cat2 Val2 2
#6 Cat3 Val1 2
#7 Cat3 Val2 5
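To get these counts into the Summary Table from the question, one option is to join them back on the category and value columns. The following is a minimal sketch rather than part of the original answer: the names `counts` and `summary_filled` are made up for illustration, `stringsAsFactors = FALSE` is added so the example behaves the same on older R versions, and filling unmatched pairs with 0 is an assumption about what the asker wants.

```r
library(dplyr)
library(tidyr)

# Data from the question (stringsAsFactors = FALSE keeps the columns as character)
summary <- data.frame(
  Category = c("Cat1", "Cat1", "Cat1", "Cat2", "Cat2", "Cat3", "Cat3"),
  Value    = c("Val1", "Val2", "Val3", "Val1", "Val2", "Val1", "Val2"),
  Count    = NA,
  stringsAsFactors = FALSE
)
rawdata <- data.frame(
  Entity = c("Ent1", "Ent2", "Ent3", "Ent4", "Ent5", "Ent6", "Ent7"),
  Cat1 = c("Val1", "Val1", "Val2", "Val2", "Val3", "Val3", "Val3"),
  Cat2 = c("Val1", "Val1", "Val2", "Val1", "Val1", "Val1", "Val2"),
  Cat3 = c("Val2", "Val2", "Val1", "Val2", "Val2", "Val1", "Val2"),
  stringsAsFactors = FALSE
)

# Count every Category/Value pair, naming the count column "Count"
counts <- rawdata %>%
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>%
  count(Category, Value, name = "Count")

# Drop the placeholder column, join the counts in, and treat
# pairs that never occur in rawdata as 0 (an assumption)
summary_filled <- summary %>%
  select(-Count) %>%
  left_join(counts, by = c("Category", "Value")) %>%
  mutate(Count = coalesce(Count, 0L))

summary_filled
```

Because `left_join` keeps every row of its left-hand table in the original order, the result has the same rows as `summary`, including any pair that is listed there but absent from `rawdata`.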
NOTE: `pivot_longer` reshapes the data from 'wide' format to 'long' format. Specifying `cols = -Entity` converts the rest of the columns to 'long' format, with the column names stored in "Category" (given by `names_to`) and the corresponding values in "Value" (given by `values_to`).
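To make the reshaping step concrete, it can help to look at the intermediate 'long' table before counting. This is only an illustrative sketch on the question's data; the name `long` is made up here.

```r
library(dplyr)
library(tidyr)

rawdata <- data.frame(
  Entity = c("Ent1", "Ent2", "Ent3", "Ent4", "Ent5", "Ent6", "Ent7"),
  Cat1 = c("Val1", "Val1", "Val2", "Val2", "Val3", "Val3", "Val3"),
  Cat2 = c("Val1", "Val1", "Val2", "Val1", "Val1", "Val1", "Val2"),
  Cat3 = c("Val2", "Val2", "Val1", "Val2", "Val2", "Val1", "Val2"),
  stringsAsFactors = FALSE
)

# Every column except Entity is stacked: one row per Entity/Category pair
long <- rawdata %>%
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value")

dim(long)      # 7 entities x 3 category columns = 21 rows, 3 columns
head(long, 3)  # the three rows that came from Ent1
```

Once the column names live in an ordinary "Category" column, counting pairs no longer requires treating a string as a variable name, which is the obstacle described in the question.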
Or using base R with `table`:
subset(as.data.frame(table(data.frame(Category =
names(rawdata)[-1][col(rawdata[-1])],
Value = unlist(rawdata[-1])))), Freq > 0)
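If the goal is to fill `summary$Count` in place, closer to what the question attempted, another base R possibility is to walk over the Category/Value pairs with `mapply` and look each column up by name with `[[`. This is a sketch under the assumption that both data frames hold character columns (hence `stringsAsFactors = FALSE`); it is not part of the original answer, and the argument names `category` and `value` are illustrative.

```r
summary <- data.frame(
  Category = c("Cat1", "Cat1", "Cat1", "Cat2", "Cat2", "Cat3", "Cat3"),
  Value    = c("Val1", "Val2", "Val3", "Val1", "Val2", "Val1", "Val2"),
  Count    = NA,
  stringsAsFactors = FALSE
)
rawdata <- data.frame(
  Entity = c("Ent1", "Ent2", "Ent3", "Ent4", "Ent5", "Ent6", "Ent7"),
  Cat1 = c("Val1", "Val1", "Val2", "Val2", "Val3", "Val3", "Val3"),
  Cat2 = c("Val1", "Val1", "Val2", "Val1", "Val1", "Val1", "Val2"),
  Cat3 = c("Val2", "Val2", "Val1", "Val2", "Val2", "Val1", "Val2"),
  stringsAsFactors = FALSE
)

# For each summary row, select the raw column named by Category
# and count how many of its entries equal Value
summary$Count <- mapply(
  function(category, value) sum(rawdata[[category]] == value),
  summary$Category, summary$Value,
  USE.NAMES = FALSE
)

summary
```

Here `rawdata[[category]]` does the job that `get()` was meant to do in the question: it selects a column from a name held in a string. The function is called once per summary row, so it scans a raw column each time, which is fine for small tables but likely slower than the reshape-and-count approach on large ones.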