R: a way to filter based on values in another table?

Question

I have two tables with the following structures. Table 1, which I will call the Summary Table, is a list of category-value pairs with a count:

Category	Value	Count
Cat1	Val1
Cat1	Val2
Cat1	Val3
Cat2	Val1
Cat2	Val2
Cat3	Val1
Cat3	Val2

summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = c(NA,NA,NA,NA,NA,NA,NA))

I want to populate this table with counts gathered from Table 2, which we will call the Raw Data Table and which has the following structure:

Entity	Cat1	Cat2	Cat3
Ent1	Val1	Val1	Val2
Ent2	Val1	Val1	Val2
Ent3	Val2	Val2	Val1
Ent4	Val2	Val1	Val2
Ent5	Val3	Val1	Val2
Ent6	Val3	Val1	Val1
Ent7	Val3	Val2	Val2

rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))

I want to populate the "Count" column of the summary table with the appropriate count for each category-value pair. Programmatically, I would keep a counter, go through the Raw Data Table, and update the count for each value, but I think this would be very inefficient in R. What I thought I would do instead is filter for the values, but because column names are not evaluated as variables, I am at a loss as to how to do this.

What I have tried (and roughly what I think I want) is:

library(dplyr)
summary$Count <- nrow(rawdata %>% filter(get(summary$Category) == get(summary$Value)))

This isn't working, however. How do I get the filter to take values from another table?
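
A side note on the attempt above: one way to look up a column by a name held in a string is the [[ ]] operator. The following is only a minimal sketch in base R, not necessarily the most efficient route. The argument names category and value are made up for illustration, and stringsAsFactors = FALSE is added as an assumption so the columns are plain character vectors on any R version.

```r
# Example data from the question (stringsAsFactors = FALSE is an added assumption)
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = NA,
                      stringsAsFactors = FALSE)
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'),
                      stringsAsFactors = FALSE)

# For each row of summary, select the rawdata column named by Category
# with [[ ]] and count how many of its entries equal Value.
summary$Count <- mapply(function(category, value) sum(rawdata[[category]] == value),
                        summary$Category, summary$Value,
                        USE.NAMES = FALSE)

print(summary)
```

This scans one rawdata column per summary row, so it is probably fine for small tables but may be slower than a single reshape-and-count pass on large ones.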

Answer 1

We can reshape to 'long' format with pivot_longer and use count to get the frequency counts:

library(dplyr)
library(tidyr)
rawdata %>% 
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>% 
  count(Category, Value)

-output

# A tibble: 7 x 3
#  Category Value     n
#  <chr>    <chr> <int>
#1 Cat1     Val1      2
#2 Cat1     Val2      2
#3 Cat1     Val3      3
#4 Cat2     Val1      5
#5 Cat2     Val2      2
#6 Cat3     Val1      2
#7 Cat3     Val2      5
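
If the goal is to fill the existing summary table rather than build a new one, the counts can probably be joined back onto it. This is a sketch under some assumptions: the intermediate name counts is hypothetical, count(name = ) needs a reasonably recent dplyr, and pairs that never occur in rawdata are set to 0 here, which may or may not be what is wanted.

```r
library(dplyr)
library(tidyr)

# Example data from the question (stringsAsFactors = FALSE is an added assumption)
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = NA,
                      stringsAsFactors = FALSE)
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'),
                      stringsAsFactors = FALSE)

# Frequency of every (Category, Value) pair, stored in a column called Count
counts <- rawdata %>%
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>%
  count(Category, Value, name = "Count")

# Drop the placeholder Count column, then look the counts up by both keys;
# pairs with no match come back as NA and are replaced with 0
summary <- summary %>%
  select(-Count) %>%
  left_join(counts, by = c("Category", "Value")) %>%
  mutate(Count = coalesce(Count, 0L))

print(summary)
```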

NOTE: pivot_longer reshapes the data from 'wide' format to 'long' format. Specifying cols = -Entity converts the rest of the columns to 'long' format, with the column names stored in "Category" (set by names_to) and the corresponding values in "Value" (set by values_to).
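
To make the reshaping step concrete, here is a small sketch that stops after pivot_longer and inspects the intermediate result. The variable name long is only illustrative, and stringsAsFactors = FALSE is an added assumption.

```r
library(dplyr)
library(tidyr)

# Example data from the question (stringsAsFactors = FALSE is an added assumption)
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'),
                      stringsAsFactors = FALSE)

# Every non-Entity column becomes a (Category, Value) pair on its own row
long <- rawdata %>%
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value")

# 7 entities x 3 category columns give 21 rows with columns Entity, Category, Value
print(dim(long))

# The first three rows are the three category columns of Ent1
print(head(long, 3))
```

Once the data is in this shape, count(Category, Value) is just a grouped row count.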

Or using base R with table:

subset(as.data.frame(table(data.frame(Category =
   names(rawdata)[-1][col(rawdata[-1])], 
        Value = unlist(rawdata[-1])))), Freq  > 0)
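
The one-liner above is fairly dense, so here is the same idea unpacked into steps, followed by one possible way to copy the frequencies into summary$Count. Treat it as a sketch: the intermediate names cats, long, freq and idx are hypothetical, rep() is swapped in for the col() indexing, and stringsAsFactors = FALSE is an added assumption.

```r
# Example data from the question (stringsAsFactors = FALSE is an added assumption)
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = NA,
                      stringsAsFactors = FALSE)
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'),
                      stringsAsFactors = FALSE)

# 1. Drop the Entity column and stack the remaining columns into long form;
#    unlist() goes column by column, so each column name repeats nrow(cats) times
cats <- rawdata[-1]
long <- data.frame(Category = rep(names(cats), each = nrow(cats)),
                   Value = unlist(cats, use.names = FALSE),
                   stringsAsFactors = FALSE)

# 2. Cross-tabulate Category by Value and keep only pairs that actually occur
freq <- subset(as.data.frame(table(long), stringsAsFactors = FALSE), Freq > 0)

# 3. Look up each summary row's frequency by its combined Category/Value key
idx <- match(paste(summary$Category, summary$Value),
             paste(freq$Category, freq$Value))
summary$Count <- freq$Freq[idx]

print(summary)
```

Pairs listed in summary that never occur in rawdata would end up as NA with this lookup.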

Solution 1
1 2021-01-13 17:23:00