R: A way to filter based on values in another table?

I have two tables with the following structures. Table 1, which I will call the Summary Table, is a list of category-value pairs with a count:

Category  Value  Count
Cat1      Val1
Cat1      Val2
Cat1      Val3
Cat2      Val1
Cat2      Val2
Cat3      Val1
Cat3      Val2

summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = c(NA,NA,NA,NA,NA,NA,NA))

I want to populate this table with counts gathered from Table 2, which I will call the Raw Data Table. It has the following structure:

Entity  Cat1  Cat2  Cat3
Ent1    Val1  Val1  Val2
Ent2    Val1  Val1  Val2
Ent3    Val2  Val2  Val1
Ent4    Val2  Val1  Val2
Ent5    Val3  Val1  Val2
Ent6    Val3  Val1  Val1
Ent7    Val3  Val2  Val2

rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))

I want to populate the "Count" column of the summary table with the appropriate count for each category and value pair. Programmatically, I would keep a counter, go through the Raw Data Table, and update the count for each value as I go. I think this would be very inefficient in R. Instead, I thought I would filter for the values, but because column names are not evaluated as variables, I am at a loss as to how to do this.

What I have tried (and what I think I want is something like this):

library(dplyr)
summary$Count <- nrow(rawdata %>% filter(get(summary$Category) == get(summary$Value)))

This isn't working, however. How do I get the filter to take values from another table?

We can reshape to 'long' format with pivot_longer and use count to get the frequency counts:

library(dplyr)
library(tidyr)
rawdata %>% 
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>% 
  count(Category, Value)

-output

# A tibble: 7 x 3
#  Category Value     n
#  <chr>    <chr> <int>
#1 Cat1     Val1      2
#2 Cat1     Val2      2
#3 Cat1     Val3      3
#4 Cat2     Val1      5
#5 Cat2     Val2      2
#6 Cat3     Val1      2
#7 Cat3     Val2      5

NOTE: pivot_longer reshapes the data from 'wide' format to 'long' format. Specifying cols = -Entity converts the rest of the columns to 'long' format, with the column names stored in the "Category" column named by names_to and the corresponding values stored in the "Value" column named by values_to.
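
The counts above can also be written back into the asker's summary table. The following is a minimal sketch, assuming rawdata and summary are defined as in the question; the names counts and filled are illustrative, and filling unmatched pairs with 0 is an assumption about the desired behavior rather than something the question states.

```r
library(dplyr)
library(tidyr)

# Data as given in the question
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = NA)

# Frequency of every Category/Value pair, stored in a column named "Count"
counts <- rawdata %>%
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>%
  count(Category, Value, name = "Count")

# Drop the empty Count column, then look the counts up by Category and Value.
# Pairs that never occur in rawdata come back as NA and are set to 0 here (assumption).
filled <- summary %>%
  select(-Count) %>%
  left_join(counts, by = c("Category", "Value")) %>%
  mutate(Count = replace_na(Count, 0L))

print(filled)
```

Because left_join keeps the rows of its first argument in their original order, filled has the same row layout as summary, so it can replace summary directly.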


Or using base R with table:

# Pair each cell value with its column name, tabulate, and drop zero-count pairs
subset(as.data.frame(table(data.frame(
  Category = names(rawdata)[-1][col(rawdata[-1])],
  Value = unlist(rawdata[-1])))), Freq > 0)
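
For a version closer to the attempt in the question, a column can be looked up by a name held in a string with `[[`, which is what the get() call was reaching for. This is a sketch only, assuming the rawdata and summary objects from the question; it loops over the summary rows with mapply, so it is likely to be slower than the tabulating approaches on large data.

```r
# Data as given in the question
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = NA)

# For each summary row, select the rawdata column named by Category
# and count how many of its entries equal Value.
summary$Count <- mapply(
  function(category, value) sum(rawdata[[category]] == value),
  summary$Category, summary$Value,
  USE.NAMES = FALSE
)

print(summary)
```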
