
Comparing two variables in the same column in R

I have two columns. One holds a list of values, e.g. "cat", "dog", "rat", "chicken", and the other records whether the pet shop was visited on the first or the second trip.

visit_number    pet
      1         dog
      2         dog
      1         cat
      2         cat
      1         rat
      2         chicken

I want to compare the two visits in R using set operations such as intersect() and setdiff(). This is basically the same as this question:

Compare two lists in R

However, I don't have two lists; I have both visits in a single column, and I can't seem to get the code to work.

What I am trying to achieve is a function like this one (code taken from the other question), but one that works on the single column rather than on two lists:

xtab_set <- function(A,B){
    both    <-  union(A,B)
    inA     <-  both %in% A
    inB     <-  both %in% B
    return(table(inA,inB))
}

Frankly speaking, the output matrix is not very clear. However, you mentioned in a comment that you are "looking for the number (count) of unique individual animals per visit that occurred only in visit one, only in visit two and occurred in both visits." The document you provided also contains three visits, so I am considering three visits.

The following code shows the number of animals that were unique to each visit, as well as the number of animals that appeared in all visits.

Step 1. Build a raw dataset

library(data.table)
df = data.table(visit_number = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3), 
                pet = c("Dog", "Rat", "Cat", "Dog", "Chicken", "Cat", "Dog", "Cat", "Fish", "Horse"))

Step 2. Create a vector of readable column names for later use

# sort() keeps the names aligned with the column order that dcast produces
cols = paste0("Visit", sort(unique(df$visit_number)))

Step 3. Create a matrix of pet appearances

df = dcast.data.table(df, pet ~ visit_number, value.var = "pet", fun.aggregate = length)
names(df)[-1] = cols # assign understandable column names

Step 4. Flag pets that appeared in all visits

df[, AllVisits := Reduce(`*`, .SD), .SDcols = cols]

It gives:

df
       pet Visit1 Visit2 Visit3 AllVisits
1:     Cat      1      1      1         1
2: Chicken      0      1      0         0
3:     Dog      1      1      1         1
4:    Fish      0      0      1         0
5:   Horse      0      0      1         0
6:     Rat      1      0      0         0

Rat was unique to Visit 1, Chicken was unique to Visit 2, and Fish and Horse were unique to Visit 3. Cat and Dog appeared in all visits.
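To make the Reduce() trick less opaque, here is a minimal base R sketch. The three indicator vectors are hypothetical stand-ins for the Visit1/Visit2/Visit3 columns (one element per pet, 1 = seen, 0 = not seen); they are not taken from the table above.

```r
# Hypothetical 0/1 indicators for three pets across three visits
visit1 <- c(1, 0, 1)
visit2 <- c(1, 1, 1)
visit3 <- c(1, 0, 0)
indicators <- list(visit1, visit2, visit3)

# Element-wise product: 1 only where every visit has a 1
all_visits <- Reduce(`*`, indicators)  # 1 0 0

# Element-wise sum: number of visits in which each pet was seen
n_visits <- Reduce(`+`, indicators)    # 3 1 2

# A pet is unique to a single visit when that sum equals 1
only_one_visit <- n_visits == 1        # FALSE TRUE FALSE
```

The product works as a logical AND only as long as every cell is 0 or 1.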

Step 5. Count the animals that were unique to each visit and the animals that appeared in all visits

idx = df[, Reduce(`+`, .SD) == 1, .SDcols = cols]
unlist(c(df[idx, lapply(.SD, function(x) sum(x)), .SDcols = cols], AllVisits = df[, sum(AllVisits)]))

The result is:

Visit1    Visit2    Visit3 AllVisits 
     1         1         2         2 

Let me know if that is what you are looking for.

P.S. The code will need to be modified if a pet may appear several times during the same visit.
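One possible modification, as a rough sketch rather than the only way to do it: drop repeated (visit_number, pet) rows with unique() before reshaping, so every cell stays 0 or 1. The data and the names visits and presence below are made up for illustration.

```r
library(data.table)

# Hypothetical data: Dog is recorded twice during visit 1
visits <- data.table(
  visit_number = c(1, 1, 1, 2, 2),
  pet          = c("Dog", "Dog", "Rat", "Dog", "Cat")
)

# unique() removes duplicated (visit_number, pet) pairs,
# so length() can only return 0 or 1 for each cell
presence <- dcast(unique(visits), pet ~ visit_number,
                  value.var = "pet", fun.aggregate = length)
setnames(presence, c("pet", "Visit1", "Visit2"))
presence
```

Without the unique() call, the Dog/Visit1 cell would be 2, and both the product used for AllVisits and the "sum equals 1" test would stop meaning what they are meant to mean.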

If I understood your question correctly, here is a solution using functions from the dplyr package:

library(dplyr)

# df is the question's data frame with columns visit_number and pet
full_join(filter(df, visit_number == 1), filter(df, visit_number == 2), by = 'pet') %>%
    mutate(visit1 = !is.na(visit_number.x),   # pet seen on visit 1
           visit2 = !is.na(visit_number.y),   # pet seen on visit 2
           both = visit1 & visit2) %>% 
    select(-starts_with('visit_number'))

Giving:

      pet visit1 visit2  both
1     dog   TRUE   TRUE  TRUE
2     cat   TRUE   TRUE  TRUE
3     rat   TRUE  FALSE FALSE
4 chicken  FALSE   TRUE FALSE
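For completeness, a base R sketch of the same idea: split() turns the single pet column into one vector per visit, after which intersect(), setdiff() and the question's xtab_set() can be used as they are. The name pets is only an illustrative choice for the question's data.

```r
# The question's data: one row per (visit, pet)
pets <- data.frame(
  visit_number = c(1, 2, 1, 2, 1, 2),
  pet          = c("dog", "dog", "cat", "cat", "rat", "chicken"),
  stringsAsFactors = FALSE
)

# Split the single pet column into one vector per visit
by_visit <- split(pets$pet, pets$visit_number)
visit1 <- by_visit[["1"]]
visit2 <- by_visit[["2"]]

both        <- intersect(visit1, visit2)  # "dog" "cat"
only_first  <- setdiff(visit1, visit2)    # "rat"
only_second <- setdiff(visit2, visit1)    # "chicken"

# xtab_set() copied from the question
xtab_set <- function(A, B) {
    both <- union(A, B)
    inA  <- both %in% A
    inB  <- both %in% B
    return(table(inA, inB))
}
tab <- xtab_set(visit1, visit2)
```

length(both), length(only_first) and length(only_second) then give the counts per category.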
