R data.table: count unique records for all combinations of given value lists in 2 columns

Question

I have a data.table in R as follows:

Col1          Col2         
Col1Value1    Col2Value1   
Col1Value1    Col2Value2
Col1Value1    Col2Value3
Col1Value2    Col2Value1   
Col1Value2    Col2Value3
Col1Value3    Col2Value1
Col1Value3    Col2Value2
Col1Value3    Col2Value3

I want to get the number of records for every combination of the given values in Col1 (Col1Value1, Col1Value2) and the given values in Col2 (Col2Value1, Col2Value2), returning 0 if a combination has no records.

counts <- dt[, length(unique(Col2)), by=.(Col1, Col2)]

The code above returns every combination that occurs in the data, but:

- Combinations with 0 records are not returned
- I am not able to restrict the result to the given lists of values

Expected result

Col1           Col2        Count
Col1Value1     Col2Value1    1
Col1Value1     Col2Value2    1
Col1Value2     Col2Value1    1
Col1Value2     Col2Value2    0

Answer 1

DT[CJ(Col1, Col2, unique = TRUE), on = .(Col1, Col2), .(count = .N), by = .EACHI]

#          Col1       Col2 count
# 1: Col1Value1 Col2Value1     1
# 2: Col1Value1 Col2Value2     1
# 3: Col1Value1 Col2Value3     1
# 4: Col1Value2 Col2Value1     1
# 5: Col1Value2 Col2Value2     0
# 6: Col1Value2 Col2Value3     1
# 7: Col1Value3 Col2Value1     1
# 8: Col1Value3 Col2Value2     1
# 9: Col1Value3 Col2Value3     1

Data

library(data.table)

DT <- fread(
  "Col1          Col2         
  Col1Value1    Col2Value1   
  Col1Value1    Col2Value2
  Col1Value1    Col2Value3
  Col1Value2    Col2Value1   
  Col1Value2    Col2Value3
  Col1Value3    Col2Value1
  Col1Value3    Col2Value2
  Col1Value3    Col2Value3"
)

If you want to restrict the combinations, you can pre-filter the data first, as Harshal does with dplyr:

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")
DT[Col1 %in% a & Col2 %in% b
   ][CJ(Col1, Col2, unique = TRUE), on = .(Col1, Col2), .(count = .N), by = .EACHI]
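The pre-filter above builds the lookup table from whatever values survive the filter, so a value from a or b that never occurs in the data would drop out of the result instead of getting a 0. A minimal alternative sketch, assuming the data.table package is installed, passes the given vectors straight to CJ() so the lookup table always contains every requested pair; with by = .EACHI, .N is 0 for pairs that match no rows, as in the output above. This is a variation on the answer, not the answerer's own code, and the sample data is the question's, rebuilt with data.table() instead of fread().

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3")
)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# CJ() builds the cross join of the given vectors (sorted), named Col1/Col2
# so it can be joined on those columns. by = .EACHI evaluates .N once per
# lookup row, giving 0 for pairs that have no matching rows in DT.
counts <- DT[CJ(Col1 = a, Col2 = b), on = .(Col1, Col2), .(count = .N), by = .EACHI]
print(counts)
```

Because the lookup rows come from a and b rather than from the data, adding a hypothetical value such as "Col2Value9" to b should add rows with count 0 rather than being silently dropped.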

Answer 2

In base R, you can do the following:

data.frame(table(dt))

        Col1       Col2 Freq
1 Col1Value1 Col2Value1    1
2 Col1Value2 Col2Value1    1
3 Col1Value3 Col2Value1    1
4 Col1Value1 Col2Value2    1
5 Col1Value2 Col2Value2    0
6 Col1Value3 Col2Value2    1
7 Col1Value1 Col2Value3    1
8 Col1Value2 Col2Value3    1
9 Col1Value3 Col2Value3    1
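The table() call above counts every combination of the values present, but it does not restrict them to the two given lists. A base R sketch of one way to do both at once: convert each column to a factor whose levels are exactly the given values. Values outside the lists become NA, which table() excludes by default, and every pair of levels is tabulated, including pairs with no rows. The vectors a and b are the ones from the question; the data is rebuilt as a plain data.frame so the snippet needs no packages.

```r
# Sample data from the question, as a plain data.frame
dt <- data.frame(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3"),
  stringsAsFactors = FALSE
)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Levels restricted to the given values: anything else becomes NA and is
# dropped by table(), while empty level pairs are kept with a count of 0.
counts <- as.data.frame(
  table(Col1 = factor(dt$Col1, levels = a), Col2 = factor(dt$Col2, levels = b)),
  stringsAsFactors = FALSE
)
print(counts)
```

As with the unrestricted version, Col1 varies fastest in the result, so the rows come out in a different order from the expected result in the question, but the four counts are the same.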

Answer 3

You can use table like this:

data.table(with(dt, table(Col1, Col2)))

         Col1       Col2 N
1: Col1Value1 Col2Value1 1
2: Col1Value2 Col2Value1 1
3: Col1Value3 Col2Value1 1
4: Col1Value1 Col2Value2 1
5: Col1Value2 Col2Value2 0
6: Col1Value3 Col2Value2 1
7: Col1Value1 Col2Value3 1
8: Col1Value2 Col2Value3 1
9: Col1Value3 Col2Value3 1

Data

library(data.table)

dt <- setDT(read.table(text="Col1          Col2         
                 Col1Value1    Col2Value1   
                 Col1Value1    Col2Value2
                 Col1Value1    Col2Value3
                 Col1Value2    Col2Value1   
                 Col1Value2    Col2Value3
                 Col1Value3    Col2Value1
                 Col1Value3    Col2Value2
                 Col1Value3    Col2Value3", header=TRUE,stringsAsFactors=FALSE) )

Answer 4

You can try the following code:

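library(dplyr)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Count only the rows whose values are in the given lists (dt is the question's data)
df2 <- dt %>% select(Col1, Col2) %>% filter(Col1 %in% a) %>% filter(Col2 %in% b) %>%
  group_by(Col1, Col2) %>% summarise(count = n()) %>% as.data.frame()

# All combinations of a and b; combinations with no rows get count2 = 0
expand.grid(a, b, stringsAsFactors = FALSE) %>% left_join(df2, by = c("Var1" = "Col1", "Var2" = "Col2")) %>%
  mutate(count2 = ifelse(is.na(count), 0, count)) %>% select(-count)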

Here is the output:

        Var1       Var2 count2
1 Col1Value1 Col2Value1      1
2 Col1Value2 Col2Value1      1
3 Col1Value1 Col2Value2      1
4 Col1Value2 Col2Value2      0
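The same expand-grid-then-join idea also works without dplyr. A rough base R sketch, using aggregate() for the observed counts and merge(..., all.x = TRUE) in place of left_join(). Note that merge() sorts the result by the key columns by default, so the row order can differ from the dplyr output above, and aggregate() stops with an error if no row matches the lists at all, so that case would need a guard in real use.

```r
# Sample data from the question, as a plain data.frame
dt <- data.frame(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3"),
  stringsAsFactors = FALSE
)
a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Keep only rows whose values are in the given lists, then count each observed pair
sub <- dt[dt$Col1 %in% a & dt$Col2 %in% b, ]
observed <- aggregate(count ~ Col1 + Col2, data = transform(sub, count = 1L), FUN = sum)

# Every requested pair, left-joined to the observed counts
grid <- expand.grid(Col1 = a, Col2 = b, stringsAsFactors = FALSE)
res <- merge(grid, observed, by = c("Col1", "Col2"), all.x = TRUE)

# Pairs with no rows come back as NA after the join; report them as 0
res$count[is.na(res$count)] <- 0L
print(res)
```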

4 answers

Answer 1: score 4, 2020-05-12 13:23:07
Answer 2: score 3, accepted, 2020-05-12 13:26:19
Answer 3: score 1, 2020-05-12 13:26:35
Answer 4: score 0, 2020-05-12 12:56:44
