R data.table: unique record count based on all combinations of a given list of values from 2 columns
I have a data.table in R as below:
Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3
I want to get the count of records for each combination of the given values in Col1 (Col1Value1, Col1Value2) and the given values in Col2 (Col2Value1, Col2Value2). If there are no records for a combination, the count should be 0.
counts <- dt[, length(unique(Col2)), by=.(Col1, Col2)]
The above code returns all existing combinations, but:
- Combinations with 0 records are not returned
- I am not able to restrict the result to a given list of values
Expected result
Col1 Col2 Count
Col1Value1 Col2Value1 1
Col1Value1 Col2Value2 1
Col1Value2 Col2Value1 1
Col1Value2 Col2Value2 0
DT[CJ(Col1, Col2, unique = TRUE), on = .(Col1, Col2), .(count = .N), by = .EACHI]
# Col1 Col2 count
# 1: Col1Value1 Col2Value1 1
# 2: Col1Value1 Col2Value2 1
# 3: Col1Value1 Col2Value3 1
# 4: Col1Value2 Col2Value1 1
# 5: Col1Value2 Col2Value2 0
# 6: Col1Value2 Col2Value3 1
# 7: Col1Value3 Col2Value1 1
# 8: Col1Value3 Col2Value2 1
# 9: Col1Value3 Col2Value3 1
Data
DT <- fread(
"Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3"
)
If you want to limit the combinations, you could filter beforehand, as Harshal did using dplyr:
a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")
DT[Col1 %in% a & Col2 %in% b
][CJ(Col1, Col2, unique = TRUE), on = .(Col1, Col2), .(count = .N), by = .EACHI]
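One caveat with filtering first: the grid is then built from the values that survive the filter, so a requested value that never occurs in the filtered rows would be dropped instead of being reported with a count of 0. A minimal sketch of an alternative, assuming the same DT, a and b as above, is to pass the requested vectors to CJ() directly:

```r
library(data.table)

# Same sample data as in the thread, built inline so the block is self-contained
DT <- data.table(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3")
)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Build the grid from the requested values themselves, then join DT onto it.
# With by = .EACHI, .N is evaluated once per grid row and is 0 when that
# row has no match in DT.
res <- DT[CJ(Col1 = a, Col2 = b), on = .(Col1, Col2), .(count = .N), by = .EACHI]
print(res)
```

This should give one row per requested pair, including pairs whose values do not appear in DT at all.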
In base R, you can do:
data.frame(table(dt))
Var1 Var2 Freq
1 Col1Value1 Col2Value1 1
2 Col1Value2 Col2Value1 1
3 Col1Value3 Col2Value1 1
4 Col1Value1 Col2Value2 1
5 Col1Value2 Col2Value2 0
6 Col1Value3 Col2Value2 1
7 Col1Value1 Col2Value3 1
8 Col1Value2 Col2Value3 1
9 Col1Value3 Col2Value3 1
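The table above covers every value present in the data. To restrict it to a given list, one option (a sketch, not part of the original answer) is to convert each column to a factor whose levels are the requested values. Values outside the levels become NA, which table() drops by default, while unused level combinations are kept with a count of 0. The names a, b and Count below are illustrative.

```r
# Same sample data as in the thread, as a plain data frame
dt <- data.frame(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3"),
  stringsAsFactors = FALSE
)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Factor levels define which combinations appear in the table;
# rows with a value outside a or b turn into NA and are not counted.
res <- as.data.frame(
  table(Col1 = factor(dt$Col1, levels = a),
        Col2 = factor(dt$Col2, levels = b)),
  responseName = "Count",
  stringsAsFactors = FALSE
)
print(res)
```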
You can use table like so:
data.table(with(dt, table(Col1, Col2)))
Col1 Col2 N
1: Col1Value1 Col2Value1 1
2: Col1Value2 Col2Value1 1
3: Col1Value3 Col2Value1 1
4: Col1Value1 Col2Value2 1
5: Col1Value2 Col2Value2 0
6: Col1Value3 Col2Value2 1
7: Col1Value1 Col2Value3 1
8: Col1Value2 Col2Value3 1
9: Col1Value3 Col2Value3 1
Data
dt <- setDT(read.table(text="Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3", header=TRUE,stringsAsFactors=FALSE) )
You can try the code below:
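library(dplyr)
a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")
# df is the input data frame with columns Col1 and Col2.
# Count the records for the requested combinations that exist in the data.
df2 <- df %>% select(Col1, Col2) %>% filter(Col1 %in% a) %>% filter(Col2 %in% b) %>%
  group_by(Col1, Col2) %>% summarise(count = n()) %>% as.data.frame()
# Join onto the full grid of requested values and turn missing counts into 0.
expand.grid(a, b) %>% left_join(df2, by = c("Var1" = "Col1", "Var2" = "Col2")) %>%
  mutate(count2 = ifelse(is.na(count), 0, count)) %>% select(-count)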
Below is the output:
Var1 Var2 count2
1 Col1Value1 Col2Value1 1
2 Col1Value2 Col2Value1 1
3 Col1Value1 Col2Value2 1
4 Col1Value2 Col2Value2 0
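A shorter variant of the same idea, offered only as a sketch: it assumes the tidyr package is available (it is not used in the original answer) and relies on its complete() function to add the missing combinations and fill their counts with 0. The data frame df is rebuilt here from the sample data so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the thread
df <- data.frame(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3"),
  stringsAsFactors = FALSE
)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

res <- df %>%
  filter(Col1 %in% a, Col2 %in% b) %>%        # keep only the requested values
  count(Col1, Col2, name = "count") %>%       # count existing combinations
  complete(Col1 = a, Col2 = b,                # add every requested pair
           fill = list(count = 0L))           # missing pairs get a count of 0
print(res)
```

Because complete() is given the vectors a and b explicitly, the column names stay Col1 and Col2 and no separate expand.grid() and join step is needed.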