In R, how to calculate KL distance between two vectors of strings?
Suppose I have two vectors of strings, for example:
> list1 = c("cat", "dog", "cat", "rabbit", "dog", "cat")
> list2 = c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")
I can get the distribution of each one. For example:
> dist1 = table(list1)/length(list1)
> dist2 = table(list2)/length(list2)
> dist1; dist2
list1
      cat       dog    rabbit
0.5000000 0.3333333 0.1666667
list2
      cat       dog     mouse    rabbit
0.1428571 0.4285714 0.1428571 0.2857143
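How do I calculate the KL distance between these two distributions? (Using dist2 as the baseline.)
The KL functions I have seen (for example, kl.dist) require vectors of the same length.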
The following produces a data frame with one column for the strings and one column for each vector's distribution over those strings:
library(dplyr)
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")
dist1 <- table(list1)/length(list1)
dist2 <- table(list2)/length(list2)
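# Convert each table to a data frame (category column plus Freq) and join on the category.
# stringsAsFactors = FALSE keeps the category names as plain strings for the join.
BothDist <- full_join(as.data.frame(dist1, stringsAsFactors = FALSE),
                      as.data.frame(dist2, stringsAsFactors = FALSE),
                      by = c("list1" = "list2"))
# A category that is missing from one vector has probability 0 there
BothDist[is.na(BothDist)] <- 0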
BothDist
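Once both distributions are defined over the same set of strings, the KL divergence itself is a short sum. The base-R sketch below is one possible way to finish the calculation, not necessarily what the answer's author had in mind. The names `categories`, `p`, `q`, and `kl_divergence` are made up for illustration. It assumes the usual definition sum(p * log(p / q)) with the natural logarithm, treats terms with p = 0 as contributing 0, and returns Inf when the baseline q is 0 somewhere that p is positive.

```r
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

# Put both vectors on a shared set of categories so the
# probability vectors have the same length and order
categories <- union(list1, list2)
p <- as.numeric(table(factor(list1, levels = categories))) / length(list1)
q <- as.numeric(table(factor(list2, levels = categories))) / length(list2)

# Hypothetical helper: KL divergence of p from baseline q, in nats
kl_divergence <- function(p, q) {
  keep <- p > 0                           # convention: 0 * log(0 / q) = 0
  if (any(q[keep] == 0)) return(Inf)      # p has mass where the baseline has none
  sum(p[keep] * log(p[keep] / q[keep]))
}

kl_1_vs_2 <- kl_divergence(p, q)  # dist2 as the baseline
kl_2_vs_1 <- kl_divergence(q, p)  # Inf here: "mouse" never occurs in list1
kl_1_vs_2
kl_2_vs_1
```

Note that KL divergence is not symmetric, so the direction matters: with dist2 as the baseline the result is finite, while the reverse direction is infinite because "mouse" has zero probability in list1. If a finite value is needed in both directions, smoothing the zero probabilities is a common workaround, but that changes the quantity being computed.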