In R, how to calculate KL Distance between two vectors of strings?
If I have 2 vectors of strings like:
> list1 = c("cat", "dog", "cat", "rabbit", "dog", "cat")
> list2 = c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")
I can get distributions for each. For example:
> dist1 = table(list1)/length(list1)
> dist2 = table(list2)/length(list2)
> dist1; dist2
list1
      cat       dog    rabbit
0.5000000 0.3333333 0.1666667
list2
      cat       dog     mouse    rabbit
0.1428571 0.4285714 0.1428571 0.2857143
How do I calculate the KL Distance between these two distributions? (Using dist2 as the baseline.)
The KL functions I've seen (e.g., kl.dist) require vectors of the same length.
The following produces a data frame with one row per distinct string and one column per vector, holding that vector's distribution:
library(dplyr)
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")
dist1 <- table(list1)/length(list1)
dist2 <- table(list2)/length(list2)
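# Join on the string labels; the two frequency columns become Freq.x and Freq.y
BothDist <- full_join(as.data.frame(dist1), as.data.frame(dist2), by = c("list1" = "list2"))
# Strings missing from one vector get probability 0 instead of NA
BothDist[is.na(BothDist)] <- 0
BothDist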
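Once both distributions are aligned on the same set of strings, the KL divergence can be computed directly. The following is a minimal base-R sketch, not a definitive implementation. It assumes that "dist2 as the baseline" means dist2 is the reference distribution Q in KL(P || Q) = sum(p * log(p / q)), and that natural logarithms are wanted. The names lv, p, q and kl_divergence are illustrative and do not come from any package.

```r
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

# Shared levels, so both probability vectors have the same length and order
lv <- sort(union(list1, list2))
p <- as.numeric(table(factor(list1, levels = lv))) / length(list1)
q <- as.numeric(table(factor(list2, levels = lv))) / length(list2)

# KL(P || Q) in nats (hypothetical helper).
# Terms with p == 0 contribute 0 by convention, so they are dropped.
# If q == 0 where p > 0, the result is Inf.
kl_divergence <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

kl <- kl_divergence(p, q)
kl
```

Note that the direction matters: "mouse" never occurs in list1, so under this definition kl_divergence(q, p) is infinite. If a finite value is needed in that direction, some form of smoothing (for example, adding a small pseudo-count to every string before normalizing) would have to be chosen, and that choice changes the result.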