
In R, how to calculate KL distance between two vectors of strings?

If I have two vectors of strings like:

> list1 = c("cat", "dog", "cat", "rabbit", "dog", "cat")
> list2 = c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

I can get the distribution of each. For example:

> dist1 = table(list1)/length(list1)
> dist2 = table(list2)/length(list2)
> dist1; dist2

list1
      cat       dog    rabbit 
0.5000000 0.3333333 0.1666667 
list2
      cat       dog     mouse    rabbit 
0.1428571 0.4285714 0.1428571 0.2857143 

How do I calculate the KL distance between these two distributions, using dist2 as the baseline?
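For reference, here is the quantity being asked for, under the interpretation (not stated explicitly in the question) that "dist2 as the baseline" means dist2 plays the role of the reference distribution Q. The discrete KL divergence is summed over the union of categories, with P = dist1 and Q = dist2:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \ln \frac{P(x)}{Q(x)}$$

$$D_{\mathrm{KL}}(\text{dist1} \,\|\, \text{dist2}) = \tfrac{1}{2}\ln\frac{1/2}{1/7} + \tfrac{1}{3}\ln\frac{1/3}{3/7} + \tfrac{1}{6}\ln\frac{1/6}{2/7} + 0 \approx 0.6264 - 0.0838 - 0.0898 \approx 0.4528 \text{ nats}$$

The "mouse" term is 0 because P(mouse) = 0 and 0 · log 0 is taken as 0 by convention. If some category had Q(x) = 0 but P(x) > 0, the divergence would be infinite.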

The KL functions I've seen (e.g., kl.dist) require vectors of the same length.
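The length mismatch disappears once both vectors are tabulated over the same set of categories. Below is a minimal base-R sketch of that alignment. It assumes a category absent from one vector should count as probability 0, and the names `cats`, `p` and `q` are illustrative only:

```r
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

# Union of categories seen in either vector, in a fixed (sorted) order
cats <- sort(union(list1, list2))

# Tabulating a factor with explicit levels also counts categories that
# never occur (count 0), so both results have the same length and order
p <- table(factor(list1, levels = cats)) / length(list1)
q <- table(factor(list2, levels = cats)) / length(list2)

p
q
```

After this step `p` and `q` are equal-length probability vectors with matching names. They can be passed to a KL function that expects such input.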

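One way to handle categories that appear in only one vector is to align the two distributions first. The following produces a data frame with one row per category and one column holding each vector's distribution, using 0 for categories a vector never contains: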

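library(dplyr)

list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

# Relative frequency of each category
dist1 <- table(list1)/length(list1)
dist2 <- table(list2)/length(list2)

# Full outer join on the category name; the two Freq columns become Freq.x and Freq.y
BothDist <- full_join(as.data.frame(dist1), as.data.frame(dist2), by = c("list1" = "list2"))
# A category missing from one vector gets probability 0 instead of NA
BothDist[is.na(BothDist)] <- 0

BothDist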
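To finish the calculation, the aligned columns can be plugged into the KL definition. This is a sketch under a few assumptions. It rebuilds `BothDist` with base R's `merge()` in place of `dplyr::full_join()`, so it runs without extra packages. It treats dist1 as P and dist2 as the baseline Q, and it uses natural logarithms, so the result is in nats:

```r
list1 <- c("cat", "dog", "cat", "rabbit", "dog", "cat")
list2 <- c("dog", "rabbit", "dog", "mouse", "dog", "rabbit", "cat")

dist1 <- table(list1) / length(list1)
dist2 <- table(list2) / length(list2)

# Base-R outer join on the category; the Freq columns become Freq.x / Freq.y
BothDist <- merge(as.data.frame(dist1), as.data.frame(dist2),
                  by.x = "list1", by.y = "list2", all = TRUE)
BothDist[is.na(BothDist)] <- 0

p <- BothDist$Freq.x  # dist1, the distribution being compared (P)
q <- BothDist$Freq.y  # dist2, the baseline (Q)

# Terms with p == 0 contribute 0 (convention 0 * log 0 = 0);
# the result is infinite if P has mass where Q has none
keep <- p > 0
kl <- if (any(q[keep] == 0)) Inf else sum(p[keep] * log(p[keep] / q[keep]))
kl
```

Replace `log()` with `log2()` to get the result in bits (about 0.653 instead of about 0.453 nats). Swapping the roles, so that dist2 is compared against dist1, gives an infinite value. This happens because dist2 assigns probability to "mouse" while dist1 assigns none. The same asymmetry is why KL divergence is not a true distance metric.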
