Using compare.linkage in R to compare two data frames and create pairs to impute data
I'm trying to fill in (impute) some data in my dataset based on another one. However, to do this I need to compare several variables, and each one has a weight. I also need to form pairs using the KEY variables, which are the IDs in my data.
I was trying to use compare.linkage, but I couldn't find a way to set the weights that I want, e.g. 40% for AGE, 40% for CHBORN, and 20% for URBAN.
library(RecordLinkage)
# KEY is stored as character so the leading zeros are kept (c(001, 002, 003) would become 1, 2, 3)
complete <- data.frame(KEY = c("001", "002", "003"), AGE = c(35, 38, 45), CHBORN = c(2, 3, 4), URBAN = c(1, 2, 2))
incomplete <- data.frame(KEY = c("004", "005", "006"), AGE = c(25, 38, 45), CHBORN = c(1, 2, 4), URBAN = c(2, 1, 1))
KEY_Pairs <- compare.linkage(incomplete, complete, blockfld = c(2, 3, 4), strcmp = TRUE, strcmpfun = levenshteinSim) # I stopped here
I want to get a result similar to this:
KEY_incomplete KEY_complete Scores
004 001 0.95
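To make the weighting concrete, here is a minimal base-R sketch of the kind of score described above. It does not use RecordLinkage's own weighting. It assumes exact agreement per field (1 if equal, 0 otherwise) combined as a weighted sum, which is only one possible interpretation of the 40/40/20 split. The names field_weights, idx, agree and result are made up for illustration.

```r
# Example data from the question; KEY is character so leading zeros survive.
complete <- data.frame(KEY = c("001", "002", "003"),
                       AGE = c(35, 38, 45),
                       CHBORN = c(2, 3, 4),
                       URBAN = c(1, 2, 2),
                       stringsAsFactors = FALSE)
incomplete <- data.frame(KEY = c("004", "005", "006"),
                         AGE = c(25, 38, 45),
                         CHBORN = c(1, 2, 4),
                         URBAN = c(2, 1, 1),
                         stringsAsFactors = FALSE)

# Assumed weights, taken from the percentages in the question.
field_weights <- c(AGE = 0.4, CHBORN = 0.4, URBAN = 0.2)

# All candidate pairs: every incomplete row against every complete row.
idx <- expand.grid(i = seq_len(nrow(incomplete)),
                   j = seq_len(nrow(complete)))

# Agreement matrix: one column per field, 1 if the values are equal, else 0.
agree <- sapply(names(field_weights), function(f) {
  as.numeric(incomplete[[f]][idx$i] == complete[[f]][idx$j])
})

# Weighted sum of the agreement indicators gives a score between 0 and 1.
result <- data.frame(
  KEY_incomplete = incomplete$KEY[idx$i],
  KEY_complete   = complete$KEY[idx$j],
  Score          = as.vector(agree %*% field_weights[colnames(agree)]),
  stringsAsFactors = FALSE
)

# Sort so the best-scoring pairs come first.
result <- result[order(-result$Score), ]
print(result)
```

With this data the top pair is 006 with 003, which agrees on AGE and CHBORN but not URBAN. Exact agreement is crude for a field like AGE; a similarity between 0 and 1 could replace the 0/1 indicator without changing the rest of the sketch.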
Usually, I do this using the FRIL software from Emory University, but I'm trying to do everything in R.
Best,
Tereza
The package is RecordLinkage: https://cran.r-project.org/web/packages/RecordLinkage/RecordLinkage.pdf