Say I define the dissimilarity/similarity of a group as the average of the absolute distances of the group members from their mean, and I have the data in the following format:
ID  RecordNumber  Other_Record_Similarity
i1  r1            Avg(abs(r2-avg(r2,r3)), abs(r3-avg(r2,r3)))
i1  r2            Avg(abs(r1-avg(r1,r3)), abs(r3-avg(r1,r3)))
i1  r3            Avg(abs(r1-avg(r1,r2)), abs(r2-avg(r1,r2)))
Please suggest how to calculate Other_Record_Similarity in the above table using SQL/SAS/Stata/R.
Thanks!
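Read as a formula, the table above seems to describe a leave-one-out mean absolute deviation. The notation here is an assumption introduced for illustration and is not from the original post: x_{i,j} is the value of record j in group i, and n_i is the number of records in that group.

```latex
\bar{x}_{i,-k} = \frac{1}{n_i - 1} \sum_{j \neq k} x_{i,j},
\qquad
S_{i,k} = \frac{1}{n_i - 1} \sum_{j \neq k} \left| x_{i,j} - \bar{x}_{i,-k} \right|
```

For n_i = 3 and k = 1 this reduces to Avg(abs(r2-avg(r2,r3)), abs(r3-avg(r2,r3))), matching the first row of the table.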
Here is a general approach: convert the data to wide form and use row operations:
library(reshape2)
# create sample data
data <- data.frame(ID = paste0("i", rep(1:5, each = 5)), RecordNumber = paste0("r", rep(1:5, 5)), Value = runif(25))
# convert to wide form
cast_data <- dcast(data, ID ~ RecordNumber, value.var = "Value")
# isolate values
set <- cast_data[, colnames(cast_data) %in% data$RecordNumber]
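# calculate similarity: for each record k, drop column k and take the
# row-wise mean absolute deviation of the remaining records
out <- sapply(seq_along(set), function(k) rowMeans(abs(set[, -k] - rowMeans(set[, -k]))))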
# format and convert back to long form
# (build the data frame directly: cbind() would coerce the IDs and the
# numeric values to a single type)
recast_data <- data.frame(ID = cast_data$ID, out)
colnames(recast_data) <- colnames(cast_data)
final <- melt(recast_data, "ID", value.name = "Similarity", variable.name = "RecordNumber")
> final
   ID RecordNumber Similarity
1  i1           r1 0.15866019
2  i2           r1 0.11273444
3  i3           r1 0.35175203
4  i4           r1 0.25581895
5  i5           r1 0.18711691
6  i1           r2 0.17474599
7  i2           r2 0.18542584
8  i3           r2 0.28154138
9  i4           r2 0.24019621
10 i5           r2 0.20536817
11 i1           r3 0.17782101
12 i2           r3 0.16896563
13 i3           r3 0.25620738
14 i4           r3 0.14478319
15 i5           r3 0.05033490
16 i1           r4 0.17889032
17 i2           r4 0.11219373
18 i3           r4 0.24858994
19 i4           r4 0.16687316
20 i5           r4 0.20259905
21 i1           r5 0.05547675
22 i2           r5 0.16319309
23 i3           r5 0.26891738
24 i4           r5 0.22163225
25 i5           r5 0.19568286
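If reshaping is not convenient, the same leave-one-out calculation can probably be done directly on the long-form data with base R's ave(). This is only a minimal sketch: the helper name loo_mad and the toy data frame are hypothetical, chosen so the result can be checked by hand.

```r
# Hypothetical helper: for each element k of x, the mean absolute deviation
# of the remaining elements from their own mean.
loo_mad <- function(x) {
  sapply(seq_along(x), function(k) {
    others <- x[-k]                    # drop record k
    mean(abs(others - mean(others)))   # mean absolute deviation of the rest
  })
}

# Toy data (an assumption for illustration): one ID with three records.
toy <- data.frame(ID = "i1",
                  RecordNumber = c("r1", "r2", "r3"),
                  Value = c(1, 2, 4))

# ave() applies loo_mad within each ID and returns results in the original row order.
toy$Similarity <- ave(toy$Value, toy$ID, FUN = loo_mad)
print(toy)
```

For r1 the other values are 2 and 4, whose mean is 3, so the result is (1 + 1) / 2 = 1; r2 and r3 give 1.5 and 0.5 in the same way. A group with a single record has no other records to compare, so it yields NaN.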