
Calculating an other-records dissimilarity measure in SQL

Say I define the dissimilarity (or similarity) of a group as the average absolute distance of its members from the group mean. For each record, I want this measure computed over the other records that share its ID, and I have the data in the following format:

ID  RecordNumber  Other_Record_Similarity
i1  r1            Avg(abs(r2-avg(r2,r3)), abs(r3-avg(r2,r3)))
i1  r2            Avg(abs(r1-avg(r1,r3)), abs(r3-avg(r1,r3)))
i1  r3            Avg(abs(r1-avg(r1,r2)), abs(r2-avg(r1,r2)))
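
Written out, the measure in the table appears to be a leave-one-out mean absolute deviation. The symbols below are assumptions for illustration and do not appear in the original post: x_j is the value of record j, n is the number of records sharing an ID, and S_k is the Other_Record_Similarity of record k. The worked numbers use hypothetical values (1, 2, 6) for r1, r2, r3.

```latex
\bar{x}_{-k} = \frac{1}{n-1} \sum_{j \neq k} x_j ,
\qquad
S_k = \frac{1}{n-1} \sum_{j \neq k} \left| x_j - \bar{x}_{-k} \right|

% Hypothetical example with (x_1, x_2, x_3) = (1, 2, 6):
S_1 = \tfrac{1}{2}\left( |2 - 4| + |6 - 4| \right) = 2
S_2 = \tfrac{1}{2}\left( |1 - 3.5| + |6 - 3.5| \right) = 2.5
S_3 = \tfrac{1}{2}\left( |1 - 1.5| + |2 - 1.5| \right) = 0.5
```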

Please suggest how to calculate Other_Record_Similarity in the table above using SQL/SAS/Stata/R.

Thanks!

Here is a general approach in R: convert the data to wide form, then use row operations:

library(reshape2)

# create sample data: 5 IDs with 5 records each
# (no seed is set, so the values differ from run to run)
data <- data.frame(ID = paste0("i", rep(1:5, each = 5)), RecordNumber = paste0("r", rep(1:5, 5)), Value = runif(25))

# convert to wide form: one row per ID, one column per record
cast_data <- dcast(data, ID ~ RecordNumber, value.var = "Value")

# isolate the value columns
set <- cast_data[, colnames(cast_data) %in% data$RecordNumber]

# calculate similarity: for each record k, drop column k and take the mean
# absolute deviation of the remaining columns from their row mean
out <- sapply(seq_along(set), function(k) rowMeans(abs(set[, -k] - rowMeans(set[, -k]))))

# format and convert back to long form
# (data.frame() rather than cbind() keeps the ID labels and numeric results)
recast_data <- data.frame(ID = cast_data$ID, out)
colnames(recast_data) <- colnames(cast_data)

final <- melt(recast_data, "ID", value.name = "Similarity", variable.name = "RecordNumber")
> final
   ID RecordNumber Similarity
1  i1           r1 0.15866019
2  i2           r1 0.11273444
3  i3           r1 0.35175203
4  i4           r1 0.25581895
5  i5           r1 0.18711691
6  i1           r2 0.17474599
7  i2           r2 0.18542584
8  i3           r2 0.28154138
9  i4           r2 0.24019621
10 i5           r2 0.20536817
11 i1           r3 0.17782101
12 i2           r3 0.16896563
13 i3           r3 0.25620738
14 i4           r3 0.14478319
15 i5           r3 0.05033490
16 i1           r4 0.17889032
17 i2           r4 0.11219373
18 i3           r4 0.24858994
19 i4           r4 0.16687316
20 i5           r4 0.20259905
21 i1           r5 0.05547675
22 i2           r5 0.16319309
23 i3           r5 0.26891738
24 i4           r5 0.22163225
25 i5           r5 0.19568286
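
If reshaping is not wanted, the same leave-one-out calculation can also be sketched directly on long-format data in base R. This is only an illustration under stated assumptions: the helper name loo_mad and the toy data frame are hypothetical, and the values are made up so the result can be checked by hand.

```r
# Hypothetical helper: for each element k of v, the mean absolute deviation
# of the remaining elements from their own mean.
loo_mad <- function(v) {
  vapply(seq_along(v), function(k) {
    others <- v[-k]
    mean(abs(others - mean(others)))
  }, numeric(1))
}

# Hypothetical toy data in long form (values chosen for easy hand-checking)
toy <- data.frame(
  ID = c("i1", "i1", "i1", "i2", "i2", "i2"),
  RecordNumber = rep(c("r1", "r2", "r3"), 2),
  Value = c(1, 2, 6, 10, 10, 40)
)

# ave() applies loo_mad within each ID and returns the results in row order
toy$Other_Record_Similarity <- ave(toy$Value, toy$ID, FUN = loo_mad)
toy
```

An ID with a single record has no other records to compare against, so this sketch returns NaN for it; whether that should be NA or 0 depends on the intended use.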
