
Compute and plot pairwise distances using dist in R

I have a dataframe with 4 columns.

set.seed(123)
df <- data.frame(A = round(rnorm(1000, mean = 1)),
                 B = rpois(1000, lambda = 3),
                 C = round(rnorm(1000, mean = -1)),
                 D = round(rnorm(1000, mean = 0)))

I would like to compute the distances for every possible combination of my columns (AB, AC, AD, BC, BD, CD) at every row of my dataframe. This would be the equivalent of doing df$A - df$B for every combination.

Can I use the dist() function to compute this efficiently, given that my dataset is very large? I would then like to convert the dist object into a data.frame and plot the results with ggplot2, unless there is a good tidy way of doing all of the above.

Many Thanks

The closest I got was the following, but I am not sure which pairs of columns the columns of the result refer to.

d <- apply(as.matrix(df), 1, function(e) as.vector(dist(e)))
t(d)
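Each column of t(d) corresponds to one pair of the original columns. A minimal illustration, using the first row of the data (A = 0, B = 1, C = -2, D = -1) typed in by hand: dist() stores its lower triangle column by column, which for four values gives the order AB, AC, AD, BC, BD, CD, the same order as combn(names(df), 2). Note that the first value is 1, not A - B = -1, because a distance is always non-negative.

```r
# First row of the question's data, typed by hand: A = 0, B = 1, C = -2, D = -1
e <- c(A = 0, B = 1, C = -2, D = -1)

# Printing the dist object labels each pair by name
print(dist(e))

# as.vector() reads the lower triangle column by column,
# i.e. in the order AB, AC, AD, BC, BD, CD (same order as combn())
as.vector(dist(e))
#> [1] 1 2 1 3 2 1
```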

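dist() computes distances, which are always non-negative: applied to a single row, it compares every value with every other value and returns |A - B| rather than A - B (and applied to the whole data frame, it compares rows with each other, not columns). So if you want the signed differences between columns within each row, it is not what you are looking for.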

If you just want to calculate the difference between all columns pairwise, you can do:

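df <- cbind(df,
            do.call(cbind, lapply(asplit(combn(names(df), 2), 2), function(x) {
              # x is one pair of column names, e.g. c("A", "B");
              # the new column is named after the pair, e.g. "AB"
              setNames(data.frame(df[x[1]] - df[x[2]]), paste(x, collapse = ""))
            })))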

head(df)
#>   A B  C  D AB AC AD BC BD CD
#> 1 0 1 -2 -1 -1  2  1  3  2 -1
#> 2 1 1 -1  1  0  2  0  2  0 -2
#> 3 3 1 -2 -1  2  5  4  3  2 -1
#> 4 1 3  0 -1 -2  1  2  3  4  1
#> 5 1 3  0  1 -2  1  0  3  2 -1
#> 6 3 3  1  0  0  2  3  2  3  1

Created on 2022-06-14 by the reprex package (v2.0.1)
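Since the subtraction is already vectorised over rows, only the six column pairs need a loop. A sketch of the same idea using the FUN argument of combn(), which returns a plain numeric matrix with one column per pair and may be lighter on very large data (the name `diffs` is illustrative):

```r
set.seed(123)
df <- data.frame(A = round(rnorm(1000, mean = 1)),
                 B = rpois(1000, lambda = 3),
                 C = round(rnorm(1000, mean = -1)),
                 D = round(rnorm(1000, mean = 0)))

# One vectorised subtraction per column pair: 6 operations, not 1000
diffs <- combn(names(df), 2, FUN = function(p) df[[p[1]]] - df[[p[2]]])
colnames(diffs) <- combn(names(df), 2, FUN = paste, collapse = "")

head(cbind(df, diffs))
```

The result keeps the sign, so AB is A - B exactly as in the output above.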

Using base R:

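# Assumes df is the original 4-column data frame (A, B, C, D).
# dist() on one row returns the 6 pairwise absolute differences,
# ordered like combn(): A-B, A-C, A-D, B-C, B-D, C-D
df_dist <- t(apply(df, 1, dist))
colnames(df_dist) <- apply(combn(names(df), 2), 2, paste0, collapse = "_")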
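A quick sanity check of this approach, assuming df is the original four-column data frame from the question (not the one extended by the previous answer). It confirms that the values are the absolute values of the signed differences and converts the matrix to a data.frame, as the question asked; `signed` and `dist_df` are illustrative names:

```r
set.seed(123)
df <- data.frame(A = round(rnorm(1000, mean = 1)),
                 B = rpois(1000, lambda = 3),
                 C = round(rnorm(1000, mean = -1)),
                 D = round(rnorm(1000, mean = 0)))

df_dist <- t(apply(df, 1, dist))
colnames(df_dist) <- apply(combn(names(df), 2), 2, paste0, collapse = "_")

# Signed differences for the same pairs, computed column-wise
signed <- sapply(asplit(combn(names(df), 2), 2),
                 function(p) df[[p[1]]] - df[[p[2]]])

# dist() drops the sign: every entry equals the absolute signed difference
stopifnot(isTRUE(all.equal(unname(df_dist), unname(abs(signed)))))

# A data.frame with columns A_B, A_C, ..., C_D, ready for ggplot2
dist_df <- as.data.frame(df_dist)
head(dist_df)
```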

If you really want a tidy approach, you could use c_across() together with rowwise(), but this also drops the names and is much slower if your data are large.
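For the plotting step in the question, ggplot2 works best with long data: one row per (row, pair) observation. A minimal sketch, assuming ggplot2 is installed (the plot is skipped otherwise); base R's stack() does the reshaping here, and tidyr::pivot_longer() would be the tidy equivalent. The boxplot is just one possible choice of geom, and `pairs`, `diffs` and `long` are illustrative names:

```r
set.seed(123)
df <- data.frame(A = round(rnorm(1000, mean = 1)),
                 B = rpois(1000, lambda = 3),
                 C = round(rnorm(1000, mean = -1)),
                 D = round(rnorm(1000, mean = 0)))

# Signed pairwise differences, one column per pair (AB, AC, ..., CD)
pairs <- combn(names(df), 2, simplify = FALSE)
diffs <- as.data.frame(setNames(
  lapply(pairs, function(p) df[[p[1]]] - df[[p[2]]]),
  vapply(pairs, paste, character(1), collapse = "")
))

# Long format: 'values' holds the difference, 'ind' the pair label
long <- stack(diffs)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = ind, y = values)) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(x = "Column pair", y = "Difference")
  print(p)
}
```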
