
How would I calculate the distances between multiple cells in R?

I have several phenotypes and x/y coordinates for each cell. What would be the easiest way to calculate the distance between every pair of cells within the same slide? My dataset has 100,000+ cells, so I'm trying to find the most efficient way to do this.

An example dataframe would be:

Xposition <- c(1,6,4,7,9,4,8,6,4)
Yposition <- c(6,3,2,6,3,6,1,3,7)
Phenotype <- c("A", "A", "B", "C", "C", "A", "A", "B", "B")
SlideID <- c(111,111,111,111,111,112,112,112,112)
df <- data.frame(Xposition, Yposition, Phenotype, SlideID)

I'm looking for something that gives me a data frame like this, with one row per pair of cells on the same slide:

CellType1 <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "C", "A", "A", "A", "A", "A", "B")
Celltype2 <- c("A", "B", "C", "C", "B", "C", "C", "C", "C", "C", "A", "B", "B", "B", "B", "B")
Distance <- c("5.83", "5", "6", "8.54", "2.23", "3.16", "3", "5", "5.09", "3.6", "6.4", "3.6", "1", "2.82", "7.21", "4.47")
SlideID <- c("111", "111", "111", "111", "111", "111", "111", "111", "111", "111", "112", "112", "112", "112", "112", "112")
distancedf <- data.frame(CellType1, Celltype2, Distance, SlideID)

Thanks for your help!

I think there is some room for ambiguity in the expected output, but here is one approach:

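res <- as.data.frame.table(as.matrix(dist(df[,1:2])))  # all n x n cell pairs in long format
res$Var2 <- df$Phenotype[res$Var2]     # column cell index -> its phenotype
res$SlideID <- df$SlideID[res$Var1]    # slide of the row cell (must run before Var1 is overwritten)
res$Var1 <- df$Phenotype[res$Var1]     # row cell index -> its phenotype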
head(res)
#   Var1 Var2     Freq SlideID
# 1    A    A 0.000000     111
# 2    A    A 5.830952     111
# 3    B    A 5.000000     111
# 4    C    A 6.000000     111
# 5    C    A 8.544004     111
# 6    A    A 3.000000     112
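Note that this version pairs every cell in `df` with every other cell, including cells on different slides (row 6 above pairs a slide-112 cell with a slide-111 cell), and labels each row with the slide of the `Var1` cell only. A minimal sketch, assuming you want to stay with one distance matrix over all cells, is to look up the slide of both cells before the indices are replaced by phenotypes and keep only same-slide rows. It still builds the full matrix over all cells, so the per-slide split further down is cheaper for large data.

```r
Xposition <- c(1,6,4,7,9,4,8,6,4)
Yposition <- c(6,3,2,6,3,6,1,3,7)
Phenotype <- c("A", "A", "B", "C", "C", "A", "A", "B", "B")
SlideID <- c(111,111,111,111,111,112,112,112,112)
df <- data.frame(Xposition, Yposition, Phenotype, SlideID)

res <- as.data.frame.table(as.matrix(dist(df[, 1:2])))
# Var1/Var2 are factors whose integer codes are the row positions in df
i <- as.integer(res$Var1)
j <- as.integer(res$Var2)

# Keep only pairs whose two cells are on the same slide
same_slide <- df$SlideID[i] == df$SlideID[j]
res <- data.frame(Var1    = df$Phenotype[i[same_slide]],
                  Var2    = df$Phenotype[j[same_slide]],
                  Freq    = res$Freq[same_slide],
                  SlideID = df$SlideID[i[same_slide]])
nrow(res)  # 5^2 + 4^2 = 41 rows, self-pairs included
```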

From this, you should be able to filter out the 0s fairly easily, but I kept them here to show what is actually happening. Effectively, the as.data.frame.table(...) call goes from this

dist(df[,1:2])
#          1        2        3        4        5        6        7        8
# 2 5.830952                                                               
# 3 5.000000 2.236068                                                      
# 4 6.000000 3.162278 5.000000                                             
# 5 8.544004 3.000000 5.099020 3.605551                                    
# 6 3.000000 3.605551 4.000000 3.000000 5.830952                           
# 7 8.602325 2.828427 4.123106 5.099020 2.236068 6.403124                  
# 8 5.830952 0.000000 2.236068 3.162278 3.000000 3.605551 2.828427         
# 9 3.162278 4.472136 5.000000 3.162278 6.403124 1.000000 7.211103 4.472136

through this:

as.matrix(dist(df[,1:2]))
#          1        2        3        4        5        6        7        8        9
# 1 0.000000 5.830952 5.000000 6.000000 8.544004 3.000000 8.602325 5.830952 3.162278
# 2 5.830952 0.000000 2.236068 3.162278 3.000000 3.605551 2.828427 0.000000 4.472136
# 3 5.000000 2.236068 0.000000 5.000000 5.099020 4.000000 4.123106 2.236068 5.000000
# 4 6.000000 3.162278 5.000000 0.000000 3.605551 3.000000 5.099020 3.162278 3.162278
# 5 8.544004 3.000000 5.099020 3.605551 0.000000 5.830952 2.236068 3.000000 6.403124
# 6 3.000000 3.605551 4.000000 3.000000 5.830952 0.000000 6.403124 3.605551 1.000000
# 7 8.602325 2.828427 4.123106 5.099020 2.236068 6.403124 0.000000 2.828427 7.211103
# 8 5.830952 0.000000 2.236068 3.162278 3.000000 3.605551 2.828427 0.000000 4.472136
# 9 3.162278 4.472136 5.000000 3.162278 6.403124 1.000000 7.211103 4.472136 0.000000

ultimately to this

head(as.data.frame.table(as.matrix(dist(df[,1:2]))))
#   Var1 Var2     Freq
# 1    1    1 0.000000
# 2    2    1 5.830952
# 3    3    1 5.000000
# 4    4    1 6.000000
# 5    5    1 8.544004
# 6    6    1 3.000000

and the 0.000s are the diagonal of the distance matrix (each cell's distance to itself), which the default representation of dist(...) leaves out.
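One caveat when filtering out the 0s by value: distinct cells that share coordinates also have a distance of 0. In the example, cells 2 and 8 both sit at (6, 3) (they happen to be on different slides, but the same can occur within a slide). A sketch that instead drops the diagonal by position is to keep only the entries above the diagonal with `upper.tri()`; since the matrix is symmetric, this also keeps each unordered pair once, as in the question's `distancedf`. The column names mirror the answer's `Var1`/`Var2`/`Freq`.

```r
Xposition <- c(1,6,4,7,9,4,8,6,4)
Yposition <- c(6,3,2,6,3,6,1,3,7)
Phenotype <- c("A", "A", "B", "C", "C", "A", "A", "B", "B")
SlideID <- c(111,111,111,111,111,112,112,112,112)
df <- data.frame(Xposition, Yposition, Phenotype, SlideID)

m <- as.matrix(dist(df[, 1:2]))

# Row/column indices of every entry above the diagonal (i < j): this removes
# self-pairs by position and keeps each unordered pair once, so coincident
# cells (distance 0) are not thrown away
idx <- which(upper.tri(m), arr.ind = TRUE)
pairs <- data.frame(Var1 = df$Phenotype[idx[, "row"]],
                    Var2 = df$Phenotype[idx[, "col"]],
                    Freq = m[idx])
nrow(pairs)           # 9 * 8 / 2 = 36
sum(pairs$Freq == 0)  # 1: cells 2 and 8
```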


Per SlideID, so that only cells on the same slide are compared:

lapply(split(df, df$SlideID), function(x) {
  # x holds the cells of one slide; distances are computed among them only
  res <- as.data.frame.table(as.matrix(dist(x[,1:2])))
  res$Var2 <- x$Phenotype[res$Var2]
  res$SlideID <- x$SlideID[res$Var1]
  res$Var1 <- x$Phenotype[res$Var1]
  res
})
# $`111`
#    Var1 Var2     Freq SlideID
# 1     A    A 0.000000     111
# 2     A    A 5.830952     111
# 3     B    A 5.000000     111
# 4     C    A 6.000000     111
# 5     C    A 8.544004     111
# 6     A    A 5.830952     111
# 7     A    A 0.000000     111
# 8     B    A 2.236068     111
# 9     C    A 3.162278     111
# 10    C    A 3.000000     111
# 11    A    B 5.000000     111
# 12    A    B 2.236068     111
# 13    B    B 0.000000     111
# 14    C    B 5.000000     111
# 15    C    B 5.099020     111
# 16    A    C 6.000000     111
# 17    A    C 3.162278     111
# 18    B    C 5.000000     111
# 19    C    C 0.000000     111
# 20    C    C 3.605551     111
# 21    A    C 8.544004     111
# 22    A    C 3.000000     111
# 23    B    C 5.099020     111
# 24    C    C 3.605551     111
# 25    C    C 0.000000     111
# $`112`
#    Var1 Var2     Freq SlideID
# 1     A    A 0.000000     112
# 2     A    A 6.403124     112
# 3     B    A 3.605551     112
# 4     B    A 1.000000     112
# 5     A    A 6.403124     112
# 6     A    A 0.000000     112
# 7     B    A 2.828427     112
# 8     B    A 7.211103     112
# 9     A    B 3.605551     112
# 10    A    B 2.828427     112
# 11    B    B 0.000000     112
# 12    B    B 4.472136     112
# 13    A    B 1.000000     112
# 14    A    B 7.211103     112
# 15    B    B 4.472136     112
# 16    B    B 0.000000     112
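If you want the exact layout of `distancedf` from the question (each unordered within-slide pair once, no self-pairs, all slides in one data frame), a minimal base-R sketch is to keep only the entries above the diagonal of each slide's matrix with `upper.tri()` and bind the per-slide results with `do.call(rbind, ...)`. The helper name `slide_pairs` is hypothetical; distances are kept numeric rather than as rounded strings, and pairs are ordered by cell index to match the question's example.

```r
Xposition <- c(1,6,4,7,9,4,8,6,4)
Yposition <- c(6,3,2,6,3,6,1,3,7)
Phenotype <- c("A", "A", "B", "C", "C", "A", "A", "B", "B")
SlideID <- c(111,111,111,111,111,112,112,112,112)
df <- data.frame(Xposition, Yposition, Phenotype, SlideID)

# Hypothetical helper: every unique pair of cells within one slide
slide_pairs <- function(x) {
  m <- as.matrix(dist(x[, c("Xposition", "Yposition")]))
  idx <- which(upper.tri(m), arr.ind = TRUE)         # i < j, each pair once
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  data.frame(CellType1 = x$Phenotype[idx[, "row"]],
             CellType2 = x$Phenotype[idx[, "col"]],
             Distance  = m[idx],
             SlideID   = x$SlideID[idx[, "row"]])
}

distancedf <- do.call(rbind, lapply(split(df, df$SlideID), slide_pairs))
rownames(distancedf) <- NULL
nrow(distancedf)  # 10 pairs on slide 111 + 6 on slide 112 = 16
```

On the 100,000+ cell concern: a single dist() over n cells stores n(n - 1)/2 doubles, which for n = 100,000 is 4,999,950,000 values, or roughly 37 GiB at 8 bytes each, and as.matrix() roughly doubles that. Splitting by slide means each matrix only grows with the number of cells on that slide, which should make the per-slide versions considerably cheaper, though the output still has one row per pair of cells on a slide.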
