
Calculate differences between sequences in vector, for distance matrix in R

Hi all, I am trying to create a distance matrix from randomly generated sequences.

#set the code

    DNA <- c("A","G","T","C")
    randomDNA <- c()

#create the vector of 64 elements

    for (i in 1:64){
      randomDNA[i] <- paste0(sample(DNA, 6, replace = T), sep = "", collapse = "")
      warnings()
    }
    sizeofDNA <- length(randomDNA)

#this is the part where I want to iterate over the vector's components

    split_vector <- c()
    DNAdiff <- c()
    for (i in 1:length(randomDNA)){
      split_vector <- strsplit(randomDNA[i], "")[[1]]
      #print(split_vector)
      for (j in 1:length(randomDNA)){
      split_vector2 <- strsplit(randomDNA[j], "")[[1]]
      #print(split_vector2)
      DNAdiff[i,j] <- setdiff(split_vector,split_vector2)
      #or
      #DNAdiff[i] <- lenght(setdiff(strsplit(randomDNA[22], "")[[1]],strsplit(randomDNA[33], "")[[1]]))
      }
    }

What does not work: A: setdiff does not work as I expect. B: no array is created.

Question: how do I export the results of setdiff (if it works at all) to an array, so that I end up with an array-like distance matrix? Any recommendation is highly welcome. Thank you all.

EDIT: So there are 2 solutions:

A. Using the "adist" function, as mentioned in the comments by @ThomasIsCoding; this will calculate the Levenshtein distances:

    DNA <- c("A","G","T","C")
    randomDNA <- character(64)

    # build 64 random sequences of 6 bases each
    for (i in 1:64){
      randomDNA[i] <- paste(sample(DNA, 6, replace = TRUE), collapse = "")
    }

    # adist() already returns a matrix of Levenshtein distances
    dm <- adist(randomDNA)

    rownames(dm) <- randomDNA
    colnames(dm) <- randomDNA

    pdf("heatmap.pdf")
    heatmap(dm, Rowv = NA, Colv = NA)
    dev.off()
    # write.csv() always writes the header row; passing col.names only triggers a warning
    write.csv(dm, "distance_matrix.csv", row.names = TRUE)
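
To make the difference between the Levenshtein distance and the Hamming distance concrete, here is a small sketch on two made-up 6-mers (the strings `a` and `b` are illustrative assumptions, not taken from the random data). The second string is the first one shifted by one base, which is cheap in terms of edits but expensive position by position:

```r
# Two hypothetical 6-mers: b is a shifted left by one base, with "A" appended.
a <- "ACGTAC"
b <- "CGTACA"

# Levenshtein distance (what adist computes): delete the leading "A",
# then insert an "A" at the end, i.e. two edits.
lev <- adist(a, b)[1, 1]

# Hamming distance: count position-by-position mismatches, no shifting allowed.
ham <- sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

lev  # 2
ham  # 6
```

Which of the two is appropriate depends on whether insertions and deletions should count as possible changes between sequences.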

B. Another method, which calculates the Hamming distance instead:

    DNA <- c("A","G","T","C")
    randomDNA <- character(96)

    # build 96 random sequences of 6 bases each
    for (i in 1:96){
      randomDNA[i] <- paste(sample(DNA, 6, replace = TRUE), collapse = "")
    }

    # preallocate the square matrix so that Humm[i,j] can be assigned
    Humm <- matrix(nrow=length(randomDNA), ncol=length(randomDNA))
    for (i in seq_along(randomDNA)){
      split_vector <- strsplit(randomDNA[i], "")[[1]]
      for (j in seq_along(randomDNA)){
        split_vector2 <- strsplit(randomDNA[j], "")[[1]]
        #Hamming distance is the number of positions that differ:
        Humm[i,j] <- sum(split_vector != split_vector2)
      }
    }

    rownames(Humm) <- randomDNA
    colnames(Humm) <- randomDNA
    pdf("heatmap.pdf")
    heatmap(Humm, Rowv = NA, Colv = NA)
    dev.off()
    # write.csv() always writes the header row; passing col.names only triggers a warning
    write.csv(Humm, "distance_matrix.csv", row.names = TRUE)
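
The double loop above can also be written without re-splitting every string for each pair. The following is only a sketch under the assumption that all sequences have the same length; `hamming_matrix` is a hypothetical helper name, and the seed and the sample size of 8 are chosen just to keep the example small and reproducible:

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility of the example
DNA <- c("A", "G", "T", "C")
seqs <- replicate(8, paste(sample(DNA, 6, replace = TRUE), collapse = ""))

# One row per sequence, one column per position (requires equal lengths).
chars <- do.call(rbind, strsplit(seqs, ""))

# hamming_matrix is a hypothetical helper: for each position k it builds the
# n x n table of mismatches with outer() and adds the tables up.
hamming_matrix <- function(m) {
  n <- nrow(m)
  d <- matrix(0L, n, n)
  for (k in seq_len(ncol(m))) {
    d <- d + outer(m[, k], m[, k], "!=")
  }
  d
}

hm <- hamming_matrix(chars)
dimnames(hm) <- list(seqs, seqs)
```

This loops over the 6 positions rather than over all pairs of sequences, which tends to matter more as the number of sequences grows.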

I think you might need adist to get the distance matrix, e.g.,

    adist(randomDNA)
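
As for why the setdiff attempt in the question does not behave as expected, a minimal sketch with two made-up sequences (both strings are assumptions for illustration): setdiff works on sets, so order and repeated letters are ignored, and an object created with `c()` has no dimensions to index with `[i,j]`.

```r
x <- strsplit("AAGTCC", "")[[1]]
y <- strsplit("CCTGAA", "")[[1]]

# Both sequences use the same four letters, so as sets they are identical
# and setdiff returns an empty vector even though every position differs.
setdiff(x, y)   # character(0)

# Comparing position by position does see the differences.
sum(x != y)     # 6

# Two-index assignment needs an object that has dimensions;
# preallocating a matrix (instead of c()) provides them.
m <- matrix(NA_integer_, nrow = 2, ncol = 2)
m[1, 2] <- sum(x != y)
```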
