Calculating differences between sequences in a vector, for a distance matrix in R

Question

Hi all, I am trying to create a distance matrix from randomly generated sequences.

Set up the code:

    DNA <- c("A","G","T","C")
    randomDNA <- c()

Create the vector of 64 elements:

    for (i in 1:64){
      randomDNA[i] <- paste0(sample(DNA, 6, replace = T), sep = "", collapse = "")
      warnings()
    }
    sizeofDNA <- length(randomDNA)

This is the part where I want to iterate over the vector's components:

    split_vector <- c()
    DNAdiff <- c()
    for (i in 1:length(randomDNA)){
      split_vector <- strsplit(randomDNA[i], "")[[1]]
      #print(split_vector)
      for (j in 1:length(randomDNA)){
      split_vector2 <- strsplit(randomDNA[j], "")[[1]]
      #print(split_vector2)
      DNAdiff[i,j] <- setdiff(split_vector,split_vector2)
      #or
      #DNAdiff[i] <- lenght(setdiff(strsplit(randomDNA[22], "")[[1]],strsplit(randomDNA[33], "")[[1]]))
      }
    }

What does not work: (A) setdiff does not behave as I expect; (B) no array is created.

Question: how do I export the results of setdiff (if it can be made to work) to an array, so that I end up with the distance matrix as an array? Any recommendation is highly welcome. Thank you all.

EDIT: So there are 2 solutions:

A. Using the "adist" function, as mentioned in the comments by @ThomasIsCoding; this calculates the Levenshtein distances:

    DNA <- c("A", "G", "T", "C")

    # Build 64 random 6-mers
    randomDNA <- character(64)
    for (i in seq_along(randomDNA)) {
      randomDNA[i] <- paste0(sample(DNA, 6, replace = TRUE), collapse = "")
    }

    # adist() already returns a matrix of Levenshtein (edit) distances
    dm <- adist(randomDNA)

    rownames(dm) <- randomDNA
    colnames(dm) <- randomDNA

    pdf("heatmap.pdf")
    heatmap(dm, Rowv = NA, Colv = NA)
    dev.off()
    # write.csv() always writes column names and ignores col.names with a warning
    write.csv(dm, "distance_matrix.csv", row.names = TRUE)

B. Another method, which calculates the Hamming distance, is:

    DNA <- c("A", "G", "T", "C")

    # Build 96 random 6-mers
    randomDNA <- character(96)
    for (i in seq_along(randomDNA)) {
      randomDNA[i] <- paste0(sample(DNA, 6, replace = TRUE), collapse = "")
    }

    # Preallocate the matrix so that Humm[i, j] can be assigned
    Humm <- matrix(nrow = length(randomDNA), ncol = length(randomDNA))
    for (i in seq_along(randomDNA)) {
      split_vector <- strsplit(randomDNA[i], "")[[1]]
      for (j in seq_along(randomDNA)) {
        split_vector2 <- strsplit(randomDNA[j], "")[[1]]
        # Hamming distance: number of positions at which the sequences differ
        Humm[i, j] <- sum(split_vector != split_vector2)
      }
    }

    rownames(Humm) <- randomDNA
    colnames(Humm) <- randomDNA
    pdf("heatmap.pdf")
    heatmap(Humm, Rowv = NA, Colv = NA)
    dev.off()
    # write.csv() always writes column names and ignores col.names with a warning
    write.csv(Humm, "distance_matrix.csv", row.names = TRUE)
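
The nested loop re-splits every sequence on each pass. As a rough sketch of one alternative, the sequences can be split once and each row filled with vapply. The helper name hamming_matrix and the three example sequences are made up for illustration, and the function assumes all sequences have the same length.

```r
# Hypothetical helper: pairwise Hamming distances for equal-length strings
hamming_matrix <- function(seqs) {
  # Hamming distance is only defined for sequences of equal length
  stopifnot(length(unique(nchar(seqs))) == 1)
  chars <- strsplit(seqs, "")          # split every sequence once
  n <- length(seqs)
  m <- matrix(0L, n, n, dimnames = list(seqs, seqs))
  for (i in seq_len(n)) {
    # Count mismatching positions between sequence i and every sequence
    m[i, ] <- vapply(chars, function(other) sum(chars[[i]] != other), integer(1))
  }
  m
}

# Illustrative input, not taken from the question
example <- c("AAGT", "AGTA", "AAGA")
hm <- hamming_matrix(example)
print(hm)
```

The result has the same shape as Humm above, so it could be passed to heatmap() and write.csv() in the same way.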

Answer 1

I think you might need adist to get the distance matrix, e.g.,

    adist(randomDNA)
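
To illustrate why setdiff does not yield a distance while adist does, here is a small sketch on two hand-picked sequences (chosen for illustration only, not taken from the question). It compares the set difference, the position-wise Hamming count, and the Levenshtein distance from adist.

```r
a <- "AAGT"
b <- "AGTA"

chars_a <- strsplit(a, "")[[1]]
chars_b <- strsplit(b, "")[[1]]

# setdiff() treats its inputs as sets: it returns the distinct letters of the
# first sequence that never occur in the second, regardless of position.
set_difference <- setdiff(chars_a, chars_b)

# Position-wise comparison counts mismatches (Hamming distance).
hamming <- sum(chars_a != chars_b)

# adist() returns a matrix of Levenshtein (edit) distances.
lev <- adist(c(a, b))

print(set_difference)
print(hamming)
print(lev)
```

Here both sequences use the same letters, so the set difference is empty even though the sequences differ at several positions. The Hamming and Levenshtein values can also disagree, because Levenshtein allows insertions and deletions in addition to substitutions.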
