Replace a DNA nucleotide at a given position in a DNA sequence using a for loop

Question

In an R data frame, I am trying to substitute the nucleotide in the mutation column into the sequence in WT.seq, at the position given by the position column.

My data frame is as follows:

  transcript position ref mutation         type    WT.seq
1       trx1        5   A        G substitution    ATAAAA
2       trx2        3   C        A substitution    CCCCCC
3       trx3        7   T        C substitution AAAAAATGG

Expected output in the data frame:

  transcript position ref mutation         type    WT.seq
1       trx1        5   A        G substitution    ATAAGA
2       trx2        3   C        A substitution    CCACCC
3       trx3        7   T        C substitution AAAAAACGG

Explanation

The WT.seq column contains DNA sequences. For example, the first row of WT.seq holds the sequence ATAAAA, and I need to put the nucleotide G (mutation column, 1st row) at the 5th position of ATAAAA, which gives ATAAGA. The position number comes from the position column, 1st row. I have to do this for every row in the data frame, which contains thousands of rows.

In the output above, I did this for the first row using the following code.

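# Read the tab-separated file, keeping the sequences as character strings
DNA_seq <- read.table("sequences.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
df <- as.data.frame(DNA_seq)

# Row 1: overwrite WT.seq (column 6), starting at position (column 2), with mutation (column 4)
substring(df[1, 6], first = df[1, 2]) <- df[1, 4]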

I want to run a for loop over the remaining rows so that the mutation nucleotide is substituted into WT.seq for every row, using the numbers in the position column.
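The loop described here could be sketched roughly as follows. This is only an illustration: the data frame is rebuilt inline rather than read from sequences.txt, each mutation is assumed to be a single base, and the check that ref matches the base currently at that position is an added assumption, not something the question asks for.

```r
# Example data rebuilt inline (the question reads it from sequences.txt)
df <- data.frame(
  transcript = c("trx1", "trx2", "trx3"),
  position   = c(5L, 3L, 7L),
  ref        = c("A", "C", "T"),
  mutation   = c("G", "A", "C"),
  type       = "substitution",
  WT.seq     = c("ATAAAA", "CCCCCC", "AAAAAATGG"),
  stringsAsFactors = FALSE
)

for (i in seq_len(nrow(df))) {
  pos <- df$position[i]

  # Assumed sanity check: the reference base should be what is
  # currently at that position in the wild-type sequence
  stopifnot(substr(df$WT.seq[i], pos, pos) == df$ref[i])

  # Overwrite exactly one character at position pos
  substr(df$WT.seq[i], pos, pos) <- df$mutation[i]
}

print(df$WT.seq)
```

Using seq_len(nrow(df)) rather than 1:nrow(df) keeps the loop from running at all if the data frame happens to be empty.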

Answer 1

You could strsplit each sequence into single characters, replace the character at position with mutation inside Map, and paste the pieces back together.

transform(dat, WT.mut=Map(replace, strsplit(WT.seq, ''), position, mutation) |>
  sapply(paste, collapse=''))
#   transcript position ref mutation         type    WT.seq    WT.mut
# 1       trx1        5   A        G substitution    ATAAAA    ATAAGA
# 2       trx2        3   C        A substitution    CCCCCC    CCACCC
# 3       trx3        7   T        C substitution AAAAAATGG AAAAAACGG

I used an extra column to demonstrate; just replace WT.mut= with WT.seq= to overwrite.
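As a possible alternative for a data frame with thousands of rows, the assignment form of substr() recycles its start and stop arguments across the elements of a character vector, so the whole column can be handled in one call without splitting the strings. The sketch below assumes single-base substitutions and rebuilds dat inline so that it runs on its own; the column name WT.mut follows the demonstration above.

```r
# Example data rebuilt inline so the snippet is self-contained
dat <- data.frame(
  transcript = c("trx1", "trx2", "trx3"),
  position   = c(5L, 3L, 7L),
  ref        = c("A", "C", "T"),
  mutation   = c("G", "A", "C"),
  type       = "substitution",
  WT.seq     = c("ATAAAA", "CCCCCC", "AAAAAATGG"),
  stringsAsFactors = FALSE
)

# Work on a copy so the wild-type column is left untouched
mut <- dat$WT.seq

# For each element i, replace the single character at position[i]
# with mutation[i]
substr(mut, dat$position, dat$position) <- dat$mutation

dat$WT.mut <- mut
print(dat[, c("transcript", "WT.seq", "WT.mut")])
```

Assigning the result back to dat$WT.seq instead of a new column would overwrite the original sequences, as with the transform() version.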

Solution 1
1 Accepted 2022-07-11 08:13:06