In Python, what is the fastest way to replace certain characters in a given string other than using str.translate()?

What is the fastest way to replace certain characters in a given string other than using str.translate()?

Given a sequence that consists only of the letters "A", "T", "G", and "C", I want to replace each "A" with "T", each "T" with "A", each "C" with "G", and each "G" with "C". To do this, I used a dictionary of ASCII code points, map = {65:84,84:65,71:67,67:71}, and called sequence.translate(map). However, in Python 3.8 this appears to be slow. I saw people mention using bytes or bytearray to do this, but I don't know how to make it work.

It looks like I first need to encode the sequence using sequence.encode('ascii', 'ignore') and then use translate() to do the translation?

Can anybody please help me?

For example,

sequence = 'ATGCGTGCGCGACTTT'
# {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
map_dict = {65:84,84:65,71:67,67:71}
# expect 'TACGCACGCGCTGAAA'
sequence.translate(map_dict)
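
As a minimal sketch of the bytes-based approach the question asks about, assuming the sequence contains only ASCII characters: bytes.maketrans() builds a 256-byte lookup table, and bytes.translate() applies it. The names table, str_table, and complement are illustrative only. Whether this is faster than str.translate() on a given machine is worth checking with timeit.

```python
# Assumption: the sequence is pure ASCII (here only A, T, G, C).
sequence = "ATGCGTGCGCGACTTT"

# 256-byte lookup table mapping A->T, T->A, G->C, C->G.
table = bytes.maketrans(b"ATGC", b"TACG")

# Encode to bytes, translate with the table, then decode back to str.
complement = sequence.encode("ascii").translate(table).decode("ascii")
print(complement)  # TACGCACGCGCTGAAA

# For comparison, the str-based table built with str.maketrans.
str_table = str.maketrans("ATGC", "TACG")
print(sequence.translate(str_table))  # TACGCACGCGCTGAAA
```

If the data can stay as bytes from the start (for example, read from a file opened in binary mode), the encode and decode steps can be skipped.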

Assuming the sequence is very long, this approach should make each bulk replacement O(1) once the index exists:

If you maintain an index which contains the positions of each letter in the sequence, then you just need to update the index to do bulk replacements.

For example, given seq = "AGCTTCGA":

index = {"A": {0, 7}, "G": {1, 6}, "C": {2, 5}, "T": {3, 4}}

and, if I understand correctly, you want to do a swap:

def swap(index, charA, charB):
    tmp = index[charB]
    index[charB] = index[charA]
    index[charA] = tmp

swap(index, "A", "T")
print(index)
# {'A': {3, 4}, 'G': {1, 6}, 'C': {2, 5}, 'T': {0, 7}}
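
The index idea can be sketched end to end as follows. The helpers build_index and rebuild are hypothetical names added for illustration, not part of the answer above. Note that only the swap itself is O(1): building the index and turning it back into a string each take a full pass over the sequence.

```python
from collections import defaultdict

def build_index(seq):
    """Map each character to the set of positions where it occurs (O(n))."""
    index = defaultdict(set)
    for pos, ch in enumerate(seq):
        index[ch].add(pos)
    return dict(index)

def swap(index, char_a, char_b):
    """Exchange the position sets of two characters (O(1))."""
    index[char_a], index[char_b] = index.get(char_b, set()), index.get(char_a, set())

def rebuild(index, length):
    """Materialise the string described by the index (O(n))."""
    chars = [""] * length
    for ch, positions in index.items():
        for pos in positions:
            chars[pos] = ch
    return "".join(chars)

seq = "AGCTTCGA"
index = build_index(seq)
swap(index, "A", "T")
swap(index, "C", "G")
print(rebuild(index, len(seq)))  # TCGAAGCT
```

This layout only pays off if many bulk replacements are applied before the string is needed again; for a single pass, a translation table is simpler.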

I am going to assume that you just want to replace every occurrence of one character with another. replace() will not work in this case (thank you for pointing this out), so use:

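result = []
for ch in string:
    match ch:  # the match statement requires Python 3.10+
        case "A": result.append("T")
        case "T": result.append("A")
        case "C": result.append("G")
        case "G": result.append("C")
        case _: result.append(ch)  # leave any other character unchanged
string = "".join(result)  # strings are immutable, so build a new one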
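
Since the question mentions Python 3.8, where the match statement is not available, the same per-character mapping can be sketched with a dictionary lookup instead. COMPLEMENT and complement are illustrative names. This is a pure-Python loop, so for long sequences it is likely to be slower than a translation table.

```python
# Mapping taken from the question: A<->T and C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(sequence):
    """Return a new string with each base replaced by its complement."""
    # dict.get falls back to the original character if it has no mapping.
    return "".join(COMPLEMENT.get(ch, ch) for ch in sequence)

print(complement("ATGCGTGCGCGACTTT"))  # TACGCACGCGCTGAAA
```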
