
Python3 replace using dictionary

Could anyone please explain what is wrong here:

def get_complementary_sequence(string):
    dic = {'A':'T', 'C':'G', 'T':'A', 'G':'C'}
    for a, b in dic.items():
        string = string.replace(a, b)
    return string

I get proper results for 'T' and 'C', but 'A' and 'C' won't replace. Got really stuck.

String looks like 'ACGTACG'.

You are first replacing all 'A's with 'T's, and then replacing all 'T's with 'A's again (including the ones you just produced from the 'A's!):

>>> string = 'ACGTACG'
>>> string.replace('A', 'T')
'TCGTTCG'
>>> string.replace('A', 'T').replace('T', 'A')
'ACGAACG'
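To see how the loop from the question ends up where it does, it can help to record the string after every pass. This is only a sketch: it assumes the dictionary is iterated in insertion order (guaranteed from Python 3.7 on; on older versions the order, and so the final result, may differ), and the name trace_replacements is made up for illustration.

```python
def trace_replacements(sequence):
    """Repeat the question's loop, recording the string after each replace."""
    pairs = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}
    steps = []
    for old, new in pairs.items():
        # Each pass works on the output of the previous pass,
        # so earlier substitutions get overwritten by later ones.
        sequence = sequence.replace(old, new)
        steps.append((old, new, sequence))
    return steps

for old, new, result in trace_replacements('ACGTACG'):
    print(f"{old} -> {new}: {result}")
```

With insertion order the last line shows 'ACCAACC': the 'A's were turned into 'T's and back again, and every 'C' that became a 'G' was turned back into a 'C' together with the original 'G's.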

Use a translation map instead, fed to str.translate():

def get_complementary_sequence(string):
    transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
    return string.translate(transmap)

The str.translate() method requires a dictionary mapping codepoints (integers) to replacements: either a string (one or more characters), a codepoint, or None (to delete the character from the input string). The ord() function gives us those codepoints for the given 'from' letters.
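As an alternative way of building such a table, the built-in str.maketrans() produces the same kind of codepoint-keyed dictionary from two equal-length strings. The sketch below also shows the None case; the 'N' character is just a hypothetical placeholder for an unknown base and is not part of the question.

```python
# Build the table from two equal-length strings: 'A'->'T', 'C'->'G', and so on.
complement_table = str.maketrans('ACGT', 'TGCA')
complemented = 'ACGTACG'.translate(complement_table)
print(complemented)  # TGCATGC

# Mapping a codepoint to None deletes that character from the result.
# 'N' is an assumed marker for an unknown base, used only for illustration.
drop_unknown = {ord('N'): None}
cleaned = 'ACNGT'.translate(drop_unknown)
print(cleaned)  # ACGT
```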

This looks up each character of string in the translation map, one by one and in C code, instead of first replacing all 'A's and then all 'T's.
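Roughly the same one-pass idea can be written in plain Python, which may make it clearer why no character gets replaced twice. This is only an illustration of the lookup, not how str.translate() is actually implemented; complement_one_pass is a made-up name.

```python
def complement_one_pass(sequence):
    """Look up every character exactly once, reading only the original input."""
    complements = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}
    # .get(base, base) leaves unmapped characters unchanged,
    # which mirrors what str.translate() does for missing keys.
    return ''.join(complements.get(base, base) for base in sequence)

print(complement_one_pass('ACGTACG'))  # TGCATGC
```

Because each output character is derived from the untouched input, a 'T' produced from an 'A' is never looked at again.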

str.translate() has the added advantage of being much faster than a series of str.replace() calls.
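If you want to check the speed claim on your own machine, a rough timeit comparison could look like the sketch below. The input size and repeat count are arbitrary choices, and the timings will vary with the Python version and the data, so treat the numbers as indicative only.

```python
import timeit

sequence = 'ACGT' * 10_000
table = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}

def with_translate():
    # One pass over the string, one lookup per character.
    return sequence.translate(table)

def with_replace_chain():
    # Four full passes; note that this chain also yields the wrong complement.
    return (sequence.replace('A', 'T').replace('C', 'G')
                    .replace('T', 'A').replace('G', 'C'))

translate_seconds = timeit.timeit(with_translate, number=100)
replace_seconds = timeit.timeit(with_replace_chain, number=100)
print(f"translate: {translate_seconds:.4f}s, replace chain: {replace_seconds:.4f}s")
```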

Demo:

>>> string = 'ACGTACG'
>>> transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
>>> string.translate(transmap)
'TGCATGC'

Mutable data is your enemy :)

See, you first replace all 'A's with 'T's, then, in another iteration, replace all 'T's with 'A's again.

What works:

# for Crick and Watson's sake, name your variables sensibly
complements = {ord('A'):'T', ord('C'):'G', ord('T'):'A', ord('G'):'C'}
sequence = "AGCTTCAG"
print(sequence.translate(complements))

It prints TCGAAGTC.
