Which is best, Python or Bash, for generating strings from combinations of letters?

I need to generate the strings STA and STB.

STA and STB are strings of length 10, and each can contain only the characters A, T, G, or C.

I have to generate all possible combinations of STA and, for each STA, generate the corresponding STB.

The rule is that the character A is always paired with T (and vice versa), and G with C (and vice versa).

So possible combinations look like:

STA: ATGC...
STB: TACG...

or

STA: GTTA...
STB: CAAT...

and so on.

I wonder what the best way of doing this would be, using Bash or Python.

Thanks

I'd say Python.

Have a look at "Permutations using a Combinations Generator (Python)" for string permutations. Another thing to look at is itertools in Python 2.6+; see "Generating all permutations of a list in python". Your requirements go further than plain permutations, but you will probably find it easier to add the necessary constraints in Python than in Bash.

Simple, clean and easy.

Now, I'm no expert on Bash, but as far as I can tell you would need multiple lines repeating much the same text over and over, depending on your combinations. Bash would be fine for simple combinations, but not for linked ones.
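A note on the permutation suggestion above: the following is a minimal sketch (the names ALPHABET, perms_2, prods_2 and perms_10 are only illustrative) of why plain permutations may not be enough here. itertools.permutations never reuses a letter, so it cannot produce strings such as "AA", and it yields nothing when asked for 10 letters from a 4-letter alphabet; itertools.product allows repeats.

```python
from itertools import permutations, product

ALPHABET = "ATGC"  # illustrative name, not from the original post

# permutations never reuses a letter, so strings like "AA" are missing
perms_2 = ["".join(p) for p in permutations(ALPHABET, 2)]

# product allows repeats, which is what "every possible string" needs
prods_2 = ["".join(p) for p in product(ALPHABET, repeat=2)]

# asking for length 10 from only 4 distinct letters yields nothing at all
perms_10 = list(permutations(ALPHABET, 10))

print(len(perms_2), len(prods_2), len(perms_10))  # 12 16 0
```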

While I don't know Bash and don't see how permutations would solve your problem, itertools.product seems a fairly straightforward way to do this:

>>> import itertools
>>> s = 'atgc'
>>> d = dict(zip(s, 'tacg'))  # maps each base to its complement
>>> for i in itertools.product(s, repeat=10):
...     sta = ''.join(i)
...     stb = ''.join(d[x] for x in i)
...

While the proposed method is valid for obtaining all possible permutations with replacement of the 'atgc' string (i.e., finding the sta string), finding stb is more efficient with the translation mechanism than with a dictionary look-up:

>>> trans = str.maketrans(s, 'tacg')  # Python 3; s and itertools as defined above
>>> for i in itertools.product(s, repeat=10):
...     sta = ''.join(i)
...     stb = sta.translate(trans)
...

Thanks to Dave for highlighting the more efficient solution.
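Since there are over a million strings of length 10, it may be convenient to produce the pairs lazily rather than only assigning them inside a loop. A minimal Python 3 sketch, assuming uppercase letters as in the question; the names strand_pairs, ALPHABET and COMPLEMENT are hypothetical and not part of the answer above:

```python
from itertools import islice, product

ALPHABET = "ATGC"                           # illustrative name
COMPLEMENT = str.maketrans("ATGC", "TACG")  # A<->T, G<->C

def strand_pairs(length=10):
    """Yield (sta, stb) tuples one at a time, without building a full list."""
    for letters in product(ALPHABET, repeat=length):
        sta = "".join(letters)
        yield sta, sta.translate(COMPLEMENT)

# Peek at the first three pairs only; the rest are never generated.
first_three = list(islice(strand_pairs(), 3))
print(first_three)
```

Iterating over strand_pairs() directly visits every pair while keeping only one pair in memory at a time.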

Others have said how to generate STA.

The most efficient way to convert a string STA into the equivalent string STB is to use the string translate and maketrans functions.

>>> import string
>>> s = "AGTC" * 100
>>> trans = string.maketrans("ATGC", "TACG")  # Python 2; in Python 3 use str.maketrans("ATGC", "TACG")
>>> s.translate(trans)
'TCAG...TCAG'

On my system this is ~100 times faster than doing a dictionary lookup on each character, as suggested by SilentGhost.
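The speed-up will depend on the Python version and the machine, so it is worth measuring locally. A rough Python 3 sketch using timeit (the helper names with_translate and with_dict are made up for this example, and the ~100x figure above is not guaranteed to reproduce):

```python
import timeit

s = "AGTC" * 100
table = str.maketrans("ATGC", "TACG")   # Python 3 equivalent of string.maketrans
lookup = dict(zip("ATGC", "TACG"))

def with_translate():
    # one call that maps every character through the table
    return s.translate(table)

def with_dict():
    # per-character dictionary lookup, joined back into a string
    return "".join(lookup[ch] for ch in s)

# Both approaches must agree before their timings are worth comparing.
assert with_translate() == with_dict()

t_translate = timeit.timeit(with_translate, number=2000)
t_dict = timeit.timeit(with_dict, number=2000)
print(f"translate: {t_translate:.4f}s  dict lookup: {t_dict:.4f}s")
```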

Here you go:

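>>> from itertools import product
>>> seq = ("AGCT",) * 10
>>> STA = [''.join(a) for a in product(*seq)]
>>> # With the letter order "AGCT", reversing the list pairs each string
>>> # with its complement: STB[i] is the complement of STA[i].
>>> STB = list(reversed(STA))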

Incidentally, len(STA) is 2^20 (that is, 4^10 = 1,048,576).

itertools.product is available from Python 2.6.

See @hop's answer for an implementation of product in Python 2.5.
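The reversed() step in this answer is easy to misread, so here is a small check of it on length-3 strings (a sketch; all_strings and COMPLEMENT are names introduced only for this example). It relies on the letter order "AGCT": the complement of the letter at position k in that order sits at position 3 - k, so reversing the whole list lines every string up with its complement.

```python
from itertools import product

COMPLEMENT = str.maketrans("AGCT", "TCGA")  # A<->T, G<->C

def all_strings(length):
    """All strings of the given length over "AGCT", in product order."""
    return ["".join(p) for p in product("AGCT", repeat=length)]

sta = all_strings(3)          # short length keeps the check fast
stb = list(reversed(sta))

# Every string in sta should line up with its complement in stb.
pairs_match = all(a.translate(COMPLEMENT) == b for a, b in zip(sta, stb))
print(len(sta), pairs_match)  # 64 True

# The count quoted for length 10: 4**10 equals 2**20.
print(4 ** 10 == 2 ** 20, 4 ** 10)  # True 1048576
```

With a different letter order, for example "ATGC", the reversed list would no longer match the complements, so the translate approach is the safer general choice.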

bash baby :)

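# Brace expansion builds every 10-letter string, separated by spaces
STA=$(echo {A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G})
# tr swaps A<->T and C<->G in every string at once
STB=$(echo "$STA" | tr ATCG TAGC)

echo "$STA"
echo "$STB"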

Unrelated to your actual question, but related to what you are (apparently) doing: have you checked out BioPython?
