Counting the number of times a letter occurs at a certain position using Python
I'm a Python beginner and I've come across this problem, and I'm not sure how to go about tackling it.
If I have the following sequences (strings):
GATCCG
GTACGC
How do I count how often each letter occurs at each position? For example, G occurs at position 1 twice across the two sequences, A occurs at position 1 zero times, etc.
Any help would be appreciated, thank you!
You can use a combination of defaultdict and enumerate like so:
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):  # enumerate('abc') yields (0, 'a'), (1, 'b'), (2, 'c')
        d[char][i] += 1

d['C'][3]  # 2
d['C'][4]  # 1
d['C'][5]  # 1
This builds a nested defaultdict that takes the character as the first key and the (zero-based) position as the second key, and gives the number of occurrences of that character at that position.
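To connect this back to the question's example (G twice and A zero times at the first position), here is a small sketch of reading values out of the nested defaultdict. The variable names g_first, a_first and t_last are only illustrative. One caveat worth knowing: looking up a missing key on a defaultdict inserts it, so .get is used in the last lookup to read without modifying the structure.

```python
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

g_first = d['G'][0]  # 2: both sequences start with G
a_first = d['A'][0]  # 0: never counted, so int() supplies the default

# The lookup above created the entry d['A'][0] as a side effect.
# .get() reads a count without inserting a new key:
t_last = d['T'].get(5, 0)  # 0, and d['T'] is left unchanged

print(g_first, a_first, t_last)
```

If the side effect matters (for example, when iterating over the keys afterwards), prefer .get for lookups of positions that may not have been seen.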
If you want a list of per-position counts for each character instead:
max_len = max(map(len, sequences))
d = defaultdict(lambda: [0] * max_len)  # d[char] = [count_at_pos0, count_at_pos1, ...]
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

d['G']  # [2, 0, 0, 0, 1, 1]
Not sure this is the best way, but you can use zip to do a sort of transpose on the strings, producing tuples of the letters at each position, e.g.:
x = 'GATCCG'
y = 'GTACGC'
zipped = list(zip(x, y))  # list() is needed on Python 3, where zip returns an iterator
print(zipped)
This produces the following output:
[('G', 'G'), ('A', 'T'), ('T', 'A'), ('C', 'C'), ('C', 'G'), ('G', 'C')]
You can see from the tuples that the first positions of the two strings contain two Gs, the second positions contain an A and a T, etc. Then you could use Counter (or some other method) to get at what you want.
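One way the Counter step might look, as a minimal sketch rather than the only option: transpose all the sequences at once with zip(*sequences) and build one collections.Counter per position. The name position_counts is just illustrative. Note that zip stops at the shortest sequence, so this assumes all sequences have the same length.

```python
from collections import Counter

sequences = ['GATCCG', 'GTACGC']

# zip(*sequences) yields one tuple per position: ('G', 'G'), ('A', 'T'), ...
# Each tuple is then counted on its own.
position_counts = [Counter(column) for column in zip(*sequences)]

print(position_counts[0]['G'])  # 2
print(position_counts[0]['A'])  # 0 (a Counter returns 0 for missing keys)
print(position_counts[3]['C'])  # 2
```

Here the position is the outer index and the letter is the inner key, which is the reverse of the defaultdict layout above. Unlike a defaultdict, looking up a missing letter in a Counter does not insert it.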