Counting the number of times a letter occurs at a certain position using Python

I'm a Python beginner and I've come across this problem, and I'm not sure how to tackle it.

If I have the following sequences/strings:

GATCCG

GTACGC

How do I count how often each letter occurs at each position? For example, G occurs at position 1 twice across the two sequences, A occurs at position 1 zero times, etc.

Any help would be appreciated, thank you!

You can use a combination of defaultdict and enumerate like so:

from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):  # enumerate('abc') yields (0, 'a'), (1, 'b'), (2, 'c')
        d[char][i] += 1

d['C'][3]  # 2
d['C'][4]  # 1
d['C'][5]  # 1

This builds a nested defaultdict that takes the character as the first key and the (zero-based) position as the second key, and gives the number of times that character occurs at that position.
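The question counts positions from 1, while enumerate counts from 0. A minimal sketch of a lookup that bridges the two is below; `count_at` is a hypothetical helper name, not part of the answer above. It uses `.get` so that asking about a letter or position that never occurred returns 0 without adding new keys to the nested defaultdict.

```python
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

def count_at(counts, char, position):
    """Hypothetical helper: look up a count using a 1-based position."""
    # dict.get does not trigger the default factory, so a miss leaves
    # the nested defaultdict unchanged.
    return counts.get(char, {}).get(position - 1, 0)

print(count_at(d, 'G', 1))  # 2
print(count_at(d, 'A', 1))  # 0
```

Plain indexing such as d['A'][0] also gives 0, but it silently creates the missing entries as a side effect.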

If you would rather have a list of per-position counts for each character:

max_len = max(map(len, sequences))
d = defaultdict(lambda: [0] * max_len)  # d[char] = [count_at_pos0, count_at_pos1, ...]
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

d['G']  # [2, 0, 0, 0, 1, 1]
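With the list form, the whole result can be printed as a small letter-by-position table. The sketch below repeats the setup so that it runs on its own; the `rows` name and the output layout are just illustrative choices.

```python
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
max_len = max(map(len, sequences))
d = defaultdict(lambda: [0] * max_len)  # d[char] = list of per-position counts
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

# One row per letter, in alphabetical order
rows = {char: d[char] for char in sorted(d)}
for char, counts in rows.items():
    print(char, ' '.join(map(str, counts)))
# A 0 1 1 0 0 0
# C 0 0 0 2 1 1
# G 2 0 0 0 1 1
# T 0 1 1 0 0 0
```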

I'm not sure this is the best way, but you can use zip to do a sort of transpose on the strings, producing tuples of the letters at each position, e.g.:

x = 'GATCCG'
y = 'GTACGC'

# In Python 3, zip returns a lazy iterator, so wrap it in list() to see the tuples
zipped = list(zip(x, y))

print(zipped)

This produces the following output:

[('G', 'G'), ('A', 'T'), ('T', 'A'), ('C', 'C'), ('C', 'G'), ('G', 'C')]

You can see from the tuples that the first positions of the two strings contain two Gs, the second positions contain an A and a T, etc. Then you could use Counter (or some other method) to get at what you want.
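One way the Counter step might look is sketched below, assuming all sequences have the same length. zip stops at the shortest input, so for sequences of different lengths something like itertools.zip_longest would be needed instead. The `position_counts` name is just illustrative.

```python
from collections import Counter

sequences = ['GATCCG', 'GTACGC']

# zip(*sequences) yields one tuple of letters per position;
# each Counter tallies the letters seen at that position.
position_counts = [Counter(column) for column in zip(*sequences)]

print(position_counts[0]['G'])  # 2
print(position_counts[0]['A'])  # 0 (a Counter returns 0 for missing keys)
```

Here the position is the outer index and the letter is the inner key, the reverse of the defaultdict layout.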
