Basic DNA Coding Exercise
I recently failed an interview in which I was thrown a Python coding question out of the blue. I'm currently learning Python, and if I come across the same or a similar question again, I want to be able to answer it.
The question was as follows:
Write a function which takes as its input a string containing the letters [A, C, G, T] and outputs all the 3-letter subsequences found in the input and the frequency with which they occur. For example, if the input string was "ACTACTTAC", the output would be something like:
ACT: 2
CTA: 1
TAC: 2
CTT: 1
TTA: 1
I came up with some ideas after the fact. Would a solution like this work, or is there a better way of doing it?
def Determine_DNA(dna_list):
    n = len(dna_list[0])
    A = [0]*n
    T = [0]*n
    G = [0]*n
    C = [0]*n
    for dna in dna_list:
        for index, base in enumerate(dna):
            if base == 'A':
                A[index] += 1
            elif base == 'C':
                C[index] += 1
            elif base == 'G':
                G[index] += 1
            elif base == 'T':
                T[index] += 1
    return A, C, G, T
@mousetail mentioned using collections.Counter in the comments. Here is an example of that:
import collections

def dna_freq(dnaseq):
    seq_list = []
    for i in range(2, len(dnaseq)):
        seq_list.append(dnaseq[i-2:i+1])
    return dict(collections.Counter(seq_list))

print(dna_freq("ACTACTTAC"))
{'ACT': 2, 'CTA': 1, 'TAC': 2, 'CTT': 1, 'TTA': 1}
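To print the counts in the "ACT: 2" style that the question asks for, the Counter can be iterated directly, since it keeps keys in first-seen order. A minimal sketch; the names `kmer_freq` and `format_freq` and the `k` parameter are illustrative assumptions, not part of the original answer:

```python
import collections

def kmer_freq(dnaseq, k=3):
    """Count every overlapping k-letter window in dnaseq (k=3 matches the question)."""
    # range() is empty when the string is shorter than k, so the result is empty too
    windows = (dnaseq[i:i + k] for i in range(len(dnaseq) - k + 1))
    return collections.Counter(windows)

def format_freq(counts):
    """Render the counts as 'ACT: 2' lines, in first-seen order."""
    return "\n".join(f"{kmer}: {n}" for kmer, n in counts.items())

print(format_freq(kmer_freq("ACTACTTAC")))
# ACT: 2
# CTA: 1
# TAC: 2
# CTT: 1
# TTA: 1
```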
That could be code-golfed, if hard-to-read code is your thing:
def dna_freq(dnaseq):
    return dict(collections.Counter([dnaseq[i-2:i+1] for i in range(2, len(dnaseq))]))
Example using zip from the comments, which feels more approachable than the list comprehension. It gives slightly different but totally usable output: the keys are tuples such as ('A', 'C', 'T') rather than strings.
def dna_freq(dnaseq):
    return dict(collections.Counter(zip(dnaseq, dnaseq[1:], dnaseq[2:])))
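If string keys like 'ACT' are preferred over the tuples that zip produces, each tuple can be joined back into a string before counting. A small sketch (the name `dna_freq_zip` is only for illustration):

```python
import collections

def dna_freq_zip(dnaseq):
    # zip stops at the shortest argument, so this yields one tuple per 3-letter window
    triples = zip(dnaseq, dnaseq[1:], dnaseq[2:])
    # "".join turns ('A', 'C', 'T') back into 'ACT'
    return dict(collections.Counter("".join(t) for t in triples))

print(dna_freq_zip("ACTACTTAC"))
# {'ACT': 2, 'CTA': 1, 'TAC': 2, 'CTT': 1, 'TTA': 1}
```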
This works for your case:
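dna = "ACTACTTAC"
LEN = 3
d = set()  # windows that have already been printed
# + 1 so that the final window of the string is included
for i in range(len(dna) - LEN + 1):
    k = dna[i:i+LEN]
    if k not in d:
        print(f'{k}: {dna.count(k)}')
        d.add(k)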
Output:
ACT: 2
CTA: 1
TAC: 2
CTT: 1
TTA: 1
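One caveat with str.count: it counts non-overlapping occurrences, so a repetitive input such as "AAAA" reports 1 for AAA even though that window occurs at two positions. If every overlapping window should be counted, tallying the windows in a plain dictionary avoids the issue. A minimal sketch, with the function name `count_windows` and the `length` parameter chosen here for illustration:

```python
def count_windows(dna, length=3):
    """Tally every overlapping window of the given length."""
    counts = {}
    for i in range(len(dna) - length + 1):
        k = dna[i:i + length]
        counts[k] = counts.get(k, 0) + 1
    return counts

for k, n in count_windows("ACTACTTAC").items():
    print(f"{k}: {n}")

print(count_windows("AAAA"))  # {'AAA': 2}
print("AAAA".count("AAA"))    # 1
```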