Basic DNA coding exercise

Question

I recently failed an interview in which I was given a Python coding question out of the blue. I'm currently learning Python, and if I come across the same question or a similar one again, I want to be able to answer it.

The question was as follows:

Write a function which takes as its input a string containing the letters [A, C, G, T] and outputs all the 3-letter subsequences found in the input and the frequency with which they occur. For example, if the input string was "ACTACTTAC", the output would be something like:
ACT: 2
CTA: 1
TAC: 2
CTT: 1
TTA: 1

I came up with some ideas after the fact. Does a solution like this work, or is there a better way of doing it?

def Determine_DNA(dna_list):
    n = len(dna_list[0])
    A = [0]*n
    T = [0]*n
    G = [0]*n
    C = [0]*n
    for dna in dna_list:
        for index, base in enumerate(dna):
            if base == 'A':
                A[index] += 1
            elif base == 'C':
                C[index] += 1
            elif base == 'G':
                G[index] += 1
            elif base == 'T':
                T[index] += 1
    return A, C, G, T

Answer 1

@mousetail mentioned using collections.Counter in the comments. Here is an example of that:

import collections

def dna_freq(dnaseq):
    seq_list = []
    for i in range(2, len(dnaseq)):
        seq_list.append(dnaseq[i-2:i+1])
    return dict(collections.Counter(seq_list))

print(dna_freq("ACTACTTAC"))

{'ACT': 2, 'CTA': 1, 'TAC': 2, 'CTT': 1, 'TTA': 1}
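
The same sliding-window idea extends to any subsequence length. Here is a minimal sketch of that, assuming a helper name `kmer_freq` and a window-size parameter `k` that I made up for illustration (neither comes from the comments):

```python
from collections import Counter

def kmer_freq(dnaseq, k=3):
    """Count every overlapping k-letter window in dnaseq."""
    # There are len(dnaseq) - k + 1 windows; range() is empty
    # when the string is shorter than k, so the result is {}.
    return dict(Counter(dnaseq[i:i + k] for i in range(len(dnaseq) - k + 1)))

print(kmer_freq("ACTACTTAC"))
# {'ACT': 2, 'CTA': 1, 'TAC': 2, 'CTT': 1, 'TTA': 1}
```

Passing a generator straight to Counter avoids building the intermediate list.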

That could be code-golfed, if hard-to-read code is your thing:

def dna_freq(dnaseq):
    return dict(collections.Counter([dnaseq[i-2:i+1] for i in range(2, len(dnaseq))]))

Here is an example using zip, from the comments, which feels more approachable than a list comprehension. Its output is slightly different (the keys are tuples of characters rather than strings) but still perfectly usable.

def dna_freq(dnaseq):
    return dict(collections.Counter(zip(dnaseq, dnaseq[1:], dnaseq[2:])))
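
If string keys are preferred, each tuple produced by zip can be joined back into a string before counting. A minimal sketch of that tweak (the name `dna_freq_zip` is mine, not from the comments):

```python
from collections import Counter

def dna_freq_zip(dnaseq):
    # zip yields tuples of single characters, e.g. ('A', 'C', 'T'),
    # and stops as soon as the shortest slice runs out.
    triples = zip(dnaseq, dnaseq[1:], dnaseq[2:])
    # Join each tuple into a 3-letter string so the keys look like 'ACT'.
    return dict(Counter("".join(t) for t in triples))

print(dna_freq_zip("ACTACTTAC"))
# {'ACT': 2, 'CTA': 1, 'TAC': 2, 'CTT': 1, 'TTA': 1}
```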

Answer 2

This works for your case:

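dna = "ACTACTTAC"
LEN = 3
d = set()  # subsequences already printed

# There are len(dna) - LEN + 1 windows; the + 1 includes the last one
for i in range(len(dna) - LEN + 1):
    k = dna[i:i+LEN]
    if k not in d:
        # str.count skips overlapping matches (e.g. "AAA" in "AAAA"),
        # so check every start position instead
        n = sum(dna.startswith(k, j) for j in range(len(dna) - LEN + 1))
        print(f'{k}: {n}')
        d.add(k)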

Output:

ACT: 2
CTA: 1
TAC: 2
CTT: 1
TTA: 1
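
One thing to keep in mind is that str.count only counts non-overlapping matches, which matters for inputs with repeated letters such as "AAAA". A single pass with a plain dict avoids both that and the repeated rescanning of the string. This is a rough sketch; the function name `dna_freq_dict` and its `length` parameter are made up for illustration:

```python
def dna_freq_dict(dna, length=3):
    counts = {}
    # Visit every start position, including the final window.
    for i in range(len(dna) - length + 1):
        k = dna[i:i + length]
        # dict.get supplies 0 the first time a subsequence is seen.
        counts[k] = counts.get(k, 0) + 1
    return counts

print("AAAA".count("AAA"))    # 1, because str.count ignores overlapping matches
print(dna_freq_dict("AAAA"))  # {'AAA': 2}
```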

2 answers

Answer 1: 3, accepted, 2022-07-26 14:50:12

Answer 2: 0, 2022-07-26 14:17:25