
Frequency of bases in a sequence in Python

I want to find the relative frequency of each base in a sequence. The result should be displayed in a list. This is my attempt:

def get_freqs(Sequ):
    rel_Anz=[]
    laenge = len(Sequ)
    A_freq = (Sequ.count('A')/ laenge)
    T_freq = (Sequ.count('T')/ laenge)
    C_freq = (Sequ.count('C')/ laenge)
    G_freq = (Sequ.count('G')/ laenge)

    rel_Anz= [A_freq, T_freq, C_freq, G_freq]
    return rel_Anz

    print("The frequence of each base (A,T,C,G) is ", rel_Anz)

get_freqs (ATTAAACC)

I do not know how to pass in the sequence that I want to count. Should I define it beforehand?

Firstly, assuming you want to pass a sequence of nucleotides to the function, you probably want to pass it as a string, so the call looks like this:

get_freqs('ATTAAACC')

or like this:

get_freqs("ATTAAACC")
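
To illustrate why the quotes matter: without them, Python looks up ATTAAACC as a variable name and raises a NameError, because nothing with that name has been defined. The following is a minimal sketch showing both the failure and the alternative of defining the sequence beforehand; the variable names `my_sequence` and `error_message` are only illustrative.

```python
def get_freqs(seq):
    # Relative frequency of each base, in the order A, T, C, G.
    return [seq.count(base) / len(seq) for base in "ATCG"]

# Without quotes, ATTAAACC is treated as an undefined variable name.
try:
    get_freqs(ATTAAACC)
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)

# Defining the sequence first as a string variable works just as well
# as passing the string literal directly.
my_sequence = "ATTAAACC"
freqs = get_freqs(my_sequence)
print(freqs)
```

So the answer to "should I define it before?" is that either approach works, as long as what reaches the function is a string.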

Secondly, you return before printing the result:

return rel_Anz
print("The frequence of each base (A,T,C,G) is ", rel_Anz)

No statement that follows the return inside the function is executed, so the order should be:

print("The frequence of each base (A,T,C,G) is ", rel_Anz)
return rel_Anz
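
A small sketch of this behaviour, using two hypothetical functions that differ only in the order of print and return:

```python
def with_print_after_return():
    result = [0.5, 0.25]
    return result
    print("never shown")  # unreachable: the function has already returned

def with_print_before_return():
    result = [0.5, 0.25]
    print("shown:", result)  # runs, because it comes before the return
    return result

first = with_print_after_return()    # prints nothing
second = with_print_before_return()  # prints the list, then returns it
```

Both calls return the same list; only the second one prints anything.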

Finally, something like this should work:

def get_freqs(Sequ):
    rel_Anz=[]
    laenge = len(Sequ)
    A_freq = (Sequ.count('A')/ laenge)
    T_freq = (Sequ.count('T')/ laenge)
    C_freq = (Sequ.count('C')/ laenge)
    G_freq = (Sequ.count('G')/ laenge)

    rel_Anz= [A_freq, T_freq, C_freq, G_freq]
    print("The frequence of each base (A,T,C,G) is ", rel_Anz)
    return rel_Anz

get_freqs ('ATTAAACC')

If you want it to be clearer and more Pythonic:

def get_freqs(seq):
    length = len(seq)

    a_freq = seq.count('A') / length
    t_freq = seq.count('T') / length
    c_freq = seq.count('C') / length
    g_freq = seq.count('G') / length

    return [a_freq, t_freq, c_freq, g_freq]

relative_frequencies = get_freqs('ATTAAACC')
print("The frequency of each base (A,T,C,G) is ", relative_frequencies)

or, even more compactly:

def get_freqs(seq):
    return [seq.count(nucl)/len(seq) for nucl in 'ATCG']

relative_frequencies = get_freqs('ATTAAACC')
print("The frequency of each base (A,T,C,G) is ", relative_frequencies)
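
One caveat that applies to every version above: an empty string makes the length zero, so the division raises a ZeroDivisionError. A hedged sketch of one way to guard against that follows. Raising a ValueError and upper-casing the input are assumptions made here for illustration, not something the question asks for.

```python
def get_freqs(seq):
    # Guard against division by zero for an empty sequence.
    if not seq:
        raise ValueError("sequence must not be empty")
    # Assumption: lowercase input should be counted too.
    seq = seq.upper()
    return [seq.count(nucl) / len(seq) for nucl in "ATCG"]

freqs = get_freqs("attaaacc")
print(freqs)
```

Returning a list of zeros for an empty sequence would be another reasonable choice, depending on how the caller uses the result.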

I strongly suggest using Counter from the collections module:

from collections import Counter

sequence = 'ATCGACTAGCATCGACTACATCACTAC'

c = Counter(sequence)
print(c)

length = len(sequence)
for base, count in c.items():
    print(f'{base} frequency is: {count/length}')

Output:

Counter({'A': 9, 'C': 9, 'T': 6, 'G': 3})
A frequency is: 0.3333333333333333
T frequency is: 0.2222222222222222
C frequency is: 0.3333333333333333
G frequency is: 0.1111111111111111

Wrapping it in a function:

def get_freq(sequence):
    c = Counter(sequence.upper())
    length = len(sequence)
    # Map each base that occurs to its relative frequency, rounded to 2 places.
    return {base: round(count/length, 2) for base, count in c.items()}

print(get_freq('ATTAAACC'))

Output:

{'A': 0.5, 'T': 0.25, 'C': 0.25}
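
Note that G is missing from this result, because a Counter only stores the characters it has actually seen. If every base should always appear, one possible variant is to look each base up explicitly, since a Counter returns 0 for a missing key. The function name `get_freq_all` and the `bases` parameter are illustrative and not part of the answer above.

```python
from collections import Counter

def get_freq_all(sequence, bases="ATCG"):
    counts = Counter(sequence.upper())
    length = len(sequence)
    # Counter returns 0 for keys it has not seen, so absent bases map to 0.0.
    return {base: round(counts[base] / length, 2) for base in bases}

result = get_freq_all("ATTAAACC")
print(result)
```

This also fixes the order of the keys to that of `bases`, which can make the output easier to compare across sequences.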


 