简体   繁体   中英

Rosalind: Mendel's first law

I'm trying to solve the problem at http://rosalind.info/problems/iprb/

Given : Three positive integers k , m , and n , representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.

Return : The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.

My solution works for the sample, but not for any problems generated. After further research it seems that I should find the probability of choosing any one organism at random, find the probability of choosing the second organism, and then the probability of that pairing producing offspring with a dominant allele.

My question is then: what does my code below find the probability of? Does it find the percentage of offspring with a dominant allele for all possible matings -- so rather than the probability of one random mating, my code is solving for the percentage of offspring with dominant alleles if all pairs were tested?

f = open('rosalind_iprb.txt', 'r')
r = f.read()
s = r.split()
############# k = # homozygotes dominant, m = #heterozygotes, n = # homozygotes recessive
k = float(s[0])
m = float(s[1])
n = float(s[2])
############# Counts for pairing between each group and within groups
k_k = 0
k_m = 0
k_n = 0

m_m = 0
m_n = 0

n_n = 0


##############
if k > 1:
    k_k = 1.0 + (k-2) * 2.0

k_m = k * m

k_n = k * n

if m > 1:
    m_m = 1.0 + (m-2) * 2.0

m_n = m * n

if n> 1:
    n_n = 1.0 + (n-2) * 2.0
#################
dom = k_k + k_m + k_n + 0.75*m_m + 0.5*m_n
total = k_k + k_m + k_n + m_m + m_n + n_n

chance = dom/total
print chance

Looking at your code, I'm having a hard time figuring out what it's supposed to do. I'll work through the problem here.

Let's simplify the wording. There are n1 type 1, n2 type 2, and n3 type 3 items.

How many ways are there to choose a set of size 2 out of all the items? (n1 + n2 + n3) choose 2.

Every pair of items will have item types corresponding to one of the six following unordered multisets: {1,1}, {2,2}, {3,3}, {1,2}, {1,3}, {2,3}

How many multisets of the form {i,i} are there? ni choose 2.

How many multisets of the form {i,j} are there, where i != j? ni * nj.

The probabilities of the six multisets are thus the following:

  • P({1,1}) = [n1 choose 2] / [(n1 + n2 + n3) choose 2]
  • P({2,2}) = [n2 choose 2] / [(n1 + n2 + n3) choose 2]
  • P({3,3}) = [n3 choose 2] / [(n1 + n2 + n3) choose 2]
  • P({1,2}) = [n1 * n2] / [(n1 + n2 + n3) choose 2]
  • P({1,3}) = [n1 * n3] / [(n1 + n2 + n3) choose 2]
  • P({2,3}) = [n2 * n3] / [(n1 + n2 + n3) choose 2]

These sum to 1. Note that [X choose 2] is just [X * (X - 1) / 2] for X > 1 and 0 for X = 0 or 1.

Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype).

To answer this question, you simply need to identify which of the six multisets correspond to this event. Lacking the genetics knowledge to answer that question, I'll leave that to you.

For example, suppose that a dominant allele results if either of the two parents was type 1. Then the events of interest are {1,1}, {1,2}, {1,3} and the probability of the event is P({1,1}) + P({1,2}) + P({1,3}).

If you are interested, I just found a solution and put it in C#.

    public double mendel(double k, double m, double n)
    {
        double prob;

        prob = ((k*k - k) + 2*(k*m) + 2*(k*n) + (.75*(m*m - m)) + 2*(.5*m*n))/((k + m + n)*(k + m + n -1)); 

        return prob;
    }

Our parameters are k (dominant), m (hetero), & n (recessive). First I found the probability for each possible breeding pair selection in terms of percentage of the population. So, a first round choice for k would look like k/(k+m+n), and a second round choice of k after a first round choice of k would look like (k-1)/(k+m+n). Then multiply these two to get the outcome. Since there were three identified populations, there were nine possible outcomes.

Then I multiplied each outcome by it's dominance probability - 100% for anything with k, 75% for m&m, 50% for m&n, n&m, and 0% for n&n. Now add the outcomes together, and you have your solution.

http://rosalind.info/problems/iprb/

I spend some time in this question, so, to clarify in python:

lst = ['2', '2', '2']
k, m, n = map(float, lst)
t = sum(map(float, lst))
# organize a list with allele one * allele two (possibles) * dominant probability
# multiplications by one were ignored
# remember to substract the haplotype from the total when they're the same for the second haplotype choosed
couples = [
            k*(k-1),  # AA x AA
            k*m,  # AA x Aa
            k*n,  # AA x aa
            m*k,  # Aa x AA
            m*(m-1)*0.75,  # Aa x Aa
            m*n*0.5,  # Aa x aa
            n*k,  # aa x AA
            n*m*0.5,  # aa x Aa
            n*(n-1)*0  # aa x aa
]
# (t-1) indicate that the first haplotype was select
print(round(sum(couples)/t/(t-1), 5))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM