
Is there a way to get next value for the same key in the next dictionary in the list of dictionaries?

I have two lists of dictionaries, list_a and list_b. The first holds nucleotide sites; the second holds the start and end coordinates of all genes. If a site falls within a gene's coordinate range, then that site belongs to that gene. However, sometimes a site still belongs to a gene even though it lies outside the range. For example, site 8 in the second dictionary of list_a belongs to gene_b.

list_a = [{'Ch': 'I', 'name': 'test_1', 'site': 2}, {'Ch': 'II', 'name': 'test_2', 'site': 8}, {'Ch': 'II', 'name': 'test_3', 'site': 10}]

list_b = [{'Ch': 'I', 'name': 'gene_a', 'start': 1, 'end': 3}, {'Ch': 'II', 'name': 'gene_b', 'start': 3, 'end': 6}]   

Here is the first part, which works fine.

for item_a in list_a:
    for item_b in list_b:
        if item_a['Ch'] == item_b['Ch'] and item_a['site'] >= item_b['start'] and item_a['site'] <= item_b['end']:
            print item_b['name'], item_a['site']

So I want to have something like this:

if item_a['site'] >= item_b['start'] and item_a['site'] >= item_b['end']
and item_a['site'] <= the next site in the next dictionary in list_a... 
or the beginning of the next gene in the next dictionary... ???

(I have figured out how to order a list of dictionaries by keys.)

I tried to utilise the next() function, but couldn't get it to work.

The more efficient method would be to parse out the sections into a structure per Ch value, in sorted order:

from collections import defaultdict
import bisect

ranges = defaultdict(list)
for info in list_b:
    bisect.insort(ranges[info['Ch']], (info['start'], info['end'], info['name']))

The bisect.insort() call inserts new entries in sorted order into the list, saving you another sorting loop.
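To make the resulting structure concrete, here is a minimal sketch in Python 3 syntax. It uses the question's list_b plus one extra entry, gene_c, which is hypothetical and added only to show that the order of insertion does not matter.

```python
from collections import defaultdict
import bisect

list_b = [
    {'Ch': 'II', 'name': 'gene_c', 'start': 9, 'end': 12},  # hypothetical extra gene
    {'Ch': 'I', 'name': 'gene_a', 'start': 1, 'end': 3},
    {'Ch': 'II', 'name': 'gene_b', 'start': 3, 'end': 6},
]

ranges = defaultdict(list)
for info in list_b:
    # Tuples compare element by element, so each per-Ch list stays ordered by start.
    bisect.insort(ranges[info['Ch']], (info['start'], info['end'], info['name']))

print(dict(ranges))
```

Although gene_c is inserted first, ranges['II'] ends up as [(3, 6, 'gene_b'), (9, 12, 'gene_c')], ordered by start.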

Now use this to home in on the ranges given a list_a Ch value:

for gene in list_a:
    for start, stop, name in ranges[gene['Ch']]:
        if start <= gene['site'] <= stop:
            print name, gene['site']
            break

This still doesn't look past a gene's stop value to find the next match, of course. However, the inner loop can be folded into a generator expression suitable for use with the next() function, and because the ranges are sorted, you can then continue the search from where it left off:

for gene in list_a:
    site = gene['site']
    range = iter(ranges[gene['Ch']])
    # skip anything with start > site
    name = previous = next((name for start, stop, name in range if start <= site), None)

    # search on for a matching stop, looking ahead. If we find an entry whose
    # start > site, the previous entry matched. If we ran off the end of our
    # options, the last entry matched.
    for start, stop, name in range:
        if site > stop:
            previous = name
            continue
        if start > site:
            name = previous
        break

    print name, site

The range iterable 'remembers' where the first next() search stopped, so we can loop over it to continue searching for a suitable stop value from that point.
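The way an iterator keeps its position can be seen in isolation with a small sketch. The data and the condition stop >= 4 are hypothetical, chosen only to show where the search pauses and resumes.

```python
# Hypothetical sorted entries: (start, stop, name)
entries = iter([(1, 3, 'gene_a'), (3, 6, 'gene_b'), (9, 12, 'gene_c')])

# next() pulls entries from the iterator until the first one passes the test.
first = next((name for start, stop, name in entries if stop >= 4), None)

# A later loop over the same iterator resumes after the matched entry;
# entries that were already consumed are not revisited.
remaining = [name for start, stop, name in entries]

print(first, remaining)
```

Here first is 'gene_b' and remaining is ['gene_c']: gene_a was consumed and skipped by the first search, so the second pass never sees it.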

Note that the stop values are presumably always equal to or larger than the start values, so there is no point in also testing against the next item's start value: if site <= start is True, then site <= stop is also True.
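If the rule can be stated as "a site belongs to the last gene on the same Ch that starts at or before it", the lookup can also be done with a binary search instead of a loop. The following is only a sketch under that assumption, and it further assumes that genes on one Ch do not overlap. The helper names build_ranges and assign_gene are made up for this example, and the code is written for Python 3.

```python
from collections import defaultdict
import bisect

list_a = [{'Ch': 'I', 'name': 'test_1', 'site': 2},
          {'Ch': 'II', 'name': 'test_2', 'site': 8},
          {'Ch': 'II', 'name': 'test_3', 'site': 10}]
list_b = [{'Ch': 'I', 'name': 'gene_a', 'start': 1, 'end': 3},
          {'Ch': 'II', 'name': 'gene_b', 'start': 3, 'end': 6}]

def build_ranges(genes):
    """Group (start, end, name) tuples per Ch, each list sorted by start."""
    ranges = defaultdict(list)
    for info in genes:
        bisect.insort(ranges[info['Ch']], (info['start'], info['end'], info['name']))
    return ranges

def assign_gene(site_info, ranges):
    """Return the name of the last gene starting at or before the site, or None."""
    entries = ranges.get(site_info['Ch'], [])
    # (site, inf) sorts after every entry whose start is <= site.
    idx = bisect.bisect_right(entries, (site_info['site'], float('inf')))
    return entries[idx - 1][2] if idx else None

ranges = build_ranges(list_b)
results = [(assign_gene(item, ranges), item['site']) for item in list_a]
print(results)
```

With the question's data this pairs site 2 with gene_a and sites 8 and 10 with gene_b. Whether a site far beyond a gene's end should still count as belonging to it is a decision the sketch does not make; an upper bound on the distance could be added in assign_gene.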

I think you might be able to do something more straightforward.

In list_b you could add a new key called site: which you could set to (start+end)/2.

Then merge list_a and list_b, and sort them by the key (Ch:, site:) in sorted_list.

Then go through sorted_list one item at a time. If it's a gene (from list_b), keep track of its name: and move on; if it's a site (from list_a), set its name to the previous item's name: or use the name: you saved.

There may be some tweaking of "what is closest" to do, but I'm sure you can do it by looking ahead of and behind your current position and applying some appropriate business logic.
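A rough sketch of this merge-and-sweep idea, in Python 3, might look like the following. It departs from the description above in one respect, which is an assumption of this example: each gene is positioned by its start rather than by (start+end)/2, so that a site inside a gene always sorts after that gene. The tuple layout and the names events and assigned are also made up for the sketch.

```python
list_a = [{'Ch': 'I', 'name': 'test_1', 'site': 2},
          {'Ch': 'II', 'name': 'test_2', 'site': 8},
          {'Ch': 'II', 'name': 'test_3', 'site': 10}]
list_b = [{'Ch': 'I', 'name': 'gene_a', 'start': 1, 'end': 3},
          {'Ch': 'II', 'name': 'gene_b', 'start': 3, 'end': 6}]

# Merge both lists into (Ch, position, kind, name) tuples.
# Genes get kind 0 so they sort before a site at the same position.
events = [(g['Ch'], g['start'], 0, g['name']) for g in list_b]
events += [(s['Ch'], s['site'], 1, s['name']) for s in list_a]
events.sort()

assigned = []  # (gene name or None, site position)
current_ch = current_gene = None
for ch, pos, kind, name in events:
    if ch != current_ch:
        # New Ch value: forget the gene remembered from the previous one.
        current_ch, current_gene = ch, None
    if kind == 0:
        current_gene = name      # a gene: remember its name and move on
    else:
        assigned.append((current_gene, pos))  # a site: use the saved gene name

print(assigned)
```

This does a single sort and a single pass over the merged list. A site that precedes every gene on its Ch is paired with None, which is where the "what is closest" logic would need to look ahead instead.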


 