
Create multiple dictionaries from a single iterator in nested for loops

I have a nested list comprehension that creates a list of six lists of ~29,000 items each. I'm trying to parse this final list and create six separate dictionaries from it. Right now the code is very unpythonic; I need the right statement to accomplish the following:

1.) Create six dictionaries from a single statement.

2.) Scale to a list of any length, i.e., without hardcoding a counter as shown below.

I've run into multiple issues, and have tried the following:

1.) Using while loops

2.) Using break statements, which break out of the innermost loop but then do not properly create the other dictionaries. I also tried break statements triggered by a binary switch.

3.) if/else conditions for n indices; the indices iterate from 1 to 29,000, then repeat.

Note that the ellipses mark code omitted for brevity.

import csv

# Parse csv files for samples, creating a dictionary of key, value pairs and multiple lists.
with open('genes_1') as f:
    cread_1 = list(csv.reader(f, delimiter = '\t'))
    sample_1_values = [j for i, j in (sorted([x for x in {i: float(j) 
                        for i, j in cread_1}.items()], key = lambda v: v[1]))]
    sample_1_genes = [i for i, j in (sorted([x for x in {i: float(j) 
                            for i, j in cread_1}.items()], key = lambda v: v[1]))]

... 

# Compute row means.
mean_values = []
for i, (a, b, c, d, e, f) in enumerate(zip(sample_1_values, sample_2_values, sample_3_values, sample_4_values, sample_5_values, sample_6_values)):
    mean_values.append((a + b + c + d + e + f)/6)

# Provide proper gene names for mean values and replace original data values by corresponding means.
sample_genes_list = [i for i in sample_1_genes, sample_2_genes, sample_3_genes, sample_4_genes, sample_5_genes, sample_6_genes]

sample_final_list = [sorted(zip(sg, mean_values)) for sg in sample_genes_list]

# Create multiple dictionaries from normalized values for each dataset.
class BreakIt(Exception): pass
try: 
    count = 1         
    for index, items in enumerate(sample_final_list):
        sample_1_dict_normalized = {}             
        for index, (genes, values) in enumerate(items):
            sample_1_dict_normalized[genes] = values
            count = count + 1
            if count == 29595:
                raise BreakIt
except BreakIt:
    pass

...

try: 
    count = 1         
    for index, items in enumerate(sample_final_list):
        sample_6_dict_normalized = {}             
        for index, (genes, values) in enumerate(items):
            if count > 147975:
                sample_6_dict_normalized[genes] = values
            count = count + 1
            if count == 177570:
                raise BreakIt
except BreakIt:
    pass

# Pull expression values to qualify overexpressed proteins.
print 'ERG values:'
print 'Sample 1:', round(sample_1_dict_normalized.get('ERG'), 3) 
print 'Sample 6:', round(sample_6_dict_normalized.get('ERG'), 3)  

Your code is too long for me to give an exact answer, so I will answer in general terms.

First, you are using enumerate for no reason. If you don't need both the index and the value, you probably don't need enumerate.
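As a minimal sketch of that point, the row means can be computed by iterating over zip directly, with no index at all. The sample_values list and its numbers are hypothetical stand-ins for sample_1_values through sample_6_values, and the sketch is written for Python 3, whereas the question uses Python 2:

```python
# Toy stand-ins for sample_1_values ... sample_6_values (hypothetical data).
sample_values = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
]

# zip(*sample_values) yields one tuple per row, so no index is needed,
# and len(row) replaces the hardcoded 6.
mean_values = [sum(row) / len(row) for row in zip(*sample_values)]
print(mean_values)  # [3.0, 4.0, 5.0]
```

Because the divisor comes from len(row), the same line should work for any number of samples.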

This part:

with open('genes.csv') as f:
    cread_1 = list(csv.reader(f, delimiter = '\t'))
    sample_1_dict = {i: float(j) for i, j in cread_1}
    sample_1_list = [x for x in sample_1_dict.items()]
    sample_1_values_sorted = sorted(sample_1_list, key=lambda expvalues: expvalues[1])
    sample_1_genes = [i for i, j in sample_1_values_sorted]
    sample_1_values = [j for i, j in sample_1_values_sorted]
    sample_1_graph_raw = [float(j) for i, j in cread_1] 

should (a) use a list named samples and (b) be much shorter, since you don't really need to extract all this information from sample_1_dict and move it around right now. It can be something like:

import csv

samples = [None] * 6
for k in range(6):
    # 'genes.csv' is a placeholder; open the file specific to sample k here.
    with open('genes.csv') as f:
        cread = csv.reader(f, delimiter='\t')
        samples[k] = {i: float(j) for i, j in cread}

After that, calculating the sum and mean will be far more natural.
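A rough sketch of what that could look like, assuming samples is a list of {gene: value} dictionaries of equal size. The gene names other than ERG and all of the numbers are made up for illustration, and the rank-wise mean mirrors what the question's code appears to compute:

```python
# Hypothetical data: one {gene: value} dictionary per sample.
samples = [
    {"ERG": 2.0, "TP53": 1.0, "MYC": 3.0},
    {"ERG": 6.0, "TP53": 5.0, "MYC": 4.0},
]

# For each sample, sort the (gene, value) pairs by value.
ranked = [sorted(s.items(), key=lambda kv: kv[1]) for s in samples]

# Mean of the k-th smallest value across all samples, for each rank k.
mean_values = [
    sum(value for _, value in row) / len(row)
    for row in zip(*ranked)
]
print(mean_values)  # [2.5, 3.5, 4.5]
```

Nothing here depends on there being exactly six samples or 29,595 genes.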

In this part:

class BreakIt(Exception): pass
try: 
    count = 1         
    for index, items in enumerate(sample_final_list):
        sample_1_dict_normalized = {}             
        for index, (genes, values) in enumerate(items):
            sample_1_dict_normalized[genes] = values
            count = count + 1
            if count == 29595:
                raise BreakIt
except BreakIt:
    pass

you should (a) iterate over the samples list mentioned earlier, and (b) not use count at all, since you can iterate naturally over samples, or over samples[i], or something similar.
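For instance, since each entry of sample_final_list is already a list of (gene, value) pairs, one dictionary per sample can probably be built in a single statement with no counter and no exception. This is only a sketch: the list below is a toy stand-in for the real sample_final_list, and normalized is a hypothetical name that replaces the six sample_N_dict_normalized variables:

```python
# Toy stand-in for sample_final_list: one list of (gene, mean) pairs per sample.
sample_final_list = [
    [("ERG", 3.5), ("MYC", 4.5), ("TP53", 2.5)],
    [("ERG", 4.5), ("MYC", 2.5), ("TP53", 3.5)],
]

# dict() accepts an iterable of (key, value) pairs, so each inner list
# becomes one dictionary, however many samples there are.
normalized = [dict(pairs) for pairs in sample_final_list]

print(round(normalized[0].get("ERG"), 3))  # 3.5
```

Sample 1 is then normalized[0] and sample 6 would be normalized[5], so the lookups at the end of the question become index-based instead of relying on separately named variables.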

Your code has several problems. You should put your code in functions that preferably do one thing each. Then you can call a function for each sample without repeating the same code six times (I assume that is what the ellipsis is hiding). Give each function a self-describing name and a docstring that explains what it does. There is quite a bit of unnecessary code. Some of this might become obvious once you have it in functions. Since functions take arguments, you can pass in your 29595, for example.
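One possible way to split the work into small functions is sketched below. The function names read_sample, rank_means and normalize are hypothetical, the in-memory strings stand in for the real tab-separated files, and the code targets Python 3:

```python
import csv
import io


def read_sample(lines):
    """Return {gene: value} parsed from tab-separated lines."""
    return {gene: float(value) for gene, value in csv.reader(lines, delimiter="\t")}


def rank_means(samples):
    """Return the mean of the k-th smallest value across samples, for each k."""
    columns = [sorted(s.values()) for s in samples]
    return [sum(row) / len(row) for row in zip(*columns)]


def normalize(sample, means):
    """Return {gene: mean}, replacing each value by the mean at its rank."""
    genes = sorted(sample, key=sample.get)
    return dict(zip(genes, means))


# In-memory text stands in for the real files (made-up values).
raw = ["ERG\t2.0\nTP53\t1.0\n", "ERG\t5.0\nTP53\t7.0\n"]
samples = [read_sample(io.StringIO(text)) for text in raw]
means = rank_means(samples)
normalized = [normalize(s, means) for s in samples]
print(normalized[0]["ERG"])  # 4.5
```

With this layout the number of genes never has to be passed in, because each function works from the length of its input.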

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.
