
dict comprehension for nested lists to filter values of multiple variables

I have a working example of a dict comprehension over a list I iterate through. It generates various indicators (selections) that separate the rows of my data into cases (which are not mutually exclusive, by the way).

For context: this is done to count cases for specific rows (the criterion is defined by a column) when I aggregate the table into groups. The indicators are currently collected in separate dataframes and exported separately, though I would also be happy to keep everything in one dataframe for a single aggregation, concatenation and export, if possible.

Now I want to nest this inside another loop. That loop would define which other variable I select/filter the values of. Item 0 would still be the condition itself (the sum of the indicator being the count of the cases), item 1 would be TKOST for the selected cases (to get a selective sum for separate criteria later), and item 2 would be another variable I'd now read in.

It would also make sense for this loop to affect the variable names, e.g. to have a plain neuro variable for the count (or neuro_count ), a neuro_cost for the sum of TKOST for the neuro cases, etc. How is this possible?

The sample code basically comes from Alexander's answer on another question. The file I/O and pandas parts are provided for context.

import pandas as pd

items = {'neuro': 'N', 
         'cardio': 'C', 
         'cancer': 'L', 
         'anesthetics': 'N01', 
         'analgesics': 'N02', 
         'antiepileptics': 'N03', 
         'anti-parkinson drugs': 'N04', 
         'psycholeptics': 'N05', 
         'psychoanaleptics': 'N06', 
         'addiction_and_other_neuro': 'N07', 
         'Adrugs': 'A', 
         'Mdrugs': 'M', 
         'Vdrugs': 'V', 
         'all_drugs': ''}

# Create data containers using dictionary comprehension.
dfs = {item: pd.DataFrame() for item in items.keys()}
monthly_summaries = {item: list() for item in items.keys()}

# Perform monthly groupby operations.
for year in range(2005, 2013):
    for month in range(1, 13):
        if year == 2005 and month < 7:
            continue
        filename = 'PATH/STUB_' + str(year) + '_mon'+ str(month) +'.txt'
        monthly = pd.read_table(filename,usecols=[0,3,32])
        monthly['year'] = year
        monthly['month'] = month
        # Keep rows whose ATC code starts with the prefix and whose TKOST is present.
        dfs = {name: monthly[monthly.ATC.str.startswith(code)
                             & monthly.TKOST.notnull()]
               for name, code in items.items()}
        [monthly_summaries[name].append(dfs[name].groupby(['LopNr','year','month']).sum()
                                        .astype(int, copy=False)) 
         for name in items.keys()]

# Now concatenate all of the monthly summaries into separate DataFrames.
# monthly_summaries[name] is already a list of DataFrames; the group keys stay in the index.
dfs = {name: pd.concat(monthly_summaries[name])
       for name in items.keys()}

# Now regroup the aggregate monthly summaries.
monthly_summaries = {name: dfs[name].reset_index().groupby(['LopNr','year','month']).sum()
                    for name in items.keys()}

# Finally, save the aggregated results to files.
[monthly_summaries[name].to_csv('PATH/monthly_{0}_costs.csv'.format(name))
 for name in items.keys()]

You should prefer an explicit for loop:

for name in items.keys():
    monthly_summaries[name].append(dfs[name].groupby(['LopNr','year','month']).sum()
                                            .astype(int, copy=False))

# rather than
[monthly_summaries[name].append(dfs[name].groupby(['LopNr','year','month']).sum()
                                         .astype(int, copy=False)) 
    for name in items.keys()]

The latter builds a throwaway list of None values (and is less readable), so it is less efficient.
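
To make the point concrete, here is a small sketch using plain lists instead of DataFrames (the names results and throwaway are made up for illustration):

```python
results = {"neuro": [], "cardio": []}

# A comprehension used only for its side effect still builds a list,
# holding one None per call to append().
throwaway = [results[name].append(1) for name in results]

# The explicit loop does the same work without the extra list.
for name in results:
    results[name].append(2)
```

After this runs, throwaway holds only the return values of append(), which is why it is wasted work.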

The former allows you to nest easily...
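
A minimal sketch of what that nesting could look like, assuming a small made-up table in place of one monthly file. The measures dict and the name + "_" + suffix naming scheme are assumptions for illustration, not something taken from the question's data:

```python
import pandas as pd

# Hypothetical sample standing in for one monthly file.
monthly = pd.DataFrame({
    "LopNr": [1, 1, 2, 2],
    "ATC": ["N01AB", "C07AA", "N02BE", "L01XX"],
    "TKOST": [10.0, 20.0, 5.0, 40.0],
})
items = {"neuro": "N", "cardio": "C"}
# Hypothetical mapping: name suffix -> source column (None means "count rows").
measures = {"count": None, "cost": "TKOST"}

summaries = {}
for name, code in items.items():
    # Outer loop: which rows belong to this case.
    selected = monthly["ATC"].str.startswith(code) & monthly["TKOST"].notnull()
    for suffix, column in measures.items():
        # Inner loop: which variable is summed for the selected rows.
        if column is None:
            values = selected.astype(int)
        else:
            values = monthly[column].where(selected, 0)
        summaries[name + "_" + suffix] = values.groupby(monthly["LopNr"]).sum()
```

Each entry of summaries (for example summaries["neuro_cost"]) is then a per-LopNr Series that can be exported on its own or combined with the others later.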


"But it would make sense for this loop to affect the variable names too, e.g. to have a blank neuro variable for the count (or neuro_count), a neuro_cost for the sum of TKOST for the neuro cases etc. How is this possible?"

I usually add columns to the dataframe to hold these counts; that way the work can be vectorized, split, or otherwise processed.
(Then just don't write these helper columns out to CSV.)
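
One way this might look, as a sketch only: the sample rows and the _count / _cost column suffixes are assumptions, and the real data would come from the monthly files.

```python
import pandas as pd

# Hypothetical sample standing in for one monthly file.
monthly = pd.DataFrame({
    "LopNr": [1, 1, 2, 2],
    "year": [2005, 2005, 2005, 2005],
    "month": [7, 7, 7, 7],
    "ATC": ["N01AB", "C07AA", "N02BE", "L01XX"],
    "TKOST": [10.0, 20.0, 5.0, 40.0],
})
items = {"neuro": "N", "cardio": "C"}

for name, code in items.items():
    hit = monthly["ATC"].str.startswith(code) & monthly["TKOST"].notnull()
    monthly[name + "_count"] = hit.astype(int)                 # 1 for each matching row
    monthly[name + "_cost"] = monthly["TKOST"].where(hit, 0)  # TKOST only where matched

# Aggregate every indicator column in a single groupby.
indicator_columns = [c for c in monthly.columns if c.endswith(("_count", "_cost"))]
summary = monthly.groupby(["LopNr", "year", "month"])[indicator_columns].sum()
```

Here summary holds all counts and sums in one DataFrame, so a single to_csv call would cover every case, and the row-level indicator columns on monthly never need to be written out.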
