
Creating a list of OrderedDicts

I'm trying to parse this comma separated content, which lists some surveys for HiPS. Here is my code:

with open('surveys.txt') as data:
    text = data.read()
    surveys = OrderedDict()
    for survey in text.split('\n'):
        for line in survey.split('\n'):
            # Skip empty or comment lines
            if line == '' or line.startswith('#'):
                continue
            try:
                key, value = [_.strip() for _ in line.split('=')]
                surveys[key] = value
            except ValueError:
                continue

This gives me an OrderedDict containing only the elements of the last survey, because the contents of surveys are overwritten on each loop iteration.

I tried to resolve this by creating an OrderedDict for each new survey and appending it to a list, using this code:

with open('surveys.txt') as data:
    text = data.read()
    surveys = []
    for survey in text.split('\n'):
        survey_OD = OrderedDict()
        for line in survey.split('\n'):
            # Skip empty or comment lines
            if line == '' or line.startswith('#'):
                continue
            try:
                key, value = [_.strip() for _ in line.split('=')]
                survey_OD[key] = value
            except ValueError:
                continue
        surveys.append(survey_OD)

But this creates a separate OrderedDict for each comma separated value, like this:

OrderedDict([('hips_order', '3')]) OrderedDict([('hips_frame', 'galactic')]) OrderedDict([('hips_tile_format', 'jpeg fits')])

Instead, I expect something like this:

OrderedDict([('hips_order', '3'), ('hips_frame', 'galactic'), ('hips_tile_format', 'jpeg fits')])
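
The one-entry-per-dict result follows from the outer loop: text.split('\n') already yields individual lines, so the inner survey.split('\n') has nothing left to split and each line lands in its own OrderedDict. A minimal illustration, assuming a made-up two-survey sample in which surveys are separated by a blank line (the sample text is not taken from the real file):

```python
# Made-up sample: two surveys separated by a blank line (an assumption).
sample = (
    "hips_order = 3\n"
    "hips_frame = galactic\n"
    "\n"
    "hips_order = 12\n"
    "hips_frame = equatorial\n"
)

# Splitting on a single newline yields individual lines, not surveys.
by_line = sample.split('\n')
print(by_line)

# Splitting on two consecutive newlines keeps the lines of each survey together.
by_survey = sample.strip().split('\n\n')
print(by_survey)
```

Every element of by_line is a single line (or an empty string), whereas every element of by_survey still contains all the lines of one survey.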

Let's slightly improve your first approach:

  • split the text into surveys on a double newline character (a blank line),
  • collect the surveys in a list.

So we can write

from collections import OrderedDict

with open('surveys.txt') as data:
    text = data.read()
surveys = list()
for raw_survey in text.split('\n\n'):
    survey = OrderedDict()
    # Skip empty lines
    for line in filter(None, raw_survey.split('\n')):
        # Skip comment lines
        if line.startswith('#'):
            continue
        try:
            key, value = [_.strip() for _ in line.split('=')]
            survey[key] = value
        except ValueError:
            continue
    surveys.append(survey)

will give us

>>> surveys[0]
OrderedDict([('ID', 'CDS/C/MUSE-M42'), ('creator_did', 'ivo://CDS/C/MUSE-M42'),
             ('obs_collection', 'MUSE-M42'),
             ('obs_title', 'MUSE map of the central Orion Nebula (M 42)'), (
             'obs_description',
             'Integral-field spectroscopic dataset of the central part of the Orion Nebula (M 42), observed with the MUSE instrument at the ESO VLT (reduced the data with the public MUSE pipeline) representing a FITS cube with a spatial size of ~5.9\'x4.9\' (corresponding to ~0.76 pc x 0.63 pc) and a contiguous wavelength coverage of 4595...9366 Angstrom, spatially sampled at 0.2", with a sampling of 1.25 Angstrom in dispersion direction.'),
             ('obs_ack', 'Based on data obtained from the ESO/VLT'),
             ('prov_progenitor', 'MUSE Consortium'),
             ('bib_reference', '2015A&A...582A.114W'),
             ('obs_copyright', 'Copyright mention of the original data'),
             ('obs_copyright_url', 'http://muse-vlt.eu/science/m42/'),
             ('hips_release_date', '2015-07-07T00:29Z'),
             ('hips_builder', 'Aladin/HipsGen v9.505'), ('hips_order', '12'),
             ('hips_pixel_cut', '0 7760'), ('hips_tile_format', 'png fits'),
             ('hips_cube_depth', '3818'), ('hips_cube_firstframe', '1909'),
             ('hips_frame', 'equatorial'), ('dataproduct_type', 'image'),
             ('t_min', '56693'), ('t_max', '56704'), ('em_min', '4,595e-7'),
             ('em_max', '9,366e-7'), ('hips_version', '1.31'),
             ('hips_creation_date', '03/07/15 12:00:30'),
             ('hips_creator', 'CDS (P.Fernique)'), ('hips_tile_width', '512'),
             ('hips_status', 'public master clonableOnce'),
             ('hips_pixel_bitpix', '-32'), ('data_pixel_bitpix', '-32'),
             ('hips_hierarchy', 'mean'), ('hips_initial_ra', '83.82094'),
             ('hips_initial_dec', '-5.39542'), ('hips_initial_fov', '0.09811'),
             ('hips_pixel_scale', '2.795E-5'), ('s_pixel_scale', '5.555E-5'),
             ('moc_sky_fraction', '2.980E-7'), ('hips_estsize', '87653'),
             ('data_bunit', '10**(-20)*erg/s/cm**2/Angstrom'),
             ('data_cube_crpix3', '1'), ('data_cube_crval3', '4595'),
             ('data_cube_cdelt3', '1.25'), ('data_cube_bunit3', 'Angstrom'),
             ('client_application', 'AladinDesktop'),
             ('hips_copyright', 'CNRS/Unistra'), ('obs_regime', 'Optical'),
             ('hips_service_url', 'http://alasky.unistra.fr/MUSE/MUSE-M42'), (
             'hips_service_url_1',
             'http://alaskybis.unistra.fr/MUSE/MUSE-M42'),
             ('hips_status_1', 'public mirror clonable'), (
             'hips_service_url_2',
             'https://alaskybis.unistra.fr/MUSE/MUSE-M42'),
             ('hips_status_2', 'public mirror clonable'), ('moc_order', '12'),
             ('obs_initial_ra', '83.82094'), ('obs_initial_dec', '-5.39542'),
             ('obs_initial_fov', '0.014314526715905856'),
             ('TIMESTAMP', '1490387811000')])
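
One caveat with line.split('='): a line whose value itself contains '=' (a URL with a query string, for instance) produces more than two parts, raises ValueError, and is silently skipped. Below is a minimal sketch of a variant that splits on the first '=' only. The function name parse_surveys and the sample text are made up for illustration, and the sketch assumes that surveys are separated by one or more blank lines.

```python
import re
from collections import OrderedDict


def parse_surveys(text):
    """Parse 'key = value' blocks separated by blank lines (hypothetical helper)."""
    surveys = []
    # Assumption: one or more blank lines separate consecutive surveys.
    for raw_survey in re.split(r'\n\s*\n', text.strip()):
        survey = OrderedDict()
        for line in raw_survey.split('\n'):
            line = line.strip()
            # Skip empty lines, comment lines, and lines without a separator.
            if not line or line.startswith('#') or '=' not in line:
                continue
            # Split on the first '=' only, so values may contain '='.
            key, value = line.split('=', 1)
            survey[key.strip()] = value.strip()
        # Ignore blocks that held nothing but comments or blank lines.
        if survey:
            surveys.append(survey)
    return surveys


# Made-up sample, not taken from the real surveys file.
sample = (
    "# first survey\n"
    "hips_order = 3\n"
    "hips_service_url = http://example.org/hips?id=1\n"
    "\n"
    "hips_order = 12\n"
    "hips_frame = equatorial\n"
)
surveys = parse_surveys(sample)
print(surveys)
```

With this sample, surveys holds two OrderedDicts, and the hips_service_url value keeps its '?id=1' suffix instead of being dropped.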
