
Parsing XML data into a dict, not sure of the most Pythonic way to do it

Let's say I have some XML data for an online product with multiple prices:

<Response>
    <TotalOffers>6</TotalOffers>
    <LowPrices>
        <LowPrice condition="new">
            <CurrencyCode>USD</CurrencyCode>
            <Amount>15.50</Amount>
        </LowPrice>
        <LowPrice condition="used">
            <CurrencyCode>USD</CurrencyCode>
            <Amount>22.86</Amount>
        </LowPrice>
    </LowPrices>
</Response>

My ultimate goal is to pass it through a function that parses the XML into the form of a simplified dict that looks something like this:

response = {
    'total_offers': 6,
    'low_prices': [
        {'condition': "new", 'currency': "USD", 'amount': 15.50},
        {'condition': "used", 'currency': "USD", 'amount': 22.86},
    ]
}

Using the lxml library this is pretty simple to do. I just have to specify the XPath for each value and then handle the cases where the expected data is missing. For example, to get the TotalOffers value (6) I would do something like this:

from lxml import etree

# convert the XML text to an etree element
tree_obj = etree.fromstring(xml_text)
# use XPath to find the values I want in this tree
matched_els = tree_obj.xpath('//TotalOffers')
# XPath matches are returned as a list;
# since there could be more than one match, grab only the first one
first_match_el = matched_els[0]
# extract the text and print it to the console
print(first_match_el.text)
# prints: 6

Now my thinking is that I could write a function like get_text(tree_obj, xpath_to_value). But what if I also want this function to convert the value into its appropriate type (e.g. string, float, or int)? Should I have a parameter that specifies the type, like get_text(tree_obj, xpath_to_value, type='float')?
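One way to sketch the helper discussed above is to accept a converter callable (int, float, str) rather than a type name passed as a string. This is only an illustration under assumptions: the `convert` and `default` parameters are hypothetical, and the sketch uses the standard library's xml.etree.ElementTree with `findtext` so that it runs without third-party packages (lxml elements also provide `findtext`).

```python
import xml.etree.ElementTree as ET


def get_text(el, path, convert=str, default=None):
    """Return the converted text found at `path` under `el`.

    Falls back to `default` when the element is missing or when
    `convert` rejects the text with a ValueError.
    """
    text = el.findtext(path)
    if text is None:
        # no element matched the path
        return default
    try:
        return convert(text.strip())
    except ValueError:
        # e.g. int('') or float('n/a')
        return default


# small made-up document, used only to exercise the helper
xml_text = "<Response><TotalOffers>6</TotalOffers><Note>n/a</Note></Response>"
root = ET.fromstring(xml_text)

total = get_text(root, 'TotalOffers', convert=int)
bad = get_text(root, 'Note', convert=float, default=0.0)
missing = get_text(root, 'Missing')
```

Passing the callable directly avoids a lookup table from strings to types, and it avoids naming a parameter `type`, which would shadow the built-in.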

Because if I do that my next step in creating the dict would be something like this:

low_prices = []
# select each LowPrice element, not the LowPrices container
low_prices_els = tree_obj.xpath('//LowPrices/LowPrice')
for el in low_prices_els:
    low_prices.append(
        {
            'condition': get_text(el, './@condition', type='str'),
            'currency': get_text(el, './CurrencyCode', type='str'),
            'amount': get_text(el, './Amount', type='float')
        }
    )

response = {
    'total_offers': get_text(tree_obj, '//TotalOffers', type='int'),
    'low_prices': low_prices
}
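For comparison, the two steps above could be folded into a single function. The following is a minimal sketch, not a definitive implementation: `parse_response` and `_text` are hypothetical names, missing or malformed values are assumed to become None, and the standard library's xml.etree.ElementTree stands in for lxml so the example is self-contained (lxml elements support the same `findtext`, `iterfind`, and `get` calls).

```python
import xml.etree.ElementTree as ET


def _text(el, path, convert=str):
    """Converted text at `path`, or None if missing or not convertible."""
    text = el.findtext(path)
    if text is None:
        return None
    try:
        return convert(text.strip())
    except ValueError:
        return None


def parse_response(xml_text):
    """Parse the Response XML into a simplified dict."""
    root = ET.fromstring(xml_text)
    low_prices = [
        {
            # attributes are read with .get(), which returns None if absent
            'condition': el.get('condition'),
            'currency': _text(el, 'CurrencyCode'),
            'amount': _text(el, 'Amount', float),
        }
        for el in root.iterfind('LowPrices/LowPrice')
    ]
    return {
        'total_offers': _text(root, 'TotalOffers', int),
        'low_prices': low_prices,
    }


sample = """
<Response>
    <TotalOffers>6</TotalOffers>
    <LowPrices>
        <LowPrice condition="new">
            <CurrencyCode>USD</CurrencyCode>
            <Amount>15.50</Amount>
        </LowPrice>
        <LowPrice condition="used">
            <CurrencyCode>USD</CurrencyCode>
            <Amount>22.86</Amount>
        </LowPrice>
    </LowPrices>
</Response>
"""

response = parse_response(sample)
```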

Is this the best way to accomplish what I am trying to do? I feel like I'm creating problems for myself in the future.

I think what you need is an XML-to-JSON tool, which converts the XML document to JSON format. You can try one here:

http://codebeautify.org/xmltojson


Output:

{"Response":{"TotalOffers":"6","LowPrices":{"LowPrice":[{"CurrencyCode":"USD","Amount":"15.50","_condition":"new"},{"CurrencyCode":"USD","Amount":"22.86","_condition":"used"}]}}}
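Note that every value in this output is still a string, so it would need a further pass to reach the dict the question asks for. A rough sketch of that step with the standard json module follows. It assumes the JSON shape shown above, including the `_condition` key used for the attribute, and it assumes the converter may emit a single object instead of a list when there is only one LowPrice.

```python
import json

# the converter output from above, pasted as a string
raw = (
    '{"Response":{"TotalOffers":"6","LowPrices":{"LowPrice":['
    '{"CurrencyCode":"USD","Amount":"15.50","_condition":"new"},'
    '{"CurrencyCode":"USD","Amount":"22.86","_condition":"used"}]}}}'
)

data = json.loads(raw)["Response"]

offers = data["LowPrices"]["LowPrice"]
if isinstance(offers, dict):
    # assumption: a lone LowPrice may arrive as an object, not a list
    offers = [offers]

response = {
    "total_offers": int(data["TotalOffers"]),
    "low_prices": [
        {
            "condition": offer["_condition"],
            "currency": offer["CurrencyCode"],
            "amount": float(offer["Amount"]),
        }
        for offer in offers
    ],
}
```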
