
Parsing/Extracting Data from API XML feed with Python and Beautiful Soup

Python/XML newbie here, playing around with Python and BeautifulSoup to learn how to parse XML, specifically using the Oodle.com API to list car classifieds. I've had success with simple XML and BeautifulSoup, but with this feed I can't seem to get the data I want no matter what I try. I read the BeautifulSoup documentation for hours and still can't figure it out. The XML is structured like this:

<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
    <current>
        ....
    </current>
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <body>...</body>
            <url>...</url>
            <images>
                <element>...</element>
                <element>...</element>
            </images>
            <attributes>
                <features>...</features>
                <mileage>32637</mileage>
                <price>19999</price>
                <trim>XL</trim>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>      
        <element>.. Next car here ..</element>
        <element>..Aaaand next one here ..</element>    
    </listings>
    <meta>...</meta>
</oodle_response>

I first make a request with urllib to grab the feed and save it to a local file. Then:

from BeautifulSoup import BeautifulStoneSoup

xml = open("temp.xml", "r")
soup = BeautifulStoneSoup(xml)

After that I'm not sure what to do. I've tried a lot of things, but everything seems to throw back way more junk than I want, which makes it too difficult to find the issue. I'm just trying to get the id, title, mileage, price, year, and vin of each listing. So how do I get these, and how do I speed up the process with a loop? Ideally I want a for loop like:

for soup.listings.element in soup.listings:
    id = soup.listings.element.id
    ...

I know that obviously doesn't work, but I'm after something that would fetch the info for one listing, store it in a list, then move on to the next ad. Appreciate the help, guys.

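You could do something like this. Note that a plain soup('element') would also match the nested <element> tags under <images>, which have no <id>, so the search below is limited to the direct children of <listings>:

# Only direct children of <listings>; skips <element> tags inside <images>.
for element in soup.listings.findAll('element', recursive=False):
    listing_id = element.id.text  # avoid shadowing the built-in id()
    title = element.title.text
    mileage = element.attributes.mileage.text
    price = element.attributes.price.text
    year = element.attributes.year.text
    vin = element.attributes.vin.text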
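
If you want each listing stored in a list, as the question asks, here is a minimal self-contained sketch. It swaps in the standard library's xml.etree.ElementTree rather than BeautifulSoup so it runs without extra packages. The SAMPLE string is a trimmed copy of the structure from the question, the second listing's values are made up for illustration, and parse_listings is a hypothetical helper name, not part of any API.

```python
import xml.etree.ElementTree as ET

# Trimmed sample shaped like the feed in the question.
# The second listing is invented purely to show the loop handling several ads.
SAMPLE = """<oodle_response stat="ok">
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <images>
                <element>...</element>
                <element>...</element>
            </images>
            <attributes>
                <mileage>32637</mileage>
                <price>19999</price>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>
        <element>
            <id>1</id>
            <title>Example car (made-up listing)</title>
            <attributes>
                <mileage>1000</mileage>
                <price>5000</price>
                <year>2001</year>
            </attributes>
        </element>
    </listings>
</oodle_response>"""


def parse_listings(xml_text):
    """Return one dict per listing (hypothetical helper for illustration)."""
    root = ET.fromstring(xml_text)
    listings = []
    # "listings/element" matches only direct children of <listings>,
    # so the <element> tags nested under <images> are not picked up.
    for item in root.findall("listings/element"):
        listings.append({
            "id": item.findtext("id"),
            "title": item.findtext("title"),
            # findtext returns None when a tag is missing (e.g. no <vin>).
            "mileage": item.findtext("attributes/mileage"),
            "price": item.findtext("attributes/price"),
            "year": item.findtext("attributes/year"),
            "vin": item.findtext("attributes/vin"),
        })
    return listings


cars = parse_listings(SAMPLE)
for car in cars:
    print(car["id"], car["title"], car["price"])
```

For the real feed you would pass the contents of temp.xml to parse_listings instead of SAMPLE. All values come back as strings, so convert price or mileage with int() if you need to sort or compare them.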


 