Parsing/extracting data from an API XML feed with Python and Beautiful Soup

Question

Python/XML newbie here, playing around with Python and BeautifulSoup to learn how to parse XML, specifically using the Oodle.com API to list car classifieds. I've had success with simple XML and BS, but with this feed I can't seem to get the data I want no matter what I try. I read the Soup documentation for hours and still can't figure it out. The XML is structured like this:

<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
    <current>
        ....
    </current>
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <body>...</body>
            <url>...</url>
            <images>
                <element>...</element>
                <element>...</element>
            </images>
            <attributes>
                <features>...</features>
                <mileage>32637</mileage>
                <price>19999</price>
                <trim>XL</trim>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>      
        <element>.. Next car here ..</element>
        <element>..Aaaand next one here ..</element>    
    </listings>
    <meta>...</meta>
</oodle_response>

I first make a request with urllib to grab the feed and save it to a local file. Then:

from BeautifulSoup import BeautifulStoneSoup  # Beautiful Soup 3 import path
xml = open("temp.xml", "r")
soup = BeautifulStoneSoup(xml)

Then I'm not sure what to do. I've tried a lot of things, but everything seems to return far more junk than I want, which makes it difficult to find the issue. I'm just trying to get the id, title, mileage, price, year, and vin. So how do I get these and speed up the process with a loop? Ideally I want a for loop like:

for soup.listings.element in soup.listings:
    id = soup.listings.element.id
    ...

I know that obviously doesn't work, but I want something that fetches the info for a listing, stores it in a list, then moves on to the next ad. Appreciate the help, guys.

Answer 1

You could do something like this:

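# Iterate only over the direct children of <listings>. soup('element') would
# also match the nested <images><element> tags, which have no <id> child.
for element in soup.listings.findAll('element', recursive=False):
    id = element.id.text
    title = element.title.text
    mileage = element.attributes.mileage.text
    price = element.attributes.price.text
    year = element.attributes.year.text
    vin = element.attributes.vin.text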
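
To store each listing in a list, as the question asks, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Beautiful Soup. This is a swapped-in technique, not the approach used in the answer. The names SAMPLE and parse_listings are hypothetical, and the sample only mirrors the structure shown in the question; the second listing's values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample shaped like the feed in the question.
# The second listing's values are invented for illustration.
SAMPLE = """<oodle_response stat="ok">
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <images>
                <element>image-1</element>
                <element>image-2</element>
            </images>
            <attributes>
                <mileage>32637</mileage>
                <price>19999</price>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>
        <element>
            <id>1111111111</id>
            <title>Example second car</title>
            <attributes>
                <mileage>50000</mileage>
                <price>8999</price>
                <year>2005</year>
            </attributes>
        </element>
    </listings>
</oodle_response>"""


def parse_listings(xml_text):
    """Return one dict per car listing found directly under <listings>."""
    root = ET.fromstring(xml_text)
    cars = []
    # "./listings/element" matches only direct children of <listings>,
    # so the <element> tags nested inside <images> are skipped.
    for element in root.findall("./listings/element"):
        cars.append({
            "id": element.findtext("id"),
            "title": element.findtext("title"),
            # findtext returns None when a field is missing (e.g. no <vin>).
            "mileage": element.findtext("attributes/mileage"),
            "price": element.findtext("attributes/price"),
            "year": element.findtext("attributes/year"),
            "vin": element.findtext("attributes/vin"),
        })
    return cars


cars = parse_listings(SAMPLE)
for car in cars:
    print(car["id"], car["title"], car["price"])
```

All values come back as strings (or None when the tag is absent), so convert mileage, price, and year with int() if you need numbers.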

Solution 1
0 2011-10-11 20:12:49