Parsing/Extracting Data from API XML feed with Python and Beautiful Soup
Python/XML newbie here, playing around with Python and BeautifulSoup to learn how to parse XML, specifically using the Oodle.com API to list car classifieds. I've had success with simple XML and BS, but with this feed I can't seem to get the data I want no matter what I try. I've read the Soup documentation for hours and can't figure it out. The XML is structured like:
<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
  <current>
    ....
  </current>
  <listings>
    <element>
      <id>8453458345</id>
      <title>2009 Toyota Avalon XL Sedan 4D</title>
      <body>...</body>
      <url>...</url>
      <images>
        <element>...</element>
        <element>...</element>
      </images>
      <attributes>
        <features>...</features>
        <mileage>32637</mileage>
        <price>19999</price>
        <trim>XL</trim>
        <vin>9234234234234234</vin>
        <year>2009</year>
      </attributes>
    </element>
    <element>.. Next car here ..</element>
    <element>..Aaaand next one here ..</element>
  </listings>
  <meta>...</meta>
</oodle_response>
I first make a request with urllib to grab the feed and save it to a local file. Then:
from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3 import path

# Read the feed saved earlier and parse it as XML
with open("temp.xml", "r") as xml:
    soup = BeautifulStoneSoup(xml)
Then I'm not sure what to do. I've tried a lot of things, but everything seems to throw back way more junk than I want, which makes it too difficult to find the issue. I'm just trying to get the id, title, mileage, price, year, and vin. So how do I get these and expedite the process with a loop? Ideally I wanted a for loop like:
for soup.listings.element in soup.listings:
    id = soup.listings.element.id
    ...
I know that obviously doesn't work, but I'd like something that fetches the info for a listing, stores it in a list, then moves on to the next ad. Appreciate the help, guys.
You could do something like this:
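# Only the direct children of <listings>; soup('element') would also
# match the <element> tags nested inside <images>, which have no <id>
for element in soup.listings.findAll('element', recursive=False):
    id = element.id.text
    title = element.title.text
    mileage = element.attributes.mileage.text
    price = element.attributes.price.text
    year = element.attributes.year.text
    vin = element.attributes.vin.text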
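The question also asks for each listing's fields to be stored in a list. Here is a minimal sketch of that step. It swaps Beautiful Soup for the standard library's xml.etree.ElementTree so that it runs without extra installs; this is a substitution, not the approach from the answer above. The names `SAMPLE` and `parse_listings` are made up for illustration, and the sample is a trimmed copy of the structure shown in the question with the XML declaration omitted.

```python
import xml.etree.ElementTree as ET

# Trimmed sample shaped like the feed in the question (hypothetical stand-in
# for the real response, which would be read from temp.xml instead).
SAMPLE = """<oodle_response stat="ok">
  <listings>
    <element>
      <id>8453458345</id>
      <title>2009 Toyota Avalon XL Sedan 4D</title>
      <images>
        <element>...</element>
      </images>
      <attributes>
        <mileage>32637</mileage>
        <price>19999</price>
        <vin>9234234234234234</vin>
        <year>2009</year>
      </attributes>
    </element>
  </listings>
</oodle_response>"""


def parse_listings(xml_text):
    """Return one dict per direct <element> child of <listings>."""
    root = ET.fromstring(xml_text)
    cars = []
    # The path "listings/element" matches only direct children of <listings>,
    # so the <element> tags nested under <images> are skipped.
    for ad in root.findall("listings/element"):
        cars.append({
            "id": ad.findtext("id"),
            "title": ad.findtext("title"),
            # findtext returns None when a field is missing from a listing
            "mileage": ad.findtext("attributes/mileage"),
            "price": ad.findtext("attributes/price"),
            "year": ad.findtext("attributes/year"),
            "vin": ad.findtext("attributes/vin"),
        })
    return cars


cars = parse_listings(SAMPLE)
print(cars[0]["title"])
```

All values come back as strings, so mileage, price, and year would need an int() conversion before doing any arithmetic or sorting on them.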