Quotes Messing Up Python Scraper
I am trying to scrape all the data within a div, as shown below. However, the quotes are throwing me off.
<div id="address">
<div class="info">14955 Shady Grove Rd.</div>
<div class="info">Rockville, MD 20850</div>
<div class="info">Suite: 300</div>
</div>
I am trying to start with something along the lines of:
addressStart = page.find("<div id="address">")
but the quotes within the div tag end the Python string early. Does anybody know how I can fix this?
To answer your specific question, you need to escape the quotes, or use a different type of quote to delimit the string itself:
addressStart = page.find("<div id=\"address\">")
# or
addressStart = page.find('<div id="address">')
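As a quick check of the two spellings above, here is a minimal sketch. The `page` value is a shortened copy of the markup from the question, and the `phone` id is a made-up example of a tag that is not present; neither comes from a real page.

```python
# Shortened sample markup from the question. Triple quotes let the
# double quotes inside the HTML appear without any escaping.
page = '''<div id="address">
<div class="info">14955 Shady Grove Rd.</div>
</div>'''

escaped = "<div id=\"address\">"   # double-quoted, inner quotes escaped
single = '<div id="address">'      # single-quoted, no escaping needed

# Both literals produce exactly the same string.
same_literal = escaped == single

# str.find returns the index of the first match, or -1 if there is none.
address_start = page.find(single)
missing = page.find('<div id="phone">')  # hypothetical id, not in the sample

print(same_literal, address_start, missing)
```

Note that `str.find` only reports where the tag starts; you would still have to locate the matching closing tag yourself.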
But don't do that. If you are trying to parse HTML, let a third-party library do it. Try Beautiful Soup. You get back an object that you can traverse or search, so you can grab attributes, values, and so on without having to worry about the complexities of parsing HTML or XML:
from bs4 import BeautifulSoup

# Name the parser explicitly; 'html.parser' ships with Python
soup = BeautifulSoup(page, 'html.parser')
# find_all returns a list; use find if you only want the first match
for address in soup.find_all('div', id='address'):
    # class is a reserved word in Python, so the keyword argument is class_
    for info in address.find_all('div', class_='info'):
        print(info.string)
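If installing a third-party package is not an option, roughly the same extraction can be sketched with the standard library's `html.parser` module. This is an alternative to Beautiful Soup, not part of it. The `AddressInfoParser` class and its attributes are hypothetical names made up for this illustration, and the sketch assumes the `info` divs contain no nested divs.

```python
from html.parser import HTMLParser


class AddressInfoParser(HTMLParser):
    """Hypothetical helper: collects the text of div.info inside div#address."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # div nesting depth inside the address block; 0 means outside
        self.in_info = False  # True while inside a div with class "info"
        self.lines = []       # collected text, one entry per info div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if self.depth == 0:
            # Outside the address block: only the div with id="address" matters.
            if attrs.get("id") == "address":
                self.depth = 1
            return
        self.depth += 1
        if "info" in (attrs.get("class") or "").split():
            self.in_info = True
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag != "div" or self.depth == 0:
            return
        self.depth -= 1
        self.in_info = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self.in_info:
            self.lines[-1] += data


# Sample markup copied from the question.
page = '''<div id="address">
<div class="info">14955 Shady Grove Rd.</div>
<div class="info">Rockville, MD 20850</div>
<div class="info">Suite: 300</div>
</div>'''

parser = AddressInfoParser()
parser.feed(page)
for line in parser.lines:
    print(line)
```

The amount of bookkeeping needed even for this small case is a fair argument for using Beautiful Soup when you can.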