Quotation confusion in a Python scraper

Question

I am trying to scrape all the data within the following div. However, the quotes are throwing me off.

<div id="address">
    <div class="info">14955 Shady Grove Rd.</div> 
    <div class="info">Rockville, MD 20850</div> 
    <div class="info">Suite: 300</div> 
</div>

I am trying to start with something along the lines of:

addressStart = page.find("<div id="address">")

but the double quotes around address in id="address" are messing me up. Does anybody know how I can fix this?

Answer 1

To answer your specific question, you need to escape the inner quotes, or use a different type of quote for the string itself:

addressStart = page.find("<div id=\"address\">")
# or
addressStart = page.find('<div id="address">')
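A minimal sketch to check the two fixes: the escaped and the single-quoted literals build the identical string, while the line from the question does not even compile. It also shows that str.find is an exact substring search returning the starting index (or -1), so it only matches markup spelled exactly that way. The html value is a trimmed, illustrative copy of the question's markup.

```python
# Both spellings build the identical string <div id="address">
escaped = "<div id=\"address\">"
single = '<div id="address">'
assert escaped == single

# The question's line ends the string at the second double quote,
# so Python rejects it before it ever runs.
try:
    compile('page.find("<div id="address">")', "<question>", "eval")
    broken_compiles = True
except SyntaxError:
    broken_compiles = False

# Illustrative sample: a trimmed copy of the markup from the question
html = '<div id="address">\n    <div class="info">Suite: 300</div>\n</div>'

# str.find returns the index of the first match, or -1 if there is none
print(html.find(single))                    # 0
# The same element written with single quotes is no longer found
print(html.replace('"', "'").find(single))  # -1
```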

But don't do that. If you are trying to "parse" HTML, let a third-party library do it. Try Beautiful Soup. You get back a nice object that you can use to traverse or search the document. You can grab attributes, values, etc. without having to worry about the complexities of parsing HTML or XML:

from bs4 import BeautifulSoup
# page holds the HTML text you downloaded
soup = BeautifulSoup(page, "html.parser")  # name a parser explicitly to avoid bs4's warning
for address in soup.find_all('div', id='address'):  # returns a list; use find() if you only want the first match
    for info in address.find_all('div', class_='info'):  # use class_ because class is a reserved word in Python
        print(info.string)  # Python 3 print() function
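For reference, here is the same approach as a self-contained script run against the markup from the question. It is a sketch that assumes Beautiful Soup 4 is installed (pip install beautifulsoup4) and uses Python's built-in "html.parser" backend; get_text(strip=True) trims the whitespace around each entry.

```python
from bs4 import BeautifulSoup

# The markup from the question, as a plain string
page = """
<div id="address">
    <div class="info">14955 Shady Grove Rd.</div>
    <div class="info">Rockville, MD 20850</div>
    <div class="info">Suite: 300</div>
</div>
"""

soup = BeautifulSoup(page, "html.parser")

# find() returns the first matching element (or None); no quote escaping needed
address = soup.find("div", id="address")

# Collect the stripped text of every child div with class "info"
lines = [info.get_text(strip=True) for info in address.find_all("div", class_="info")]
print(lines)
# ['14955 Shady Grove Rd.', 'Rockville, MD 20850', 'Suite: 300']
```

The same elements can also be reached with a CSS selector, soup.select("#address > div.info"), if you prefer that style.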

Answer 1: 1 vote, accepted, 2013-12-29 03:08:05