Scraping product names with BeautifulSoup

Question

I'm using BeautifulSoup (BS4) to build a scraper that pulls the product name, which sits between 'h1' tags, from any TopShop.com product page. I can't figure out why the code I've written isn't working!

from urllib2 import urlopen
from bs4 import BeautifulSoup
import re

TopShop_URL = raw_input("Enter a TopShop Product URL")
ProductPage = urlopen(TopShop_URL).read()

soup = BeautifulSoup(ProductPage)

ProductNames = soup.find_all('h1')

print ProductNames

Answer 1

I got this working using requests (http://docs.python-requests.org/en/latest/):

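from bs4 import BeautifulSoup
import requests

# Read the product page URL, as in the question
TopShop_URL = raw_input("Enter a TopShop Product URL: ")

# Pass the variable itself, not the string "TopShop_URL"
content = requests.get(TopShop_URL).content
soup = BeautifulSoup(content, "html.parser")
product_names = soup.find_all("h1")
print product_names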

Answer 2

Your code is correct; the problem is that the div containing the product name is generated dynamically via JavaScript. To parse this element successfully, consider using Selenium or a similar tool, which lets you parse the page after the DOM has fully loaded.
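A minimal sketch of that approach, in case it helps. It assumes Python 3, an installed selenium package, and a local Firefox with its driver; the function names fetch_rendered_html and h1_texts are made up for illustration and are not part of either library. The idea is to let the browser run the page's JavaScript, wait until an h1 exists, then hand the rendered HTML to BeautifulSoup.

```python
from bs4 import BeautifulSoup


def h1_texts(html):
    """Return the stripped text of every <h1> element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [h1.get_text(strip=True) for h1 in soup.find_all("h1")]


def fetch_rendered_html(url, timeout=10):
    """Load url in a real browser and return the HTML after scripts have run.

    Hypothetical helper: assumes Firefox and its driver are available locally.
    """
    # Imported here so h1_texts can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one <h1> is present in the DOM (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "h1"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Example usage (opens a browser window, so it is left commented out):
# url = input("Enter a TopShop Product URL: ")
# print(h1_texts(fetch_rendered_html(url)))
```

Keeping the parsing in its own function means it can be checked against a saved HTML string without starting a browser. If h1_texts returns an empty list for the raw response from urlopen or requests but a name for the Selenium-rendered source, that would support the explanation that the element is added by JavaScript.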

Answer 1: score 2, accepted, 2013-02-14 23:45:45

Answer 2: score 0, 2013-02-15 00:01:00