Python web scraping: HTML table

Question

I'm a beginner Python developer, still in the learning phase. More specifically, I'm working on scraping with requests and bs4. I tried to scrape the following link: 'http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22'

I used the following code:

import requests
from bs4 import BeautifulSoup

url = "http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22"
res = requests.get(url)

soup = BeautifulSoup(res.text, 'html.parser')
res.close()
results = soup.find('table')

There is no table in results, although the table is present when I inspect the page in Chrome. Is there a solution or an explanation, please?

Thank you.

Answer 1

The table data is inside a frame, so you need to request the frame's page first.

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below requires the lxml package

BASE_URL = "http://directorybtr.az.gov/listings/"
URL = BASE_URL + "FirmSearchResults.asp?Zip%20Like%20%22850%25%22"

# A session is needed because the frame uses the search results data;
# you can't go directly to Firms.asp.
session = requests.Session()
response = session.get(URL)
soup = BeautifulSoup(response.text, 'lxml')

# Find the first frame.
frame = soup.find("frame")

# Request the frame's link (Firms.asp) with the same session.
response = session.get(BASE_URL + frame.attrs['src'])
soup = BeautifulSoup(response.text, 'lxml')
table = soup.find("table")
print(table)
response.close()
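
To check whether a page hides its content behind frames before scraping it, something like the following sketch may help. It uses only the standard library (html.parser and urllib.parse) instead of bs4 so that it runs without extra packages, and it resolves each frame's src against the page URL with urljoin rather than by string concatenation. The names FrameCollector, frame_urls and SAMPLE_HTML are made up for this illustration, and the sample markup is hypothetical, not copied from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_html, page_url):
    """Return the absolute URLs of all frames found in page_html."""
    collector = FrameCollector()
    collector.feed(page_html)
    return [urljoin(page_url, src) for src in collector.sources]


# Hypothetical markup for illustration only (an assumption, not the real page).
SAMPLE_HTML = """
<html>
  <frameset rows="*">
    <frame src="Firms.asp" name="results">
  </frameset>
</html>
"""
PAGE_URL = "http://directorybtr.az.gov/listings/FirmSearchResults.asp"

print(frame_urls(SAMPLE_HTML, PAGE_URL))
```

If the returned list is not empty, the table is probably in one of those frame pages, and each URL can be fetched with the same session as in the answer above.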

Answer 1: accepted, 2017-11-24 21:57:58
