Python web scraping: HTML table

I'm a beginner Python developer, still in the learning phase. More specifically, I'm working on scraping with requests and bs4. I tried to scrape the following link: 'http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22'

I used the following code:

import requests
from bs4 import BeautifulSoup

url = "http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
res.close()
results = soup.find('table')

There is no table in results, although the table is present when I inspect the page source in Chrome. Is there a solution or an explanation?

Thank you.

The table data is inside a frame, so you need to request the frame's page first:

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://directorybtr.az.gov/listings/"
URL = BASE_URL + "FirmSearchResults.asp?Zip%20Like%20%22850%25%22"

# A session is needed because the frame uses the search results data;
# you can't go directly to Firms.asp.
# Note: the 'lxml' parser requires the lxml package to be installed.
session = requests.Session()
response = session.get(URL)
soup = BeautifulSoup(response.text, 'lxml')

# Find the first frame.
frame = soup.find("frame")

# Request the page the frame points to (Firms.asp).
response = session.get(BASE_URL + frame.attrs['src'])
soup = BeautifulSoup(response.text, 'lxml')
table = soup.find("table")
print(table)
response.close()
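
The frame lookup above can also be tried offline. The sketch below is not the code from the answer: it swaps bs4 for the standard library's html.parser so it runs without extra packages or network access, and it uses urllib.parse.urljoin to resolve the frame's src against the page URL instead of concatenating strings. The names FrameSrcParser, frame_urls and SAMPLE_HTML are made up for illustration, and the sample markup is only an assumption about what the search page returns.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_url, page_html):
    """Return absolute URLs for all frames found in page_html."""
    parser = FrameSrcParser()
    parser.feed(page_html)
    return [urljoin(page_url, src) for src in parser.sources]


# Hypothetical frameset markup (an assumption, not the real response body).
SAMPLE_HTML = """
<html><frameset rows="*">
  <frame name="main" src="Firms.asp">
</frameset></html>
"""

PAGE_URL = (
    "http://directorybtr.az.gov/listings/"
    "FirmSearchResults.asp?Zip%20Like%20%22850%25%22"
)

print(frame_urls(PAGE_URL, SAMPLE_HTML))
```

If the list comes back empty for a real response, the page probably does not use frames and the table is loaded some other way (for example by JavaScript), which this sketch does not handle.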
