Python urllib2 + BeautifulSoup
I'm working on adding BeautifulSoup to my current Python project. To keep this short and clear, I'll strip my current script down to its essentials.
The script without BeautifulSoup:
```python
import urllib2

def check(self, name, proxy):
    urllib2.install_opener(
        urllib2.build_opener(
            urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
            urllib2.HTTPHandler()
        )
    )
    req = urllib2.Request('http://example.com', "param=1")
    try:
        resp = urllib2.urlopen(req)
    except:
        self.insert()
    try:
        if 'example text' in resp.read():
            print 'success'
    # (the except clause for this try was left out of the sketch)
```
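Of course this is only a rough sketch of what I'm doing (the indentation in my original post was off): it sends a POST request to example.com, reads the response, and prints "success" if the response contains "example text".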
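For readers on Python 3, where `urllib2` became `urllib.request`, the same proxied POST might be prepared as below. This is a minimal sketch under assumptions: `build_proxy_opener` and `build_request` are hypothetical names, the proxy address is made up, and the opener is returned rather than installed globally with `install_opener`. Supplying a `data` payload is what makes urllib send a POST instead of a GET.

```python
import urllib.request


def build_proxy_opener(proxy):
    """Return an opener that routes plain-HTTP traffic through `proxy` ("host:port").

    build_opener() already adds a default HTTPHandler, so it is not listed here.
    Returning the opener (instead of calling install_opener) leaves other
    urlopen() calls in the process unaffected.
    """
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({'http': 'http://%s' % proxy}),
    )


def build_request(url='http://example.com', payload=b'param=1'):
    """Build the request; a non-None data payload turns it into a POST."""
    return urllib.request.Request(url, data=payload)


req = build_request()
print(req.get_method())  # POST
opener = build_proxy_opener('127.0.0.1:8080')  # hypothetical proxy address
# Sending is left out so the sketch runs offline; it would look like:
# body = opener.open(req, timeout=10).read()
# if b'example text' in body:
#     print('success')
```

On Python 3, `read()` returns bytes, so the membership test needs a bytes literal (`b'example text'`) or a decoded body.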
But what I really want is to check `if ' example ' in resp.read()`, and then print the text of a right-aligned `td` cell from the example.com response, using `soup.find_all('td', {'align':'right'})[4]`.
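To make that selector concrete: `find_all('td', {'align': 'right'})` returns every `<td>` whose `align` attribute is `right`, in document order, and `[4]` picks the fifth one (an `IndexError` is raised if there are fewer than five). The sketch below imitates only that selection using the standard library's `html.parser`, not BeautifulSoup, so it runs without third-party packages; the class `RightAlignedCells` and the sample table are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class RightAlignedCells(HTMLParser):
    """Collect the text of every <td align="right"> cell, in document order.

    A stdlib stand-in for soup.find_all('td', {'align': 'right'});
    it assumes table cells are not nested inside each other.
    """

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'td' and dict(attrs).get('align') == 'right':
            self._in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


# Six right-aligned cells interleaved with cells that must be ignored
sample = '<table><tr>' + ''.join(
    '<td align="right">cell %d</td><td>ignored</td>' % i for i in range(6)
) + '</tr></table>'

parser = RightAlignedCells()
parser.feed(sample)
print(parser.cells[4])  # cell 4 (index 4 is the fifth match)
```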
However, the way I've added BeautifulSoup doesn't work:
```python
import urllib2
from bs4 import BeautifulSoup as soup

main_div = soup.find_all('td', {'align':'right'})[4]

def check(self, name, proxy):
    urllib2.install_opener(
        urllib2.build_opener(
            urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
            urllib2.HTTPHandler()
        )
    )
    req = urllib2.Request('http://example.com', "param=1")
    try:
        resp = urllib2.urlopen(req)
        web_soup = soup(urllib2.urlopen(req), 'html.parser')
    except:
        self.insert()
    try:
        if 'example text' in resp.read():
            print 'success' + main_div
    # (the except clause for this try was left out of the sketch)
```
As you can see, I added four new lines/tweaks:

```python
from bs4 import BeautifulSoup as soup
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]
```

as well as `+ main_div` in the `print` statement.
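But it doesn't seem to work. While tweaking it I got errors saying "local variable referenced before assignment" and "unbound method find_all() must be called with BeautifulSoup instance as first argument".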
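The first error comes from the control flow rather than from BeautifulSoup: when `urlopen` raises, the `except` branch runs `self.insert()` and execution falls through to `resp.read()`, but `resp` was never assigned. The sketch below reproduces this offline with a hypothetical `fetch` callable standing in for `urlopen`, and shows one way out: return from the `except` branch so nothing downstream touches `resp`. It also catches `IOError` (the base class of urllib's `URLError` on both Python 2 and 3) instead of using a bare `except:`; the function names are illustrative only.

```python
def check_broken(fetch):
    """Mirrors the question's structure: resp is unset if fetch() fails."""
    try:
        resp = fetch()
    except IOError:
        pass  # the question calls self.insert() here, then falls through
    return 'example text' in resp  # UnboundLocalError when fetch() failed


def check_fixed(fetch):
    """Stop after handling the failure, so resp is only used once it exists."""
    try:
        resp = fetch()
    except IOError:
        return False  # e.g. record the bad proxy here, then stop
    return 'example text' in resp


def failing_fetch():
    raise IOError('proxy refused the connection')  # simulated network failure


try:
    check_broken(failing_fetch)
except UnboundLocalError as exc:
    print('broken:', type(exc).__name__)  # broken: UnboundLocalError

print('fixed:', check_fixed(failing_fetch))  # fixed: False
```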
Regarding your last code snippet:
```python
from bs4 import BeautifulSoup as soup
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]
```
You should call `find_all` on the `web_soup` instance, not on the `soup` class. Also, make sure the `url` variable is defined before you use it:
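```python
import urllib2
from bs4 import BeautifulSoup as soup

url = "url to be opened"
web_soup = soup(urllib2.urlopen(url), 'html.parser')
# Call find_all on the parsed instance, not on the BeautifulSoup class
main_div = web_soup.find_all('td', {'align':'right'})[4]
```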
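Putting the pieces together, here is a sketch of how the check might look once the page is parsed a single time and `find_all` is called on the resulting instance. It assumes BeautifulSoup 4 is installed and uses Python 3 syntax; `extract_cell` is a hypothetical helper that takes the already-fetched HTML as a string, so the parsing can be tried without a network request (the question's code also calls `urlopen` twice, which sends the POST twice). Note that `find_all(...)[4]` returns a `Tag`, not a string, so `'success' + main_div` raises a `TypeError`; `get_text()` returns the cell's text.

```python
from bs4 import BeautifulSoup


def extract_cell(html, marker='example text', index=4):
    """Return the text of the index-th right-aligned <td>, or None.

    None is returned when the page lacks `marker` or has too few such cells.
    """
    if marker not in html:
        return None
    web_soup = BeautifulSoup(html, 'html.parser')  # an instance, not the class
    cells = web_soup.find_all('td', {'align': 'right'})
    if len(cells) <= index:
        return None  # avoid IndexError on pages with fewer cells
    return cells[index].get_text(strip=True)


page = '<p>example text</p><table><tr>' + ''.join(
    '<td align="right">value %d</td>' % i for i in range(5)
) + '</tr></table>'

text = extract_cell(page)
if text is not None:
    print('success ' + text)  # success value 4
```

In the real `check`, the argument would be the body read once from `resp`; on Python 3 that is bytes, so it would need decoding first (the encoding is an assumption about the target site).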
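Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.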