Python urllib2 + Beautifulsoup

Question

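So, I'm struggling to implement BeautifulSoup in my current Python project. To keep this plain and simple, I'll reduce the complexity of my current script.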

Script without BeautifulSoup:

import urllib2

    def check(self, name, proxy):
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
                )
            )

        req = urllib2.Request('http://example.com' ,"param=1")
        try:
            resp = urllib2.urlopen(req) 
        except:
            self.insert()
        try:
            if 'example text' in resp.read()
               print 'success'

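Now of course the indentation is wrong; this is just a rough sketch of what I have going on. In simple terms, I'm sending a POST request to "example.com", and if the response contains "example text" when read, it prints success.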

But what I actually want is to check

if ' example ' in resp.read()

and then print the text of a right-aligned td from the example.com response, using the following:

soup.find_all('td', {'align':'right'})[4]

The way I implemented BeautifulSoup isn't working, for example:

import urllib2
from bs4 import BeautifulSoup as soup

main_div = soup.find_all('td', {'align':'right'})[4]

    def check(self, name, proxy):
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
                )
            )

        req = urllib2.Request('http://example.com' ,"param=1")
        try:
            resp = urllib2.urlopen(req) 
            web_soup = soup(urllib2.urlopen(req), 'html.parser')
        except:
            self.insert()
        try:
            if 'example text' in resp.read()
               print 'success' + main_div

As you can see, I added 4 new lines/adjustments:

from bs4 import BeautifulSoup as soup

web_soup = soup(urllib2.urlopen(url), 'html.parser')

main_div = soup.find_all('td', {'align':'right'})[4]

as well as " + main_div " in the print statement.

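But it just doesn't seem to work. While tweaking it I got a few errors, which said "local variable referenced before assignment" and "unbound method find_all must be called with BeautifulSoup instance as first argument".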

Answer 1

Regarding your last code snippet:

from bs4 import BeautifulSoup as soup

web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]

You should call find_all on the web_soup instance, not on the soup class. Also, be sure to define the url variable before using it:

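import urllib2
from bs4 import BeautifulSoup as soup

url = "url to be opened"  # placeholder: replace with the page to fetch
# soup(...) builds a BeautifulSoup instance from the response
web_soup = soup(urllib2.urlopen(url), 'html.parser')
# call find_all on the instance (web_soup), not on the class (soup)
main_div = web_soup.find_all('td', {'align':'right'})[4]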
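
To make the instance-versus-class point concrete, here is a minimal sketch that parses an inline HTML string instead of fetching a page, so it runs without a network connection or proxy. The markup in SAMPLE_HTML and the helper name fifth_right_aligned_cell are made up for illustration, since the real structure of the page is not shown in the question. It also guards the [4] index, which would otherwise raise an IndexError when fewer than five matching cells exist.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page body; the real markup is not known.
SAMPLE_HTML = """
<table>
  <tr><td align="right">a</td><td align="right">b</td><td align="left">skip</td></tr>
  <tr><td align="right">c</td><td align="right">d</td><td align="right">e</td></tr>
</table>
"""


def fifth_right_aligned_cell(html):
    """Return the text of the fifth <td align="right">, or None if there are fewer than five."""
    # Build the parsed document once; find_all is then called on this
    # instance rather than on the BeautifulSoup class itself.
    web_soup = BeautifulSoup(html, 'html.parser')
    cells = web_soup.find_all('td', {'align': 'right'})
    if len(cells) < 5:
        return None
    return cells[4].get_text(strip=True)


result = fifth_right_aligned_cell(SAMPLE_HTML)
```

With urllib2, the same helper could probably be fed the body read from the response, for example fifth_right_aligned_cell(urllib2.urlopen(req).read()). Reading the body once into a variable and using that one string for both the 'example text' check and the parsing avoids sending the request twice, and avoids reading a response that has already been consumed.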
