Getting a web page's bytes from Selenium

Question

I'm trying to scrape a web page that serves a PDF.

With requests, I use the following code to get the bytes and save them with open():

    import requests

    pdf_response = requests.get(pdf_url)

    # the with-block closes the file automatically, no explicit close() needed
    with open("sample.pdf", 'wb') as f:
        f.write(pdf_response.content)

and it works well.

But on the page below I'm using Selenium, and I can't get bytes from the response object to use in the code above:

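    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get(base)

    # page_source is the HTML of the current page (a str), so this yields
    # UTF-8 bytes of that HTML, not the raw PDF bytes that requests returns
    content = driver.page_source.encode('utf-8').strip()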
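Since `driver.page_source` only exposes the page's HTML as a string, one common workaround is to let Selenium get past the captcha and then hand its cookies to requests, which downloads the raw bytes. The sketch below assumes that the site ties captcha success to session cookies (not confirmed here) and that Selenium's `driver.get_cookies()` output (a list of dicts with `"name"` and `"value"` keys) is all that needs transferring. The helper `cookie_header_from_selenium` and the `pdf_url` variable in the usage comment are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def cookie_header_from_selenium(cookies: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Build a Cookie request header from driver.get_cookies() output.

    Each item needs "name" and "value" keys. Domain and path are ignored,
    assuming (hypothetically) that every cookie belongs to the PDF's host.
    """
    pairs = [f"{c['name']}={c['value']}" for c in cookies]
    return {"Cookie": "; ".join(pairs)}


if __name__ == "__main__":
    # Hypothetical usage after the captcha has been solved in the browser:
    #
    #   import requests
    #   headers = cookie_header_from_selenium(driver.get_cookies())
    #   headers["User-Agent"] = driver.execute_script("return navigator.userAgent")
    #   pdf_bytes = requests.get(pdf_url, headers=headers).content
    #
    # Stand-alone demo with made-up cookies shaped like Selenium's output:
    demo = [{"name": "session", "value": "abc123"}, {"name": "lang", "value": "EN"}]
    print(cookie_header_from_selenium(demo))
```

Copying the browser's User-Agent as well is a precaution in case the server checks that the download comes from the same client that solved the captcha.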

Link to the PDF (it is protected by a captcha, which I solve with 2captcha)

The response I currently get:

''

Answer 1

I managed to get the PDF using only requests.

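The only problem: I use Pillow to build a single image containing the full code and display it, and then I have to read the code myself. If you have a way to recognize it automatically, that works too.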

    import io
    from urllib.parse import urljoin

    import lxml.html
    import requests
    from PIL import Image

    headers = {
        # the value must not repeat the "User-Agent:" header name
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    }

    # --- create Session (keeps cookies between requests) ---

    s = requests.Session()
    s.headers.update(headers)

    # --- load main page (HTML containing the captcha images) ---

    url = 'https://www.sedar.com/GetFile.do?lang=EN&docClass=8&issuerNo=00028264&issuerType=03&projectNo=03079934&docId=4755532'

    r = s.get(url)

    # --- get images ---

    soup = lxml.html.fromstring(r.text)

    image_urls = soup.xpath('//img/@src')

    # --- generate one image (room for five 40x50 px captcha images) ---

    full_image = Image.new('RGB', (40*5, 50))

    for i, src in enumerate(image_urls):
        # resolve the (relative) src against the page URL
        img_response = s.get(urljoin(url, src))

        image = Image.open(io.BytesIO(img_response.content))

        full_image.paste(image, (40*i, 0))

    # --- ask for code ---

    full_image.show()

    code = input('code> ')

    # --- get PDF ---

    r = s.post('https://www.sedar.com/CheckCode.do', data={'code': code})

    # Content-Type may carry parameters, so compare only the media type
    content_type = r.headers.get('Content-Type', '').split(';')[0].strip()
    if content_type != 'application/pdf':
        print('It is not a PDF file')
    else:
        with open('output.pdf', 'wb') as fh:
            print('size:', fh.write(r.content))
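The answer decides success from the `Content-Type` header alone. A slightly more defensive sketch, assuming the server could mislabel an HTML error or captcha page, also checks that the body starts with the `%PDF-` header that PDF files normally begin with. The function name `is_pdf_response` is hypothetical.

```python
def is_pdf_response(content_type: str, body: bytes) -> bool:
    """Return True if a response looks like a PDF.

    content_type: raw Content-Type header value (may be empty).
    body: raw response bytes, e.g. r.content from requests.
    """
    # Drop parameters such as "; charset=binary" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # PDF files normally begin with the "%PDF-" header line.
    has_signature = body.startswith(b"%PDF-")
    return media_type == "application/pdf" and has_signature


if __name__ == "__main__":
    # Usage with the answer's session (r = s.post(...)):
    #   if is_pdf_response(r.headers.get("Content-Type", ""), r.content): ...
    print(is_pdf_response("application/pdf", b"%PDF-1.4\n..."))
    print(is_pdf_response("text/html; charset=UTF-8", b"<html></html>"))
```

Requiring both checks is deliberately strict; if the server labels PDFs as `application/octet-stream`, the header comparison would need relaxing.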

Answer 1: 0 votes, accepted, 2020-07-04 18:11:46
