Getting bytes of a webpage from selenium

Question

I am trying to scrape a webpage with a pdf.

With requests, I used the following code to get the bytes and save them with open():

    import requests

    pdf_response = requests.get(pdf_url)

    # the with block closes the file automatically
    with open("sample.pdf", 'wb') as f:
        f.write(pdf_response.content)

This works fine.

However, on the webpage below I am using Selenium, and I cannot get the bytes from the response to use in the code above:

    from selenium import webdriver

    # This does not give me the PDF bytes the way requests does
    driver = webdriver.Chrome()
    driver.get(base)

    content = driver.page_source.encode('utf-8').strip()

Link to the PDF (it has a captcha that I solve with 2captcha)

The current response that I receive:

''
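Because the browser session is already past the captcha, one common workaround is to copy Selenium's cookies into a plain HTTP client and download the PDF bytes with it. This is a minimal stdlib sketch, assuming the PDF is served by a plain GET to a URL (pdf_url, as in the first snippet) that the same session cookies unlock. If the PDF only comes back as the response to the captcha form POST, as in the answer below, this will not apply. The helper names cookie_header and fetch_bytes are hypothetical, and cookie domain/path filtering is skipped.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join Selenium cookie dicts into a single Cookie header value.

    driver.get_cookies() returns a list of dicts; only the 'name' and
    'value' keys are used here. Domain/path filtering is skipped, so all
    cookies are assumed to belong to the site being downloaded from.
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_bytes(url, selenium_cookies, user_agent):
    """GET url with the browser's cookies and User-Agent; return the raw body bytes."""
    request = urllib.request.Request(
        url,
        headers={
            "Cookie": cookie_header(selenium_cookies),
            "User-Agent": user_agent,
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Usage, after the browser has passed the captcha (assumes pdf_url serves
# the PDF to this session with a plain GET):
#
#   data = fetch_bytes(pdf_url, driver.get_cookies(),
#                      driver.execute_script("return navigator.userAgent"))
#   with open("sample.pdf", "wb") as f:
#       f.write(data)
```

With requests the idea is the same: copy each cookie into a Session with s.cookies.set(c['name'], c['value']) and then use s.get(pdf_url).content.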

Answer 1

I can get the PDF using only requests.

The only problem: I use Pillow to combine the captcha images into one image showing the full code and display it, and I have to read the code manually. If you have a method to recognize it automatically, this is not a problem.
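Combining the captcha pieces into one image needs a canvas wide enough for all of them and an x offset for each piece. The answer's code hardcodes five 40×50 pixel pieces (a 200×50 canvas with a 40-pixel step). A minimal sketch of a hypothetical helper, strip_layout, that derives both from the actual sizes (for example [img.size for img in images]) so the layout still fits if the pieces differ:

```python
def strip_layout(sizes):
    """Lay images out left to right in a single row.

    sizes: list of (width, height) tuples.
    Returns ((canvas_width, canvas_height), [x offset of each image]).
    """
    offsets = []
    x = 0
    for width, _height in sizes:
        offsets.append(x)  # this image starts where the previous one ended
        x += width
    height = max((h for _w, h in sizes), default=0)  # tallest piece
    return (x, height), offsets
```

The result plugs into Pillow as canvas = Image.new('RGB', size) followed by canvas.paste(img, (x, 0)) for each image and its offset.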

    import io
    from urllib.parse import urljoin

    import lxml.html
    import requests
    from PIL import Image

    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    }

    # --- create Session (keeps cookies between requests) ---

    s = requests.Session()
    s.headers.update(headers)

    # --- load main page (HTML with the captcha images) ---

    page_url = 'https://www.sedar.com/GetFile.do?lang=EN&docClass=8&issuerNo=00028264&issuerType=03&projectNo=03079934&docId=4755532'

    r = s.get(page_url)

    # --- get image URLs ---

    soup = lxml.html.fromstring(r.text)

    image_urls = soup.xpath('//img/@src')

    # --- paste the images side by side (five pieces, 40x50 px each) ---

    full_image = Image.new('RGB', (40*5, 50))

    for i, src in enumerate(image_urls):
        # resolve relative src values against the page URL
        r = s.get(urljoin(page_url, src))
        image = Image.open(io.BytesIO(r.content))
        full_image.paste(image, (40*i, 0))

    # --- show the combined image and ask for the code ---

    full_image.show()

    code = input('code> ')

    # --- send the code; the response should be the PDF ---

    r = s.post('https://www.sedar.com/CheckCode.do', data={'code': code})

    if r.headers.get('Content-Type') != 'application/pdf':
        print('It is not a PDF file')
    else:
        with open('output.pdf', 'wb') as fh:
            print('size:', fh.write(r.content))
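The final check compares the whole Content-Type header with !=, so a header such as application/pdf; charset=binary would be rejected, and the body itself is never inspected. A slightly stricter sketch (the helper name is_pdf is hypothetical) compares only the media type, case-insensitively, and also requires the %PDF- signature that PDF files normally start with:

```python
def is_pdf(content_type, body):
    """Return True if a response looks like a PDF.

    Only the media type part of Content-Type is compared (parameters such
    as '; charset=...' are dropped, case is ignored), and the body must
    start with the '%PDF-' file signature.
    """
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type == "application/pdf" and body.startswith(b"%PDF-")
```

In the answer's code it would replace the header comparison as is_pdf(r.headers.get('Content-Type'), r.content).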

solution1 0 ACCEPTED 2020-07-04 18:11:46
