'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Question

I am trying to read the page source of a web page with this code:

from urllib.request import urlopen

url = "http://www.tsetmc.com/Loader.aspx?ParTree=15"

page = urlopen(url)

htmlSource = page.read().decode("utf-8")

with open("output.txt", "w") as f:
    f.write(htmlSource)

but I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I don't know whether this page is encoded in UTF-8 or not.
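The error message itself offers a clue. Every gzip stream begins with the two bytes 0x1f 0x8b (RFC 1952), and 0x8b is exactly the byte the decoder rejects at position 1. That suggests, though it is not confirmed for this particular server, that the body is gzip-compressed rather than text in some other character encoding. The sketch below uses a hypothetical helper, looks_gzipped, and locally generated data (no network access) to show the check and reproduce the same error.

```python
import gzip

# gzip streams always start with these two "magic" bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Hypothetical helper: True if the raw bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Simulate a gzip-compressed HTML body locally instead of fetching the page.
compressed = gzip.compress("<html>hello</html>".encode("utf-8"))
print(compressed[:2])             # b'\x1f\x8b'
print(looks_gzipped(compressed))  # True

try:
    compressed.decode("utf-8")
except UnicodeDecodeError as err:
    # Same message as in the question: byte 0x8b in position 1.
    print(err)
```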

Thanks for your help.

Answer 1

I don't know if this page is encoded with utf-8 or not.

If you don't know how the page is encoded, you can write the raw bytes to the file without trying to decode them:

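from urllib.request import urlopen

url = "http://www.tsetmc.com/Loader.aspx?ParTree=15"

with urlopen(url) as page:
    htmlSource = page.read()  # raw bytes, no decoding attempted
with open("output.txt", "wb") as f:  # binary mode writes the bytes unchanged
    f.write(htmlSource)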
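If you do want text rather than raw bytes, one option is to undo any compression first and then decode with the charset the server declares. The sketch below assumes the body may be gzip-compressed (signalled by a Content-Encoding: gzip header or, as a heuristic, by the 0x1f 0x8b magic bytes) and falls back to UTF-8 when no charset is declared; both choices are assumptions, and decode_body is a hypothetical helper name.

```python
import gzip
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str] = None,
                charset: Optional[str] = None) -> str:
    """Hypothetical helper: turn a raw HTTP response body into text.

    content_encoding: value of the Content-Encoding header, if any.
    charset: charset from the Content-Type header, if any.
    """
    encoding = (content_encoding or "").strip().lower()
    # Decompress if the header says gzip, or if the body starts with the
    # gzip magic bytes (a heuristic, in case the header is missing).
    if encoding == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    # Assumption: fall back to UTF-8 when the server declares no charset.
    return raw.decode(charset or "utf-8")


# Hypothetical usage with the question's URL (requires network access):
# from urllib.request import urlopen
# with urlopen("http://www.tsetmc.com/Loader.aspx?ParTree=15") as page:
#     text = decode_body(page.read(),
#                        page.headers.get("Content-Encoding"),
#                        page.headers.get_content_charset())
```

Decoding strictly (the default errors="strict") is deliberate here: if the guessed charset is wrong, you get an exception instead of silently corrupted text.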

solution1
0 2020-08-20 18:52:51
