
'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I am trying to read the source of a web page with this code:

from urllib.request import urlopen

url = "http://www.tsetmc.com/Loader.aspx?ParTree=15"

page = urlopen(url)
htmlSource = page.read().decode("utf-8")

f = open("output.txt", 'w')
f.write(htmlSource)

but I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I don't know if this page is encoded with utf-8 or not.

Thanks for your help.

I don't know if this page is encoded with utf-8 or not.

If you don't know how the page is encoded, you can open the file in binary mode and write the raw bytes to it without decoding them:

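page = urlopen(url)
htmlSource = page.read()  # raw bytes, no decoding
# 'wb' writes the bytes unchanged; 'with' closes the file afterwards
with open("output.txt", 'wb') as f:
    f.write(htmlSource)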
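
If you need the text rather than the raw bytes, note that a byte 0x8b at position 1 is consistent with the two-byte gzip signature (0x1f 0x8b), which suggests the server may be sending a gzip-compressed response. The following is only a sketch under that assumption; decode_body and fetch_html are hypothetical helper names, not part of urllib, and the UTF-8 fallback is a guess for when the server declares no charset:

```python
import gzip
from urllib.request import urlopen

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, charset="utf-8"):
    """Decompress raw if it starts with the gzip signature, then decode it."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_html(url):
    """Download url and return its text (hypothetical helper)."""
    with urlopen(url) as page:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = page.headers.get_content_charset() or "utf-8"
        return decode_body(page.read(), charset)
```

With these helpers, htmlSource = fetch_html(url) could replace the read-and-decode step, and the result can be written to a file opened with open("output.txt", 'w', encoding="utf-8"). If the body is not gzip after all, decode_body leaves it untouched and just decodes it.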
