I am trying to read the source of a web page with this code:
from urllib.request import urlopen
url = "http://www.tsetmc.com/Loader.aspx?ParTree=15"
page = urlopen(url)
htmlSource = page.read().decode("utf-8")
with open("output.txt", 'w') as f:
    f.write(htmlSource)
but I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I don't know whether this page is encoded with utf-8 or not.
Thanks for your help.
"I don't know whether this page is encoded with utf-8 or not."
If you don't know how the page is encoded, you can just write the bytes to the file without trying to examine them:
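from urllib.request import urlopen

url = "http://www.tsetmc.com/Loader.aspx?ParTree=15"
page = urlopen(url)
htmlSource = page.read()  # raw bytes, no decoding attempted
# Open in binary mode ('wb') so the bytes are written unchanged
with open("output.txt", 'wb') as f:
    f.write(htmlSource)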
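One possible explanation for the error, offered as a guess rather than something confirmed for this site: the byte 0x8b at position 1 matches the second byte of the gzip magic number (0x1f 0x8b), so the server may be sending a gzip-compressed body, which urlopen does not decompress for you. If that is the case, the bytes written above would be compressed data rather than readable HTML. The sketch below checks for the magic number and decompresses before decoding. The helper name decode_body and the choice of errors="replace" are illustrative assumptions, not part of the original answer.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decompress raw if it looks like gzip, then decode it to text.

    decode_body is a hypothetical helper; utf-8 is only an assumed default.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of
    # raising, in case the page is not actually encoded as `encoding`
    return raw.decode(encoding, errors="replace")
```

It could be used as htmlSource = decode_body(urlopen(url).read()), after which the output file should be opened in text mode with an explicit encoding. If the server declares a charset in its Content-Type header, page.headers.get_content_charset() returns it (or None), and that value can be passed as the encoding argument instead of assuming utf-8.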