I'm trying to get the text from a webpage and write it to a file, but it fails with this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in <module>
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence

I've searched and tried textFile.write(str(results).decode('utf-8')), but that raises an AttributeError. Here is my code:
import requests
import os
from bs4 import BeautifulSoup
outputFolderName = "output"
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName
r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)
try:
    os.mkdir(outputDir)
    print("output directory generated")
except:
    print("using existing directory")
textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()
Is there any way to convert the encoding of str(results) so that it is saved properly?
I'm using Python 3.7.3.
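When no encoding is given, open() in text mode uses the platform's locale encoding, which is cp949 on your machine, and cp949 cannot represent '\xa9' (the copyright sign). Calling .decode() fails because str objects in Python 3 have no decode method; only bytes do. Specify the encoding explicitly when opening the file, as in this example: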
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import os
from bs4 import BeautifulSoup
outputFolderName = "output"
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName
r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)
try:
    os.mkdir(outputDir)
    print("output directory generated")
except FileExistsError:
    # os.mkdir raises FileExistsError when the folder is already there
    print("using existing directory")
# Pass the encoding explicitly so the locale default (cp949 here) is not used
with open(outputDir + '/output.txt', mode='w', encoding='utf8') as textFile:
    textFile.write(str(results))
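
To see why the explicit encoding matters, here is a small sketch that reproduces the failure without any network access. The sample string and the temporary file path are made up for illustration; only the '\xa9' character and the cp949 codec come from the traceback in the question.

```python
import locale
import os
import tempfile

# Hypothetical sample text; '\xa9' is the copyright sign from the traceback.
text = "Copyright \xa9 2019"

# The encoding open() falls back to when none is passed
# (this would report cp949 on a Korean Windows setup).
print("default file encoding:", locale.getpreferredencoding(False))

# In Python 3, str has no decode() method, hence the AttributeError.
has_decode = hasattr(text, "decode")
print("str has decode():", has_decode)

# Encoding to cp949 fails the same way the original write() did.
try:
    text.encode("cp949")
    cp949_ok = True
except UnicodeEncodeError as exc:
    cp949_ok = False
    print("cp949 failed:", exc.reason)

# UTF-8 can represent every Unicode character, so the text round-trips.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

print("round trip ok:", round_tripped == text)
```

Whatever reads output.txt later should open it with the same encoding ('utf-8'), otherwise the non-ASCII characters may come back garbled.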