
Python: getting a UnicodeEncodeError when saving a file

I'm trying to get the text from a web page and save it to a file, but the script fails with this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in <module>
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence

I searched for a fix and tried textFile.write(str(results).decode('utf-8')) instead, but that raises an AttributeError.

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

try :
    os.mkdir(outputDir)
    print("output directory generated")
except :
    print("using existing directory")

textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()

Is there a way to change the encoding of str(results) so that it is saved properly?

I'm using Python 3.7.3.
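
On the .decode('utf-8') attempt: in Python 3, a str is already Unicode text and has no decode method; only bytes objects do, which is why that line raised an AttributeError. The short sketch below illustrates the direction of each conversion. The sample string is made up for illustration and is not taken from the scraped page.

```python
# Hypothetical sample text containing non-ASCII characters.
text = "caf\xe9 \xa9"

# str -> bytes is encode(); bytes -> str is decode().
data = text.encode("utf-8")
restored = data.decode("utf-8")

# A str object has no decode() method in Python 3.
has_decode = hasattr(text, "decode")

print(type(data).__name__, restored == text, has_decode)
```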

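Specify the encoding explicitly when you open the file. Without an encoding argument, open() uses the system's locale encoding (cp949 in your case), which cannot represent the character '\xa9' (the copyright sign). For example: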

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os

import requests
from bs4 import BeautifulSoup

outputFolderName = "output"

# Build the output directory path next to this script.
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = os.path.join(currentPath, outputFolderName)

# Download the page and collect every text node.
r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

try:
    os.mkdir(outputDir)
    print("output directory generated")
except FileExistsError:
    print("using existing directory")

# Pass the encoding explicitly so the write does not depend on the
# system default (cp949 in the question).
with open(os.path.join(outputDir, 'output.txt'), mode='w', encoding='utf8') as textFile:
    textFile.write(str(results))
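
The failure can also be reproduced without any network access, which makes the cause easier to see. The sketch below is a minimal illustration, not the asker's script: the sample string and the helper name save_text are made up for the example, and it assumes cp949 is the encoding open() picked by default, as the traceback indicates.

```python
import os
import tempfile

# Hypothetical sample standing in for the scraped page text; it contains
# U+00A9 (the copyright sign), the character named in the traceback.
sample = "Copyright \xa9 Yahoo"


def save_text(path, text):
    """Write text to path as UTF-8, independent of the system locale."""
    with open(path, mode="w", encoding="utf-8") as f:
        f.write(text)


# cp949 has no mapping for U+00A9, so encoding to it fails in the same
# way as the implicit encoding step inside textFile.write().
try:
    sample.encode("cp949")
    cp949_failed = False
except UnicodeEncodeError:
    cp949_failed = True

# UTF-8 covers all of Unicode, so this write succeeds.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
save_text(out_path, sample)

# Read the file back with the same encoding to confirm a clean round trip.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp949_failed, round_trip == sample)
```

Whatever reads the file later has to open it with encoding='utf-8' as well; otherwise the same locale default is applied on the way back in.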


 