Python - UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 229393: character maps to <undefined>

I am trying to scrape a website using Python and Selenium.

Here is part of the code:

    def data_html_text(self):  # Downloads the page source code
        Xyz_page_source = self.driver.page_source
        with open(self.Html_source, 'w', encoding="utf-8") as file:
            file.write(Xyz_page_source)


    def email_parser(self):  # Gets the scraped links and filters them
        count = 0

        file = open(self.Html_source)
        data = file.read()
        soup = BeautifulSoup(data, 'lxml')
        all_divs = soup.find_all('li',class_='badgeList__item',)
        scrapper_links = [self.Base_url + a_href.div.div.a['href'] for a_href in all_divs]

        for link in scrapper_links:
            count += 1
            print("{} ------> {}".format(count,link))

        count = 0

        data = []
        for s_link in scrapper_links:
            user_page = requests.get(s_link, headers=self.headers)
            text = user_page.content
            inner_pagee = text.decode()
            all_emails = re.findall(r'[\w.-]+@[\w.-]+', inner_pagee)
            if all_emails:
                count += 1
                print("{} Scraping Emails: {}".format(count, all_emails[0]))
                data.append(all_emails[0])

        # Deduplicate once after the loop so new_data exists even if no emails were found
        new_data = list(set(data))

        data1 =[]
        for x in new_data:
            x = re.sub('[.]$','',x)
            data1.append(x)
        print(data1)


        with open('test.csv', "w", encoding="utf-8") as output:
            writer = csv.writer(output, lineterminator='\n')
            for val in data1:
                writer.writerow([val])

But I keep getting the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 229393: character maps to <undefined>

Any ideas on how to solve this?

The file you are opening is not in utf-8 format. Check its actual format (encoding) and use that instead of utf-8.
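To illustrate why the encoding passed to `open()` matters, here is a minimal sketch. It assumes the failing call is the `open(self.Html_source)` in `email_parser`, which has no `encoding` argument and therefore falls back to the platform default (often cp1252 on Windows, which Python reports as 'charmap'). The helper names `write_page` and `read_page`, the file name `page.html`, and the sample text are hypothetical.

```python
import os
import tempfile


def write_page(path, text):
    # Mirrors data_html_text: the page source is written as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_page(path, encoding):
    # Read with an explicit encoding instead of the platform default.
    with open(path, encoding=encoding) as f:
        return f.read()


# "\u0101" is stored in UTF-8 as the bytes 0xC4 0x81,
# and byte 0x81 has no mapping in cp1252.
sample = "caf\u00e9 \u0101"
path = os.path.join(tempfile.mkdtemp(), "page.html")  # hypothetical file name
write_page(path, sample)

try:
    read_page(path, "cp1252")  # simulates a Windows default encoding
    mismatch_failed = False
except UnicodeDecodeError as exc:
    mismatch_failed = True
    print(exc)

# Reading with the same encoding that was used for writing round-trips cleanly.
round_trip = read_page(path, "utf-8")
```

If this assumption holds for the code in the question, reading the file with the same encoding it was written in, for example `open(self.Html_source, encoding="utf-8")`, would likely avoid the error.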

Try

  encoding='utf-8-sig'
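As a rough illustration of what this codec changes: `utf-8-sig` decodes like `utf-8` but skips a leading byte-order mark (BOM) if one is present. The sample bytes below are made up for the demonstration, and whether the file in the question actually starts with a BOM is not known.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by some text.
raw = codecs.BOM_UTF8 + "badgeList".encode("utf-8")

plain = raw.decode("utf-8")         # the BOM survives as a leading '\ufeff'
stripped = raw.decode("utf-8-sig")  # the BOM is removed

# utf-8-sig also decodes data that has no BOM at all.
no_bom = "badgeList".encode("utf-8").decode("utf-8-sig")
```

This only helps when the file begins with a BOM; it would not by itself explain a failure at a position deep inside the file.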
