Python unicode error on Linux but not on Windows

Question

I pieced together this bit of Python by following a few guides:

import requests
import sys
from bs4 import BeautifulSoup

url = requests.get(sys.argv[1])

html = BeautifulSoup(url.content,'html.parser')

for br in html.find_all("br"):
    br.replace_with(" ")

for tr in html.find_all('tr'):
    data = []   

    for td in tr.find_all('td'):
        data.append(td.text.strip())

    if data:
        print("{}".format(','.join(data)))

On Windows, it works as I expect.

On Linux I get:

Traceback (most recent call last):
  File "html2csv.py", line 19, in <module>
    print("{}".format(','.join(data)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 4: ordinal not in range(128)

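What do I need to change in my script to prevent this? I've read that you can ignore the problem characters, but some people say that isn't the right approach. I'm not sure how to apply any of the solutions I've found to the script I have.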

Answer 1

Sorry for wasting your time.

I was running...

python script.py

which defaults to Python 2.7.

What I needed to run was...

python3 script.py
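
To avoid tripping over the wrong interpreter again, a small guard at the top of the script can fail early with a readable message. This is only a sketch: the helper name is_python3 and the wording of the message are my own, not part of the original script.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or newer."""
    # is_python3 is a hypothetical helper name used only for this illustration.
    return version_info[0] >= 3


if not is_python3():
    # Under Python 2 this stops the script before any unicode printing happens.
    sys.exit("This script needs Python 3; run it with python3 instead of python.")
```

With this in place, running the script under 2.7 should exit with the message instead of raising UnicodeEncodeError halfway through the output.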

Answer 2

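I ran into the same problem; it seems that encoding on MS Windows leaves some ghost characters behind (I guess you can configure your IDE not to do that).

Try adding # -*- coding: utf-8 -*- at the top of your script file, like this: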

#!/usr/bin/env python

# -*- coding: utf-8 -*-

# import ipdb; ipdb.set_trace()

import json
import os, sys

class CSV_LOADER():
    """
    Script that handles batch credentials (in CSV format), both locally and
    to remote machines.

...

Answer 3

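Your Python IO encoding is probably set to ascii for some reason (possibly because of a misconfigured system locale), so everything printed to standard output (and read from standard input) is interpreted as ASCII.

Set the PYTHONIOENCODING environment variable to utf-8 before running the script (or, better yet, make sure your system's locale is set correctly).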
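
To check whether this applies to your machine, a quick diagnostic along these lines may help. It is a sketch under the assumption that you are on Python 3; the function names describe_io_encoding and can_encode are made up for illustration. It reports the encodings Python picked up and tests whether the degree sign from the traceback (U+00B0) can be encoded.

```python
import locale
import sys


def describe_io_encoding():
    """Collect the encodings Python is using for stdout and for the locale."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    info = describe_io_encoding()
    print("stdout encoding:", info["stdout"])
    print("locale preferred encoding:", info["preferred"])
    # U+00B0 is the character named in the traceback above.
    print("ascii can encode U+00B0:", can_encode("\u00b0", "ascii"))
    print("utf-8 can encode U+00B0:", can_encode("\u00b0", "utf-8"))
```

If the reported stdout encoding is an ASCII variant, running the script as PYTHONIOENCODING=utf-8 python3 script.py should change what the first line reports.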
