
Unicode error when trying to run a Python script

I am trying to run a Python script and I get an error saying that 'charmap' can't decode a byte because the character maps to undefined. I guess it has something to do with Unicode, but I am not experienced enough to solve the problem.

import os
import random

import numpy as np


def load_imdb_sentiment_analysis_dataset(data_path="C:/Users/name/Desktop",
                                         seed=123):
    imdb_data_path = os.path.join(data_path, 'aclImdb')

    # Load the training data.
    train_texts = []
    train_labels = []
    for category in ['pos', 'neg']:
        train_path = os.path.join(imdb_data_path, 'train', category)
        for fname in sorted(os.listdir(train_path)):
            if fname.endswith('.txt'):
                with open(os.path.join(train_path, fname)) as f:
                    train_texts.append(f.read())
                train_labels.append(0 if category == 'neg' else 1)

    # Load the validation data.
    test_texts = []
    test_labels = []
    for category in ['pos', 'neg']:
        test_path = os.path.join(imdb_data_path, 'test', category)
        for fname in sorted(os.listdir(test_path)):
            if fname.endswith('.txt'):
                with open(os.path.join(test_path, fname)) as f:
                    test_texts.append(f.read())
                test_labels.append(0 if category == 'neg' else 1)

    # Shuffle the training data and labels with the same seed
    # so that texts and labels stay aligned.
    random.seed(seed)
    random.shuffle(train_texts)
    random.seed(seed)
    random.shuffle(train_labels)

    return ((train_texts, np.array(train_labels)),
            (test_texts, np.array(test_labels)))

I get the following error: UnicodeDecodeError: 'charmap' codec can't decode byte 0xaa in position 489: character maps to <undefined>
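For context, this kind of error can be reproduced without the dataset. The sketch below assumes the file bytes are UTF-8 and that they are decoded with a Windows code page. cp1252 is used purely for illustration, since the question does not say which default encoding was in effect, and try_decode is a hypothetical helper.

```python
# UTF-8 bytes for a right double quotation mark (U+201D): e2 80 9d
raw = "\u201d".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


# Decoding with the encoding the bytes were written in succeeds.
utf8_result = try_decode(raw, "utf-8")

# Decoding with cp1252 (an assumed default) fails, because byte 0x9d
# has no character assigned in that code page.
cp1252_result = try_decode(raw, "cp1252")
print(cp1252_result)
```

The message names 'charmap' because Python implements single-byte code pages such as cp1252 as character-map codecs.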

You need to figure out the encoding of the file you are trying to open and pass it to the open function.
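One rough way to narrow down the encoding is to read the raw bytes and try a few candidates in order. This is only a sketch: guess_encoding and its candidate list are hypothetical, and a successful decode shows that the bytes are valid in that encoding, not that the encoding is the right one.

```python
import os
import tempfile


def guess_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate that decodes the whole file, else None."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


# Demo on a temporary file written as UTF-8 (stand-in for a review file).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write("caf\u00e9 \u201cquoted\u201d".encode("utf-8"))
    detected = guess_encoding(sample)

print(detected)
```

Stricter encodings such as UTF-8 are best tried first, because permissive single-byte encodings accept almost any byte sequence.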

For example, for UTF-8: open(filename, encoding='utf8')

So you can change with open(os.path.join(train_path, fname)) to with open(os.path.join(train_path, fname), encoding='utf8'), and make the same change to the open call in the test loop.
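Since the same open call appears in both loops, the change could be made once in a small helper. The following is a minimal sketch, assuming the files are UTF-8. read_text_files is a hypothetical name that is not part of the original script, and the demo file names are made up.

```python
import os
import tempfile


def read_text_files(directory, encoding="utf-8"):
    """Read every .txt file in a directory, in sorted order, as text."""
    texts = []
    for fname in sorted(os.listdir(directory)):
        if fname.endswith(".txt"):
            # The explicit encoding replaces the platform default.
            with open(os.path.join(directory, fname), encoding=encoding) as f:
                texts.append(f.read())
    return texts


# Demo with a temporary directory standing in for one category folder.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "0_9.txt"), "w", encoding="utf-8") as f:
        f.write("A \u201cgreat\u201d film")
    with open(os.path.join(tmp, "notes.md"), "w", encoding="utf-8") as f:
        f.write("skipped")
    texts = read_text_files(tmp)

print(texts)
```

Both the training loop and the test loop could then call the helper with their own directory path.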

If you don't care about the characters that can't be decoded, you can skip them (be careful with this approach, since it silently drops data): open(filename, errors='ignore')

with open(os.path.join(train_path, fname), errors='ignore') as f:
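To see what the errors argument does, the sketch below decodes byte strings directly instead of opening files. The sample bytes are made up for illustration, and cp1252 again stands in for an assumed platform default.

```python
# Valid UTF-8 (a right double quote) followed by a stray invalid byte 0xff.
raw = b"good \xe2\x80\x9d bad \xff end"

# 'ignore' drops the undecodable byte without leaving a trace.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD, so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

# With the wrong codec, 'ignore' only hides the exception: the remaining
# bytes of the multi-byte character are still decoded as unrelated characters.
mojibake = "\u201d".encode("utf-8").decode("cp1252", errors="ignore")

print(repr(ignored))
print(repr(replaced))
print(repr(mojibake))
```

This is why errors='ignore' works best together with the correct encoding argument, not as a substitute for it.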

