

Error with Unicode when trying to run a Python script

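I'm trying to run a Python script, but I get an error saying that the 'charmap' codec can't decode a byte because the character maps to <undefined>. I think it has something to do with Unicode, but I have no experience fixing this kind of problem.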

import os
import random

import numpy as np


def load_imdb_sentiment_analysis_dataset(data_path="C:/Users/name/Desktop", seed=123):
    imdb_data_path = os.path.join(data_path, 'aclImdb')

    # Load the training data.
    train_texts = []
    train_labels = []
    for category in ['pos', 'neg']:
        train_path = os.path.join(imdb_data_path, 'train', category)
        for fname in sorted(os.listdir(train_path)):
            if fname.endswith('.txt'):
                with open(os.path.join(train_path, fname)) as f:
                    train_texts.append(f.read())
                train_labels.append(0 if category == 'neg' else 1)

    # Load the validation data (the 'test' split).
    test_texts = []
    test_labels = []
    for category in ['pos', 'neg']:
        test_path = os.path.join(imdb_data_path, 'test', category)
        for fname in sorted(os.listdir(test_path)):
            if fname.endswith('.txt'):
                with open(os.path.join(test_path, fname)) as f:
                    test_texts.append(f.read())
                test_labels.append(0 if category == 'neg' else 1)

    # Shuffle the training data and labels. Re-seeding before each shuffle
    # applies the same permutation to both lists, so they stay aligned.
    random.seed(seed)
    random.shuffle(train_texts)
    random.seed(seed)
    random.shuffle(train_labels)

    return ((train_texts, np.array(train_labels)),
            (test_texts, np.array(test_labels)))

I get the following error: UnicodeDecodeError: 'charmap' codec can't decode byte 0xaa in position 489: character maps to <undefined>
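To see where the 'charmap' message comes from, here is a minimal sketch (not part of the original question). When open() is called without encoding=, it decodes with a platform-dependent default; Windows legacy code pages such as cp1252 are implemented by Python's charmap codec, so a byte with no mapping produces exactly this kind of error. The character 'Á' and the cp1252 code page are assumptions chosen only for illustration, and `decode_with` is a hypothetical helper; the byte reported on the asker's machine (0xaa) depends on the actual file contents and code page.

```python
import locale


def decode_with(data: bytes, encoding: str) -> str:
    """Decode raw bytes the way open() would with the given encoding."""
    return data.decode(encoding)


# open() falls back to this when no encoding= is given (platform dependent).
print("default encoding:", locale.getpreferredencoding(False))

# 'Á' (U+00C1) is stored in UTF-8 as the two bytes 0xC3 0x81.
utf8_bytes = "Á".encode("utf-8")

try:
    # 0xC3 decodes as 'Ã', but 0x81 has no mapping in cp1252.
    decode_with(utf8_bytes, "cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 failed:", exc)

# Decoding with the encoding the bytes were actually written in works.
print("utf-8 works:", decode_with(utf8_bytes, "utf-8"))
```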

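You need to figure out the encoding of the files you're trying to open and pass it to the open function. Without an encoding argument, open() uses a platform-dependent default, which on Windows is usually a legacy code page rather than UTF-8.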
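One rough way to find a working encoding is trial decoding: read the raw bytes and try a short list of candidates in order. This is a heuristic sketch; `guess_encoding`, `guess_file_encoding` and the candidate list are hypothetical, and a successful decode only shows the bytes are valid in that encoding, not that it is the one the author intended (third-party detectors such as chardet take a statistical approach instead).

```python
from pathlib import Path


def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes raw without errors.

    Hypothetical helper: a successful decode shows the bytes are valid in
    that encoding, not that it is the encoding the author actually used.
    """
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None  # no candidate fits; inspect the file by hand


def guess_file_encoding(path, candidates=("utf-8", "cp1252")):
    """Read the file as bytes (no decoding yet) and test the candidates."""
    return guess_encoding(Path(path).read_bytes(), candidates)


print(guess_encoding("Á".encode("utf-8")))  # utf-8
print(guess_encoding(b"caf\xe9"))           # cp1252 (0xE9 is 'é' there)
print(guess_encoding(b"\x81"))              # None
```

Putting UTF-8 first matters: it rejects most non-UTF-8 data, whereas a single-byte encoding like latin-1 accepts every possible byte and so would never fail as a candidate.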

For example, for UTF-8: open(filename, encoding='utf8')

So you would change with open(os.path.join(train_path, fname)) to with open(os.path.join(train_path, fname), encoding='utf8'), and make the same change to the open call in the test_path loop.
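Applying that change to both loops, the duplicated train/test code can also be folded into one helper. This is a minimal sketch, assuming the aclImdb files are UTF-8 as the answer suggests; `read_split` is a hypothetical name, and the `__main__` block builds a throwaway directory tree purely for demonstration.

```python
import os
import tempfile


def read_split(imdb_data_path, split, encoding="utf-8"):
    """Read every .txt review under <split>/pos and <split>/neg.

    encoding is passed straight to open(), so the result no longer depends
    on the platform's default code page.
    """
    texts, labels = [], []
    for category in ["pos", "neg"]:
        split_path = os.path.join(imdb_data_path, split, category)
        for fname in sorted(os.listdir(split_path)):
            if fname.endswith(".txt"):
                with open(os.path.join(split_path, fname), encoding=encoding) as f:
                    texts.append(f.read())
                labels.append(0 if category == "neg" else 1)
    return texts, labels


if __name__ == "__main__":
    # Build a tiny fake aclImdb/train tree containing non-ASCII text.
    with tempfile.TemporaryDirectory() as root:
        for category, text in [("pos", "Café scene: 10/10"), ("neg", "Naïve plot")]:
            folder = os.path.join(root, "train", category)
            os.makedirs(folder)
            with open(os.path.join(folder, "0_1.txt"), "w", encoding="utf-8") as f:
                f.write(text)
        print(read_split(root, "train"))
```

Inside the original function, the two loops would then reduce to train_texts, train_labels = read_split(imdb_data_path, 'train') and the same call with 'test'.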

If you don't care about characters that can't be decoded, you can skip them (be careful with this approach): open(filename, errors='ignore')

with open(os.path.join(train_path, fname), errors='ignore')
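The reason for caution: with errors='ignore', undecodable bytes are silently dropped, so the text changes without any warning. errors='replace' (another standard value) instead inserts U+FFFD, which at least leaves a visible marker. This small sketch decodes bytes directly; the sample bytes are made up for illustration, and open() applies the same errors handling when it reads a file.

```python
# 0xE9 is 'é' in cp1252/latin-1 but is not valid UTF-8 in this position.
raw = b"caf\xe9 au lait"

ignored = raw.decode("utf-8", errors="ignore")    # bad byte dropped
replaced = raw.decode("utf-8", errors="replace")  # bad byte -> U+FFFD

print(ignored)          # caf au lait
print(ascii(replaced))  # 'caf\ufffd au lait'
```

For the dataset loader, that would look like open(path, encoding='utf8', errors='replace'), which keeps UTF-8 decoding but marks any bad bytes instead of discarding them.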

