Unicodedata.normalize : TypeError: normalize() argument 2 must be str, not list

Question

I am trying to load a file in Python. If you run the code below and load a file that contains only English words, it loads just fine.

Listado.txt is a Spanish-language file that contains the following words: abacá, abadí, abadía, abajeño, abaniquería

Spanish contains accented letters (e.g. é) and other characters with diacritics, and this is where the problem lies: when I try to load this file into Python, it complains. I would like to be able to normalize the list, or remove the accented characters, and then load the list.

I have tried normalizing using:

unicodedata.normalize('NFD', line).encode('ascii', 'ignore')

and I get the error below:

TypeError: normalize() argument 2 must be str, not list

Code so far:

import random
import string
import unicodedata

#WORDLIST_FILENAME = "words_alpha.txt"
WORDLIST_FILENAME = "listado.txt"

def loadWords():
    print("Loading word list from file...")
    # inFile: file
    inFile = open(WORDLIST_FILENAME, 'r')
    wordlist =[]
    for line in inFile:
        line = line.split()
        wordlist.extend(line)
#        unicodedata.normalize('NFD', line).encode('ascii', 'ignore')
        print(" "), len(wordlist), ("words loaded.")

    return wordlist

Answer 1

As the error says, you are trying to normalize line, which is a list because you have already done line = line.split(). Normalize the line before you split it into words, as follows:

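for line in inFile:
    # Normalize while line is still a str, and keep the result (decoded back to str)
    line = unicodedata.normalize('NFD', line).encode('ascii', 'ignore').decode('ascii')
    line = line.split()
    wordlist.extend(line)
    print(" ", len(wordlist), "words loaded.")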
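To see what this expression does to a single string, here is a minimal sketch. The helper name strip_accents is hypothetical and not part of unicodedata. Note that normalize() returns a new string rather than changing its argument, and that encode() returns bytes, so the sketch decodes the result back to str.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: return text as a str with accents removed."""
    # NFD splits each accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize('NFD', text)
    # Encoding to ASCII with errors='ignore' drops the combining marks;
    # decoding turns the resulting bytes back into a str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(strip_accents("abacá abadía abajeño"))  # prints: abaca abadia abajeno
```

Passing a list instead of a str, as in the question, raises the TypeError shown above.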

Alternatively, if you want to split the line first, you can use a list comprehension to normalize each word individually before extending your word list.

for line in inFile:
    line = line.split()
    # Normalize each word, then add the normalized words to the list
    line = [unicodedata.normalize('NFD', x).encode('ascii', 'ignore').decode('ascii') for x in line]
    wordlist.extend(line)
    print(" ", len(wordlist), "words loaded.")
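Putting the pieces together, a complete loader might look like the sketch below. It assumes the file is encoded as UTF-8, which the question does not state; the names strip_accents and load_words and the encoding parameter are illustrative, not taken from the original code. The demo writes a temporary file containing the words from the question so the example runs on its own.

```python
import os
import tempfile
import unicodedata


def strip_accents(text):
    """Hypothetical helper: return text as a str with accents removed."""
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def load_words(filename, encoding='utf-8'):
    """Read whitespace-separated words and strip their accents.

    The UTF-8 default is an assumption; pass another encoding if needed.
    """
    wordlist = []
    with open(filename, 'r', encoding=encoding) as in_file:
        for line in in_file:
            # Each item from split() is a str, so normalize() accepts it.
            wordlist.extend(strip_accents(word) for word in line.split())
    return wordlist


# Demo: a temporary stand-in for listado.txt, one word per line.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "listado.txt")
    with open(path, 'w', encoding='utf-8') as out_file:
        out_file.write("abacá\nabadí\nabadía\nabajeño\nabaniquería\n")
    words = load_words(path)

print(len(words), "words loaded:", words)
```

Opening the file with an explicit encoding avoids depending on the platform default, which is a common reason a file with accented letters fails to load.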


Solution 1
2 2019-12-03 16:04:21