How to encode a Python list

Question

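I'm having trouble encoding a Python list. I read the words from a UTF-8 encoded text file so that I could count specific words in it with the re module.

Here is the code: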

import codecs
import re
from collections import Counter

# Match words of 4 to 20 characters, including accented letters
unicode_pattern = re.compile(r'\b\w{4,20}\b', re.UNICODE)
word_counts = Counter()  # maps each word to the number of times it occurs

# Read the text file, decoding it as UTF-8
with codecs.open('projectsinline.txt', 'r', encoding="utf-8") as f:
    for line in f:
        # Accumulate the counts across all lines of the file
        word_counts.update(unicode_pattern.findall(line))

Allwords = []
for clave in word_counts:
    if word_counts[clave] >= 10:  # keep only the most repeated words
        word = clave
        Allwords.append(word)
print Allwords

Part of the output looks like this:

[...u'recursos', u'Partidos', u'Constituci\xf3n', u'veh\xedculos', u'investigaci\xf3n', u'Pol\xedticos']

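If I print the variable word, the output looks the way it should. However, once I use append, all the words come out broken again, as in the example above.

I tried this: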

[x.encode("utf-8") for x in Allwords]

The output looks exactly the same as before.

I also tried this:

Allwords.append(str(word.encode("utf-8")))

The output changes, but the words still don't look the way they should:

[...'recursos', 'Partidos', 'Constituci\xc3\xb3n', 'veh\xc3\xadculos', 'investigaci\xc3\xb3n', 'Pol\xc3\xadticos']

Some answers suggested this:

print('[' + ', '.join(Allwords) + ']')

The output looks like this:

[...recursos, Partidos, ConstituciÃ³n, vehÃculos, investigaciÃ³n, PolÃticos]

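To be honest, I don't want to print the list. I just want to encode it so that all of its items (words) are recognized.

I'm looking for something like this: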

[...'recursos', 'Partidos', 'Constitución', 'vehículos', 'investigación', 'Políticos']

Any suggestions for solving this are appreciated.

Thanks,

Answer 1

You could try

print('[' + ', '.join(Allwords) + ']')

Answer 2

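Your list of Unicode strings is correct. When you print a list, each item in it is displayed using its repr(). When you print an item by itself, it is displayed using its str(). This is only a display choice, much like printing an integer in decimal or in hexadecimal.

So if you want to see the individual words rendered properly, print them one by one. For comparisons, the content is already correct.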
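As a rough illustration of that point, here is a minimal sketch written for Python 3 (the question itself uses Python 2). The variable names and the three sample words are chosen for the example only. It uses ascii() to stand in for the escaped form that Python 2 shows when a list is printed, and checks that the escaped and the accented spellings are the same string.

```python
# Sample words written with escapes, as they appear in the question's output
words = ['recursos', 'Constituci\xf3n', 'veh\xedculos']

# The escape \xf3 and the literal character are the same code point,
# so the stored content already compares equal to the accented word.
same_content = words[1] == 'Constitución'

# Escaped view of every item: ascii() gives the form that Python 2's
# repr() produced for unicode strings (without the u prefix).
escaped_view = '[' + ', '.join(ascii(w) for w in words) + ']'

# Joining the items directly uses the strings themselves, not their repr().
plain_view = '[' + ', '.join(words) + ']'

print(same_content)
print(escaped_view)
print(plain_view)
```

Only the display differs between the two views; the list holds the same strings in both cases.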

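It is worth noting that Python 3 changed the behavior of repr(): it now displays non-ASCII characters without escape codes if the terminal supports them directly, and the ascii() function reproduces the Python 2 repr() behavior.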
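The difference can be checked with a short Python 3 sketch. The variable names are illustrative. The last part is an addition that is not in the answer itself: it shows that encoding to UTF-8 yields bytes whose repr() lists the individual byte values, which matches the 'Pol\xc3\xadticos' form from the question.

```python
word = 'Pol\xedticos'

py3_repr = repr(word)    # Python 3 keeps the printable non-ASCII character
py2_style = ascii(word)  # escapes it, as Python 2's repr() did

# Encoding returns a bytes object; its repr shows the UTF-8 byte values.
encoded = word.encode('utf-8')
encoded_repr = repr(encoded)

print(py2_style)
print(encoded_repr)
```

In other words, the encoded version is not a cleaner spelling of the word but a different type whose display shows two bytes for each accented letter.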
