How to encode a Python list

I'm having a hard time trying to encode a Python list. I have already done it with a text file, in order to count specific words inside it using the re module.

This is the code:

import codecs
import re
from collections import Counter

# Match words of 4 to 20 characters, including accented letters
unicode_pattern = re.compile(r'\b\w{4,20}\b', re.UNICODE)
word_counts = Counter()  # maps each word to its number of occurrences

# Read the text file, decoding it as UTF-8
with codecs.open('projectsinline.txt', 'r', encoding="utf-8") as f:
    for line in f:
        # Count the matching words on every line, not only the last one
        word_counts.update(unicode_pattern.findall(line))

Allwords = []
for clave in word_counts:
    if word_counts[clave] >= 10:  # keep only the most repeated words
        word = clave
        Allwords.append(word)
print Allwords

Part of the output looks like this:

[...u'recursos', u'Partidos', u'Constituci\xf3n', u'veh\xedculos', u'investigaci\xf3n', u'Pol\xedticos']

If I print the variable word, the output looks as it should. However, when I use append, all the words break again, as in the example above.

I tried this:

[x.encode("utf-8") for x in Allwords]

The output looks exactly the same as before.

I also tried this:

Allwords.append(str(word.encode("utf-8")))

The output changes, but the words still don't look as they should:

[...'recursos', 'Partidos', 'Constituci\xc3\xb3n', 'veh\xc3\xadculos', 'investigaci\xc3\xb3n', 'Pol\xc3\xadticos']

Some of the answers have suggested this:

print('[' + ', '.join(Allwords) + ']')

The output looks like this:

[...recursos, Partidos, Constitución, vehículos, investigación, Políticos]

To be honest, I do not want to print the list, just encode it so that all items (words) are recognized.

I'm looking for something like this:

[...'recursos', 'Partidos', 'Constitución', 'vehículos', 'investigación', 'Políticos']

Any suggestions to solve the problem are appreciated.

Thanks.

You might want to try:

print('[' + ', '.join(Allwords) + ']')

Your Unicode string list is correct. When you print a list, its items are displayed using their repr() form. When you print the items themselves, they are displayed using their str() form. It is only a display difference, similar to printing an integer as decimal or hexadecimal.
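As a rough illustration of the list-versus-item display difference, here is a minimal sketch. It assumes Python 3 (the question's code is Python 2), and the two-word list is sample data copied from the question's output rather than the full result.

```python
# Python 3 sketch; `words` is sample data taken from the question's output.
words = ['Constitución', 'vehículos']

# Printing a list shows the repr() of each item (quotes included).
list_display = str(words)
rebuilt = '[' + ', '.join(repr(w) for w in words) + ']'

# Printing a single item shows its str() form (no quotes).
item_display = str(words[0])

print(list_display)
print(item_display)
print(list_display == rebuilt)
```

The last line shows that the list display is just the items' repr() forms joined together, which is why the list and the individual word look different even though the stored text is the same.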

So print the individual words if you want to see them displayed correctly; for comparisons, the content is already correct.
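To check that the escaped form and the accented form really are the same content, a small sketch along these lines may help. It assumes Python 3, and the variable names are only illustrative.

```python
# Python 3 sketch; the variable names are hypothetical.
escaped = 'Constituci\xf3n'   # the form shown inside the printed list
literal = 'Constitución'      # the form shown when printing the item

same_text = (escaped == literal)       # both spell the same characters
length = len(escaped)                  # counts characters, not escape codes
utf8_bytes = literal.encode('utf-8')   # bytes: the accented letter takes two bytes

print(same_text, length)
```

The encoded value corresponds to the `\xc3\xb3` output in the question: that is the UTF-8 byte sequence for the accented letter, not a damaged word.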

It's worth noting that Python 3 changes the behavior of repr(), which now displays non-ASCII characters without escape codes if the terminal supports them directly. The ascii() function reproduces the Python 2 repr() behavior.
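The Python 3 difference can be sketched as follows. This is only an illustration with one sample word from the question; both repr() and ascii() are built-in functions.

```python
# Python 3 sketch; `word` is a sample value from the question's output.
word = 'Políticos'

py3_repr = repr(word)     # keeps the accented character as-is
py2_style = ascii(word)   # escapes non-ASCII characters, like Python 2's repr()

print(py2_style)
```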


 