
How to encode a Python list

I'm having a hard time encoding a Python list. I already did this with a text file in order to count specific words in it, using the re module.

This is the code:

import codecs
import re
from collections import Counter

# Compile the pattern once: words of 4 to 20 characters, Unicode-aware
unicode_pattern = re.compile(r'\b\w{4,20}\b', re.UNICODE)

word_counts = Counter()  # maps each word to its number of occurrences
# Read the text file, decoding it as UTF-8
with codecs.open('projectsinline.txt', 'r', encoding="utf-8") as f:
    for line in f:
        # Count the matches from every line, not only the last one
        word_counts.update(unicode_pattern.findall(line))

Allwords = []
for clave in word_counts:
    if word_counts[clave] >= 10:  # keep the most repeated words
        word = clave
        Allwords.append(word)
print(Allwords)

Part of the output looks like this:

[...u'recursos', u'Partidos', u'Constituci\xf3n', u'veh\xedculos', u'investigaci\xf3n', u'Pol\xedticos']

If I print the variable word, the output looks as it should. However, when I use append, all the words break again, as in the example above.

I tried this:

[x.encode("utf-8") for x in Allwords]

The output looks exactly the same as before.

I also tried this:

Allwords.append(str(word.encode("utf-8")))

The output changes, but the words still don't look as they should:

[...'recursos', 'Partidos', 'Constituci\xc3\xb3n', 'veh\xc3\xadculos', 'investigaci\xc3\xb3n', 'Pol\xc3\xadticos']

Some of the answers have given this example:

print('[' + ', '.join(Allwords) + ']')

The output looks like this:

[...recursos, Partidos, Constitución, vehículos, investigación, Políticos]

To be honest, I do not want to print the list, just encode it so that all items (words) are recognized.

I'm looking for something like this:

[...'recursos', 'Partidos', 'Constitución', 'vehículos', 'investigación', 'Políticos']

Any suggestions to solve the problem are appreciated

Thanks,

You might want to try:

print('[' + ', '.join(Allwords) + ']')

Your Unicode string list is correct. When you print a list, its items are displayed using repr(). When you print the items themselves, they are displayed using str(). It is only a display difference, similar to printing integers as decimal or hexadecimal.
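A minimal sketch of that point, using two of the words from the question. The variable names from_reprs and list_uses_repr are made up for illustration. It only checks that the text form of a list is assembled from the repr() of each item, which should hold in both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; the variable names here are not from the original post.
words = [u'recursos', u'Constituci\xf3n']

# Build the list's text form by hand from repr() of each item ...
from_reprs = '[' + ', '.join(repr(w) for w in words) + ']'

# ... and compare it with what str() (and therefore print) produces for the list.
list_uses_repr = (str(words) == from_reprs)
```

If list_uses_repr is true, the escapes seen when printing the whole list come from repr(), not from the stored strings.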

So print the individual words if you want to see them correctly; for comparisons, the content is already correct.
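As a rough check of the claim that the content is correct, the sketch below compares the escaped spelling with the accented spelling of the same word. The names same_content, length and joined are assumptions added for illustration, and the coding line is only needed by Python 2 because the source contains non-ASCII characters:

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; the variable names here are not from the original post.
words = [u'recursos', u'Constituci\xf3n', u'veh\xedculos']

# The escape \xf3 is only a way of writing the character down; the string
# holds a single code point for the accented letter.
same_content = (words[1] == u'Constitución')
length = len(words[1])

# Joining the items uses each item's text rather than its repr().
joined = u', '.join(words)
```

Here joined holds the readable text with the accents intact, while words itself is unchanged.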

It's worth noting that Python 3 changes the behavior of repr(): it now displays non-ASCII characters without escape codes if the terminal supports them directly. The ascii() function reproduces the Python 2 repr() behavior.
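A short Python 3 sketch of the difference (ascii() is not a built-in in Python 2, so this will not run there). The names readable and escaped are made up for illustration:

```python
# Python 3 only; illustrative sketch with made-up variable names.
words = ['Constitución', 'vehículos']

readable = repr(words)  # printable non-ASCII characters are kept as they are
escaped = ascii(words)  # non-ASCII characters are written as escape sequences
```

Printing readable should show the accented words inside the list, and printing escaped should show the same list with \xf3 and \xed escapes.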

