Python string including a double quote character

I have input strings made up of arbitrary characters, including double and single quotes (" and '):

B@SS$*JU(PQ
AD&^%$^@!$
%()%@@DDSFD"*")(#
ABD*E@(%J^&@

However, when I read the above input from a text file and print it, the double quotes in the third line are printed as \xe2\x80\x9d

I am aiming to do a simple character count:

B 2
@ 3
S 2
$ 3
etc.

so I want to be able to output

" 3

in the above list. Should I replace the double quotes with something so I can count them and print out the count?

Thanks a lot.

\xe2\x80\x9d

is the UTF-8 encoding of U+201D (RIGHT DOUBLE QUOTATION MARK), a "curly" closing double quote, not the ASCII " character. Decoding those three bytes from UTF-8 gives a single Unicode character. In Python 2:

>>> print "\xe2\x80\x9d".decode("utf-8")
”
>>> len("\xe2\x80\x9d".decode("utf-8"))
1

If you are using Python 3:

>>> print(b"\xe2\x80\x9d".decode('utf8'))
”
>>> len(b"\xe2\x80\x9d".decode("utf-8"))
1
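
To check which character the bytes stand for, the standard `unicodedata` module can look up its name. This is a minimal Python 3 sketch; the variable names `raw` and `ch` are only illustrative.

```python
import unicodedata

raw = b"\xe2\x80\x9d"        # the three bytes that showed up in the printed output
ch = raw.decode("utf-8")     # decoding yields one Unicode character

print(len(raw), len(ch))     # byte length vs. character length
print(hex(ord(ch)))          # code point of the decoded character
print(unicodedata.name(ch))  # official Unicode name of the character
```

The code point is U+201D, so a count keyed on the ASCII `"` (U+0022) will not include it unless the two are treated as the same character.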

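So for the file you are counting (in Python 2):

from collections import defaultdict

# Create the counter once, before the loop, so totals accumulate across lines.
count = defaultdict(int)
with open("filename", 'r') as f:
    for text in f:
        # Decode the raw bytes so each curly quote is one character, not three.
        decoded = text.decode("utf-8")
        for i in decoded:
            count[i] += 1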
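
In Python 3, the same count could be sketched with `collections.Counter`, letting `open` do the decoding. The helper name `count_chars`, the `fold_quotes` flag, and the `QUOTE_MAP` table are assumptions made for illustration, not part of the original answer. Folding curly quotes into the ASCII `"` is optional and only makes sense if you want them reported on one line of the output.

```python
from collections import Counter

# Hypothetical mapping: treat curly double quotes as the ASCII double quote.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"'})

def count_chars(path, fold_quotes=False):
    """Count characters in a UTF-8 text file, ignoring line endings."""
    counts = Counter()
    # encoding="utf-8" makes open() decode bytes into str as the file is read
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")  # do not count the newline itself
            if fold_quotes:
                line = line.translate(QUOTE_MAP)
            counts.update(line)       # adds one per character in the line
    return counts
```

A possible use, assuming the input lives in a file called "filename": loop over `count_chars("filename", fold_quotes=True).items()` and print each character with its count.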
