
Unicode to dictionary (unicode contains apostrophe punctuation)

I have read the following Unicode string from a CSV file:

line = u"{u'There's Still Time': u'foo'}"

I would like to convert this to a dictionary so that I can access it like this:

line["There's Still Time"] 
Output: 'foo'

Please help.

Because there is an apostrophe inside the string, you'll have to do some pre-processing before you even attempt to parse it into a dict. Assuming that all strings within the target dict are unicode literals (prefixed with u) and that every closing quote is immediately followed by a delimiter (}, :, a comma, or whitespace; the pattern below also accepts a period), you can search for all apostrophes that fall into neither of these two categories and escape them. Then you can use ast.literal_eval() to parse the result into a dict, something like:

import ast
import re

APOSTROPHE_ESCAPE = re.compile(r"(?<!u)'(?![.}:,\s])")

line = u"{u'There's Still Time': u'foo'}"
your_dict = ast.literal_eval(APOSTROPHE_ESCAPE.sub(r"\'", line))

print(your_dict)  # Python 2: {u"There's Still Time": u'foo'}; Python 3: {"There's Still Time": 'foo'}
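To see which apostrophe the pattern actually touches, here is a small sketch that prints the match positions and the intermediate escaped string before parsing. The names matches, escaped and parsed are only illustrative, and the snippet assumes Python 3.3 or later (where the u prefix is accepted again).

```python
import ast
import re

# Same pattern as above: an apostrophe not preceded by `u` and not
# followed by `.`, `}`, `:`, `,` or whitespace.
APOSTROPHE_ESCAPE = re.compile(r"(?<!u)'(?![.}:,\s])")

line = u"{u'There's Still Time': u'foo'}"

# Index of every apostrophe the pattern treats as an "inner" one.
matches = [m.start() for m in APOSTROPHE_ESCAPE.finditer(line)]
# The string that is actually handed to ast.literal_eval().
escaped = APOSTROPHE_ESCAPE.sub(r"\'", line)
parsed = ast.literal_eval(escaped)

print(matches)                       # [8]
print(escaped)                       # {u'There\'s Still Time': u'foo'}
print(parsed["There's Still Time"])  # foo
```

Only the apostrophe in There's is escaped; the opening quotes are protected by the u lookbehind and the closing quotes by the delimiter lookahead.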

Keep in mind, though, that even a simple input such as:

line = u"{u'There'}s Still Time': u'foo'}"

will throw it off, because the apostrophe followed by } looks like a closing quote and is left unescaped. Granted, that would be an illegal dictionary in the source as well, but keep these limitations in mind and adjust your pre-processing regex accordingly.
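One way to live with that limitation is to catch the parse failure instead of letting it propagate. The following is a minimal sketch, not a definitive implementation; parse_unicode_dict is a hypothetical helper name, and returning None on failure is just one possible convention.

```python
import ast
import re

# Same pattern as in the answer above.
APOSTROPHE_ESCAPE = re.compile(r"(?<!u)'(?![.}:,\s])")


def parse_unicode_dict(text):
    """Hypothetical helper: escape inner apostrophes, then parse.

    Returns the parsed dict, or None if the text cannot be parsed
    or does not evaluate to a dict.
    """
    escaped = APOSTROPHE_ESCAPE.sub(r"\'", text)
    try:
        result = ast.literal_eval(escaped)
    except (SyntaxError, ValueError):
        # The regex could not repair the input (or it was never valid).
        return None
    return result if isinstance(result, dict) else None


good = parse_unicode_dict(u"{u'There's Still Time': u'foo'}")
bad = parse_unicode_dict(u"{u'There'}s Still Time': u'foo'}")

print(good["There's Still Time"])  # foo
print(bad)                         # None
```

The second input is left unchanged by the regex, so ast.literal_eval() raises a SyntaxError, which the helper turns into None.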


 