
Python: Parsing a complex string into usable data for analysis

Is it possible to parse the below string, using Python, or convert it into another data structure so that each element can be accessed for analysis?

This is an example line from a large text file where each line has the same format.

string = ["('a', '1')", "('b', '2')"]

If you simply want to convert the tuple-strings to tuples, you can use ast.literal_eval:

>>> import ast
>>> [ast.literal_eval(x) for x in string]
[('a', '1'), ('b', '2')]

Using ast.literal_eval rather than eval is encouraged because it is safer: it does not execute arbitrary Python code. It only evaluates literal expressions (strings, numbers, tuples, lists, dicts, sets, booleans and None), so variables and function calls are rejected.
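To illustrate the difference, here is a minimal sketch. The helper name safe_parse is made up for this example and is not part of the ast module. Literals are evaluated, while anything else raises an exception instead of being executed.

```python
import ast

def safe_parse(text):
    """Return the literal value of text, or None if it is not a plain literal.

    safe_parse is a hypothetical helper, not part of the standard library.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid Python, but not a literal (e.g. a call or a name)
        # SyntaxError: not valid Python at all
        return None

print(safe_parse("('a', '1')"))                 # a tuple literal is accepted
print(safe_parse("__import__('os').getcwd()"))  # a function call is rejected
print(safe_parse("some_variable"))              # a bare name is rejected
```

With eval, the second string would actually import os and call getcwd(), which is why it should not be used on input you do not control.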

You can then access the elements of the tuples using Python's index/slice notation, or convert the result to another data structure, e.g. a dictionary:

>>> dict([ast.literal_eval(x) for x in string])
{'a': '1', 'b': '2'}
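Since the question mentions a large text file in which every line has this format, a line-by-line sketch might look like the following. It assumes each line is the text of a Python list literal whose items are tuple-strings, as in the example line. Here io.StringIO stands in for the real file, the sample contents are invented, and parse_line is a hypothetical helper name.

```python
import ast
import io

def parse_line(line):
    """Parse one line such as ["('a', '1')", "('b', '2')"] into a dict."""
    items = ast.literal_eval(line.strip())               # outer list of tuple-strings
    pairs = [ast.literal_eval(item) for item in items]   # inner (key, value) tuples
    return dict(pairs)

# Stand-in for an opened text file; the contents are made up for illustration.
sample = """["('a', '1')", "('b', '2')"]
["('c', '3')", "('d', '4')"]
"""
fake_file = io.StringIO(sample)

# Iterating over the file object reads one line at a time; blank lines are skipped.
records = [parse_line(line) for line in fake_file if line.strip()]

print(records[0]['a'])  # value stored under key 'a' on the first line
print(records[1])       # whole dict built from the second line
```

If a key can repeat within a line, keep the list of tuples instead of calling dict, since a dict retains only the last value for each key.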

You can use ast.literal_eval within a list comprehension:

>>> s = ["('a', '1')", "('b', '2')"]
>>> import ast
>>> [ast.literal_eval(i) for i in s]
[('a', '1'), ('b', '2')]

ast.literal_eval(node_or_string)

Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display.

Also, as your elements are strings, you can use a regex to parse the list elements:

>>> import re
>>> a = [re.findall(r"'(.*?)'", i) for i in s]
>>> a
[['a', '1'], ['b', '2']]

The pattern '(.*?)' matches anything between two single quotes. The ? makes the match non-greedy, so it stops at the nearest closing quote.
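A small sketch of why the non-greedy form matters here, comparing it with the greedy variant on one of the example strings (the variable names are just for illustration):

```python
import re

line = "('a', '1')"

lazy = re.findall(r"'(.*?)'", line)   # stops at the nearest closing quote
greedy = re.findall(r"'(.*)'", line)  # runs on to the last quote in the string

print(lazy)    # ['a', '1']
print(greedy)  # ["a', '1"]
```

Note that the regex approach returns lists rather than tuples and would break on values that themselves contain quote characters, whereas ast.literal_eval handles those cases.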

[eval(s) for s in string]

This will convert your list of strings to a list of tuples.


Edit: The other answers suggested using literal_eval instead of eval. If your data comes from an untrusted source, you'll want to do that, because eval executes whatever code the string contains. See the ast.literal_eval documentation.
