How to convert string representation of list to a list

I was wondering what the simplest way is to convert a string representation of a list like the following to a list:

x = '[ "A","B","C" , " D"]'

Even in cases where the user puts spaces in between the commas, and spaces inside of the quotes, I need to handle that as well and convert it to:

x = ["A", "B", "C", "D"] 

I know I can strip spaces with strip() and split() and check for non-letter characters. But the code was getting very kludgy. Is there a quick function that I'm not aware of?

>>> import ast
>>> x = '[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

ast.literal_eval:

With ast.literal_eval you can safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, booleans, and None.
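As a rough illustration of the literal structures listed above, the sketch below parses a string holding several of them and checks that the result really is a list. The helper name parse_list_literal is made up for this example; only ast.literal_eval comes from the standard library.

```python
import ast

def parse_list_literal(text):
    """Parse text as a Python literal and insist that the result is a list."""
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise TypeError("expected a list literal, got " + type(value).__name__)
    return value

# Strings, bytes, numbers, tuples, dicts, booleans and None are all accepted.
mixed = parse_list_literal('["A", b"B", 1, 2.5, (3, 4), {"k": None}, True]')
print(mixed)
```

The type check is a design choice for this sketch: literal_eval on its own would just as readily return a dict or a number.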

The json module is a better solution whenever there is a stringified list of dictionaries. The json.loads(your_data) function can be used to convert it to a list.

>>> import json
>>> x = '[ "A","B","C" , " D"]'
>>> json.loads(x)
['A', 'B', 'C', ' D']

Similarly:

>>> x = '[ "A","B","C" , {"D":"E"}]'
>>> json.loads(x)
['A', 'B', 'C', {'D': 'E'}]
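One caveat worth keeping in mind: json.loads only accepts JSON, so a list written with single-quoted strings is rejected. A minimal sketch of a fallback, assuming the input is either JSON or a plain Python literal (loads_list is a hypothetical helper name, not part of either module):

```python
import ast
import json

def loads_list(text):
    """Try JSON first, then fall back to parsing a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # e.g. single-quoted strings, which are valid Python but not JSON
        return ast.literal_eval(text)

print(loads_list('[ "A","B","C" , {"D":"E"}]'))
print(loads_list("[ 'A','B','C' , ' D']"))
```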

eval is dangerous - you shouldn't execute user input.
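To make the risk concrete, here is a small sketch in which a harmless function stands in for something destructive. The names side_effect, calls and untrusted are invented for the demonstration.

```python
import ast

calls = []

def side_effect():
    """Stand-in for something harmful; here it only records that it ran."""
    calls.append("ran")
    return ["A", "B"]

untrusted = "side_effect()"

# eval runs whatever the string says, including function calls
result = eval(untrusted, {"side_effect": side_effect})
print(result, calls)

try:
    ast.literal_eval(untrusted)   # a call is not a literal, so it is refused
except ValueError as exc:
    print("literal_eval refused:", exc)
```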

If you have Python 2.6 or newer, use ast instead of eval:

>>> import ast
>>> ast.literal_eval('["A","B" ,"C" ," D"]')
['A', 'B', 'C', ' D']

Once you have that, strip the strings.

If you're on an older version of Python, you can get very close to what you want with a simple regular expression:

>>> import re
>>> x='[  "A",  " B", "C","D "]'
>>> re.findall(r'"\s*([^"]*?)\s*"', x)
['A', 'B', 'C', 'D']

This isn't as good as the ast solution, for example it doesn't correctly handle escaped quotes in strings. But it's simple, doesn't involve a dangerous eval, and might be good enough for your purpose if you're on an older Python without ast.
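If escaped quotes do matter, the pattern can be extended so that a backslash followed by any character counts as part of the string. This is only a sketch of that idea, not a full string-literal parser: QUOTED and extract_quoted are hypothetical names, and the unescaping step simply drops the backslash rather than interpreting sequences such as \n.

```python
import re

# A double-quoted chunk: any run of characters that are neither a quote nor
# a backslash, or a backslash followed by any single character.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def extract_quoted(text):
    """Return the stripped contents of every double-quoted chunk in text."""
    items = []
    for raw in QUOTED.findall(text):
        unescaped = re.sub(r'\\(.)', r'\1', raw)  # drop the escaping backslash
        items.append(unescaped.strip())
    return items

print(extract_quoted('[  "A",  " B", "C","D "]'))
print(extract_quoted(r'["say \"hi\"", " B "]'))
```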

There is a quick solution:

x = eval('[ "A","B","C" , " D"]')

Unwanted whitespace in the list elements may be removed in this way:

x = [x.strip() for x in eval('[ "A","B","C" , " D"]')]

Inspired by some of the answers above that work with base Python packages, I compared the performance of a few (using Python 3.7.3):

Method 1: ast

import ast
list(map(str.strip, ast.literal_eval(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, ast.literal_eval(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import ast', number=100000)
# 1.292875313000195

Method 2: json

import json
list(map(str.strip, json.loads(u'[ "A","B","C" , " D"]')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, json.loads(u'[ \"A\",\"B\",\"C\" , \" D\"]')))", setup='import json', number=100000)
# 0.27833264000014424

Method 3: no import

list(map(str.strip, u'[ "A","B","C" , " D"]'.strip('][').replace('"', '').split(',')))
# ['A', 'B', 'C', 'D']

import timeit
timeit.timeit(stmt="list(map(str.strip, u'[ \"A\",\"B\",\"C\" , \" D\"]'.strip('][').replace('\"', '').split(',')))", number=100000)
# 0.12935059100027502

I was disappointed to see that the method I considered the least readable was the one with the best performance. There are tradeoffs to consider when going with the most readable option. For the type of workloads I use Python for, I usually value readability over a slightly more performant option, but as usual it depends.
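To rerun this comparison on another machine or Python version, something along these lines may help. It is a sketch that wraps the three methods in functions (via_ast, via_json and via_split are names chosen here) so that the same call is timed each time; the repeat and number values are arbitrary.

```python
import ast
import json
import timeit

TEXT = '[ "A","B","C" , " D"]'

def via_ast(text):
    return [s.strip() for s in ast.literal_eval(text)]

def via_json(text):
    return [s.strip() for s in json.loads(text)]

def via_split(text):
    return [s.strip() for s in text.strip('][').replace('"', '').split(',')]

for fn in (via_ast, via_json, via_split):
    # best of three runs of 10,000 calls each; absolute numbers vary by machine
    best = min(timeit.repeat(lambda: fn(TEXT), number=10000, repeat=3))
    print("%-10s %.4f s" % (fn.__name__, best))
```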

import ast
l = ast.literal_eval('[ "A","B","C" , " D"]')
l = [i.strip() for i in l]

If it's only a one-dimensional list, this can be done without importing anything:

>>> x = u'[ "A","B","C" , " D"]'
>>> ls = x.strip('[]').replace('"', '').replace(' ', '').split(',')
>>> ls
['A', 'B', 'C', 'D']

You can do this:

x = '[ "A","B","C" , " D"]'
print(list(eval(x)))

Note: the best one is the accepted answer.

This is not a safe way; I wasn't aware of the danger of eval when this answer was posted.

Assuming that all your inputs are lists and that the double quotes in the input actually don't matter, this can be done with a simple regexp replace. It is a bit perl-y but works like a charm. Note also that the output is now a list of unicode strings; you didn't specify that you needed that, but it seems to make sense given unicode input.

import re
x = u'[ "A","B","C" , " D"]'
junkers = re.compile(r'[\[" \]]')
result = junkers.sub('', x).split(',')
print result
--->  [u'A', u'B', u'C', u'D']

The junkers variable contains a compiled regexp (for speed) of all characters we don't want; using ] as a character required some backslash trickery. re.sub replaces all these characters with nothing, and we split the resulting string at the commas.

Note that this also removes spaces from inside entries: u'["oh no"]' ---> [u'ohno']. If this is not what you wanted, the regexp needs to be souped up a bit.
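One possible way to soup it up, sketched here under the assumption that entries never contain commas: split at the commas first, then strip brackets, quotes and whitespace only from the two ends of each entry. EDGE_JUNK and split_keep_inner_spaces are names invented for this example.

```python
import re

# Brackets, double quotes and whitespace, but only at the ends of an entry.
EDGE_JUNK = re.compile(r'^[\s\["]+|[\s\]"]+$')

def split_keep_inner_spaces(text):
    """Split at commas and clean each entry's edges, leaving inner spaces."""
    return [EDGE_JUNK.sub('', part) for part in text.split(',')]

print(split_keep_inner_spaces('["oh no", " D"]'))   # ['oh no', 'D']
```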

No need to import anything and no need to evaluate. You can do this in one line for most basic use cases, including the one given in the original question.

One-liner

l_x = [i.strip() for i in x[1:-1].replace('"',"").split(',')]

Explanation

x = '[ "A","B","C" , " D"]'
# str indexing to eliminate the brackets
# replace as split will otherwise retain the quotes in returned list
# split to conv to list
l_x = x[1:-1].replace('"',"").split(',')

Outputs:

for i in range(0, len(l_x)):
    print(l_x[i])
# vvvv output vvvvv
'''
 A
B
C 
  D
'''
print(type(l_x)) # out: class 'list'
print(len(l_x)) # out: 4

You can parse and clean up this list as needed using a list comprehension.

l_x = [i.strip() for i in l_x] # list comprehension to clean up
for i in range(0, len(l_x)):
    print(l_x[i])
# vvvvv output vvvvv
'''
A
B
C
D
'''

Nested lists

If you have nested lists, it does get a bit more annoying. Without using regex (which would simplify the replace), and assuming you want to return a flattened list (and the Zen of Python says flat is better than nested):

x = '[ "A","B","C" , " D", ["E","F","G"]]'
l_x = x[1:-1].split(',')
l_x = [i
    .replace(']', '')
    .replace('[', '')
    .replace('"', '')
    .strip() for i in l_x
]
# returns ['A', 'B', 'C', 'D', 'E', 'F', 'G']

If you need to retain the nested list it gets a bit uglier, but can still be done just with re and a list comprehension:

import re
x = '[ "A","B","C" , " D", "["E","F","G"]","Z", "Y", "["H","I","J"]", "K", "L"]'
# clean it up so regex is simpler
x = x.replace('"', '').replace(' ', '') 
# look ahead for the bracketed text that signifies nested list
l_x = re.split(r',(?=\[[A-Za-z0-9\',]+\])|(?<=\]),', x[1:-1])
print(l_x)
# flatten and split the non nested list items
l_x0 = [item for items in l_x for item in items.split(',') if '[' not in items]
# convert the nested lists to lists
l_x1 = [
    i[1:-1].split(',') for i in l_x if '[' in i 
]
# add the two lists 
l_x = l_x0 + l_x1

This last solution will work on any list stored as a string, nested or not.
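If the nested input happens to be valid JSON, as the first nested example above is, another option is to let json.loads build the structure and then strip the strings recursively. This is a different technique from the string manipulation above, sketched only for comparison; strip_nested is a hypothetical helper name.

```python
import json

def strip_nested(value):
    """Strip every string in a possibly nested list, keeping the nesting."""
    if isinstance(value, list):
        return [strip_nested(item) for item in value]
    if isinstance(value, str):
        return value.strip()
    return value

x = '[ "A","B","C" , " D", ["E","F","G"]]'
nested = strip_nested(json.loads(x))
print(nested)   # ['A', 'B', 'C', 'D', ['E', 'F', 'G']]
```

Unlike the approach that adds the two lists together, this keeps the nested lists at their original positions.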

If you know that your lists only contain quoted strings, this pyparsing example will give you your list of stripped strings (even preserving the original Unicode-ness).

>>> from pyparsing import *
>>> x =u'[ "A","B","C" , " D"]'
>>> LBR,RBR = map(Suppress,"[]")
>>> qs = quotedString.setParseAction(removeQuotes, lambda t: t[0].strip())
>>> qsList = LBR + delimitedList(qs) + RBR
>>> print qsList.parseString(x).asList()
[u'A', u'B', u'C', u'D']

If your lists can have more datatypes, or even contain lists within lists, then you will need a more complete grammar - like this one in the pyparsing examples directory, which will handle tuples, lists, ints, floats, and quoted strings.

With numpy this works in a very simple way:

x = u'[ "A","B","C" , " D"]'
list_string = str(x)
import numpy as np
print np.array(list_string)

gives

>>> 
[ "A","B","C" , " D"]

To further complete @Ryan's answer using json, one very convenient function to convert unicode is the one posted here: https://stackoverflow.com/a/13105359/7599285

Example with double or single quotes:

>print byteify(json.loads(u'[ "A","B","C" , " D"]'))
>print byteify(json.loads(u"[ 'A','B','C' , ' D']".replace('\'','"')))
['A', 'B', 'C', ' D']
['A', 'B', 'C', ' D']

This usually happens when you load a list that was stored as a string in a CSV file.

If you have your list stored in a CSV file in the form the OP asked about:

x = '[ "A","B","C" , " D"]'

Here is how you can load it back to a list:

import csv
with open('YourCSVFile.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)

listItems = rows[0]

listItems is now a list.
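A related case is a CSV file in which one cell holds the whole stringified list. The sketch below builds such a file in memory so that it is self-contained; the column names id and tags are made up, and ast.literal_eval is used to parse the cell, which is a different technique from reading the row directly as above.

```python
import ast
import csv
import io

# Build a small CSV in memory; csv.writer takes care of quoting the cell.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["id", "tags"])
writer.writerow([1, '[ "A","B","C" , " D"]'])
buffer.seek(0)

rows = list(csv.DictReader(buffer))
# The cell comes back as the original string, which can then be parsed.
tags = [tag.strip() for tag in ast.literal_eval(rows[0]["tags"])]
print(tags)   # ['A', 'B', 'C', 'D']
```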

You may run into such a problem while dealing with scraped data stored as a Pandas DataFrame.

This solution works like a charm if the list of values is present as text.

def textToList(hashtags):
    return hashtags.strip('[]').replace('\'', '').replace(' ', '').split(',')

hashtags = "[ 'A','B','C' , ' D']"
hashtags = textToList(hashtags)

Output: ['A', 'B', 'C', 'D']

No external library required.

I would like to provide a more intuitive pattern-based solution with regex. The function below takes as input a stringified list containing arbitrary strings.

Stepwise explanation: You remove all whitespace, brackets and value separators (provided they are not part of the values you want to extract, else make the regex more complex). Then you split the cleaned string on single or double quotes and take the non-empty values (or odd indexed values, whatever the preference).

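import re

def parse_strlist(sl):
    # drop brackets, commas and all whitespace
    clean = re.sub(r"[\[\],\s]", "", sl)
    # split on single or double quotes
    splitted = re.split(r"['\"]", clean)
    # keep only the non-empty pieces
    values_only = [s for s in splitted if s != '']
    return values_only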

testsample: "['21',"foo" '6', '0', " A"]"
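The "odd indexed values" variant mentioned in the explanation can be sketched as follows. After splitting on quotes, the pieces alternate between text outside and text inside quotes, so every second piece is a value; one consequence is that empty strings in the input survive. The function name parse_strlist_odd is chosen here for illustration, and the same assumption applies as before: values contain no whitespace, brackets, commas or quote characters.

```python
import re

def parse_strlist_odd(sl):
    """Variant that keeps the odd-indexed pieces instead of the non-empty ones."""
    clean = re.sub(r"[\[\],\s]", "", sl)    # drop brackets, commas, whitespace
    pieces = re.split(r"['\"]", clean)      # outside, inside, outside, ...
    return pieces[1::2]

sample = '''['21',"foo" '6', '0', " A"]'''
print(parse_strlist_odd(sample))        # ['21', 'foo', '6', '0', 'A']
print(parse_strlist_odd("['', 'a']"))   # ['', 'a']
```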

So, following all the answers I decided to time the most common methods:

from time import time
import re
import json
import ast


my_str = str(list(range(19)))
print(my_str)

reps = 100000

start = time()
for i in range(0, reps):
    re.findall("\w+", my_str)
print("Regex method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    json.loads(my_str)
print("json method:\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    ast.literal_eval(my_str)
print("ast method:\t\t", (time() - start) / reps)

start = time()
for i in range(0, reps):
    [n.strip() for n in my_str]
print("strip method:\t", (time() - start) / reps)



    regex method:    6.391477584838867e-07
    json method:     2.535374164581299e-06
    ast method:      2.4425282478332518e-05
    strip method:    4.983267784118653e-06

So in the end regex wins!
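Before reading too much into these timings, it may be worth checking what each timed expression returns, since they do not all produce the same thing. A small sketch on a shorter input (range(3) instead of range(19), purely to keep the output readable):

```python
import ast
import json
import re

my_str = str(list(range(3)))   # '[0, 1, 2]'

regex_result = re.findall(r"\w+", my_str)
json_result = json.loads(my_str)
ast_result = ast.literal_eval(my_str)
strip_result = [n.strip() for n in my_str]

print(regex_result)   # ['0', '1', '2']  (strings)
print(json_result)    # [0, 1, 2]        (ints)
print(ast_result)     # [0, 1, 2]        (ints)
print(strip_result)   # ['[', '0', ',', '', '1', ',', '', '2', ']']
```

The regex returns strings, json and ast return integers, and the "strip method" iterates over the characters of the string rather than over list items, so the four numbers are not measuring the same work.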

You can save yourself the .strip() function by just slicing off the first and last characters from the string representation of the list (see the third line below):

>>> mylist=[1,2,3,4,5,'baloney','alfalfa']
>>> strlist=str(mylist)
>>> mylistfromstring=(strlist[1:-1].split(', '))
>>> mylistfromstring[3]
'4'
>>> for entry in mylistfromstring:
...     print(entry)
...     type(entry)
... 
1
<class 'str'>
2
<class 'str'>
3
<class 'str'>
4
<class 'str'>
5
<class 'str'>
'baloney'
<class 'str'>
'alfalfa'
<class 'str'>

And with pure Python, without importing any libraries:

[x for x in  x.split('[')[1].split(']')[0].split('"')[1:-1] if x not in[',',' , ',', ']]

This solution is simpler than some I read above, but it requires matching all features of the list:

x = '[ "A","B","C" , " D"]'
[i.strip() for i in x.split('"') if len(i.strip().strip(',').strip(']').strip('['))>0]

['A', 'B', 'C', 'D']

Let's assume your string is t_vector = '[34, 54, 52, 23]' and you want to convert this into a list. You can use the two steps below:

ls = t_vector.strip('][')
t_vector = ls.split(', ')

t_vector contains the list.
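Note that splitting alone leaves the items as strings. If integers are wanted, a conversion step can be added; the sketch below assumes every item is a valid integer and relies on int() ignoring surrounding whitespace, so it does not matter whether the items are separated by ',' or ', '.

```python
t_vector = '[34, 54, 52, 23]'

# strip the brackets, split at the commas, convert each piece to int
numbers = [int(item) for item in t_vector.strip('][').split(',')]
print(numbers)   # [34, 54, 52, 23]
```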
