
Weird behavior of value unpacking when splitting a string in Python

Use case: I've got a long string that is divided by line breaks, and each line has two elements separated by a comma.

Ideally, this should work:

[(x, y) for line in lines.split() for x, y in line.split(',')]

But it doesn't, and it raises the same ValueError as shown below. So I tried to decompose the problem to figure out what's going on here:

lines = \
"""a,b
c,d
e,f
g,h"""

lines = [line for line in lines.split()]

print(lines)
print(len(lines))
print([len(line) for line in lines])
print(all(',' in line for line in lines))

[(x, y) for l in lines for x,y in l.split(',')]

Yields:

/usr/bin/python3m /home/alex/PycharmProjects/test.py
['a,b', 'c,d', 'e,f', 'g,h']
4
[3, 3, 3, 3]
True

Traceback (most recent call last):
  File "/home/alex/PycharmProjects/test.py", line 74, in <module>
    ...
  File "/home/alex/PycharmProjects/test.py", line 63, in <listcomp>
    [(x, y) for l in lines for x,y in l.split(',')]
ValueError: need more than 1 value to unpack

Yet if I replace the list comprehension in the last line with a classic for loop:

for line in lines:
    x, y = line.split(',')
    print(x, y)

It executes successfully:

['a,b', 'c,d', 'e,f', 'g,h']
4
[3, 3, 3, 3]
True
a b
c d
e f
g h

This is driving me absolutely insane. If I decompose it further, I find that list comprehensions, set comprehensions, and generator expressions all choke on this:

[(x,y) for x, y in "a,b".split(",")]

Does anyone have an idea why this occurs?

This code:

for x, y in "a,b".split(",")

is looking for two-item iterables inside the iterable (list) returned by "a,b".split(",").

However, all it finds is 'a' and 'b':

>>> "a,b".split(",")
['a', 'b']
>>>

Since both of these are only one-item iterables (strings with one character), the unpacking into x and y fails and the code breaks.
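To make the failure concrete, here is a minimal sketch of what the for clause effectively does with each element. The helper name try_unpack is hypothetical and exists only for this illustration; the exact wording of the error message differs between Python versions, so it is not shown.

```python
def try_unpack(item):
    """Attempt `x, y = item` and report what happened."""
    try:
        x, y = item
    except ValueError as exc:
        return f"failed: {exc}"
    return (x, y)


parts = "a,b".split(",")  # ['a', 'b']

# `for x, y in parts` performs `x, y = element` once per element of parts,
# so it tries to unpack 'a' and then 'b', not the list ['a', 'b'] itself.
results = [try_unpack(element) for element in parts]
print(results)

# Unpacking the whole list at once is what the plain for loop in the
# question does, and that succeeds.
print(try_unpack(parts))
```

Each single-character string reports a failure, while the two-element list unpacks cleanly.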


Considering the above, watch what happens when an extra character is added on each side of the comma:

>>> "ax,by".split(",")
['ax', 'by']
>>> [(x,y) for x, y in "ax,by".split(",")]
[('a', 'x'), ('b', 'y')]
>>>

As you can see, the code now runs.

This is because "ax,by".split(",") returns an iterable (list) that contains two-item iterables (strings with two characters), which is exactly what for x, y in is looking for.


However, you could also wrap the split result in a tuple:

>>> ("a,b".split(","),)
(['a', 'b'],)
>>> [(x,y) for x, y in ("a,b".split(","),)]
[('a', 'b')]
>>>

("a,b".split(","),) returns an iterable (tuple) that contains a single two-item iterable (a list with two strings). Once again, this is exactly what for x, y in is looking for, so the code works.


With all this in mind, the following should fix your problem:

[(x, y) for line in lines.split() for x, y in (line.split(','),)]
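As a quick check, the fix can be run against the sample data from the question and compared with the plain for loop. This is only a sketch; the names pairs and expected are introduced here for illustration.

```python
lines = """a,b
c,d
e,f
g,h"""

# (line.split(","),) is a one-element tuple whose single element is the
# two-item list, so `for x, y in ...` runs once per line and unpacks it.
pairs = [(x, y) for line in lines.split() for x, y in (line.split(","),)]

# The same logic written as the classic loop from the question.
expected = []
for line in lines.split():
    x, y = line.split(",")
    expected.append((x, y))

print(pairs)              # [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h')]
print(pairs == expected)  # True
```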

Why not this?

[tuple(l.split(',')) for l in lines ]

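l.split(',') produces a single flat list of two strings for each l, not an iterable of two-item iterables, so a nested for x, y in l.split(',') has nothing to unpack. Converting the list straight to a tuple avoids the unpacking altogether.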
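A short sketch of this approach on the sample data follows. One difference worth noting: because nothing is unpacked into exactly two names, a line with an extra comma does not raise, it simply yields a longer tuple.

```python
lines = ["a,b", "c,d", "e,f", "g,h"]

# Convert each split result directly into a tuple; no unpacking involved.
pairs = [tuple(l.split(",")) for l in lines]
print(pairs)  # [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h')]

# A malformed line is not rejected here; it just produces three items.
print(tuple("a,b,c".split(",")))  # ('a', 'b', 'c')
```

If lines with the wrong number of fields should be treated as errors, the explicit x, y unpacking is the stricter choice.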

Why not just:

lines = """
a,b
c,d
e,f
g,h
"""

lines = [line for line in lines.split()]

print(lines)
print(len(lines))
print([len(line) for line in lines])
print(all(',' in line for line in lines))

[l.split(",") for l in lines]
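Note that this version produces a list of two-item lists rather than tuples. As a small sketch (the name rows is introduced here for illustration), such a list is exactly the shape that for x, y in can unpack afterwards:

```python
lines = """
a,b
c,d
e,f
g,h
"""

# str.split() with no arguments ignores the leading and trailing newlines.
rows = [l.split(",") for l in lines.split()]
print(rows)  # [['a', 'b'], ['c', 'd'], ['e', 'f'], ['g', 'h']]

# Each element of rows is a two-item list, so unpacking works here.
for x, y in rows:
    print(x, y)
```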
