python split without creating blanks

I understand why it is important for split to create blanks, thanks to this question, but sometimes it is necessary not to grab them.

Let's say you parsed some CSS and got the following strings:

s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
s2 = 'color:#000;background-color:#fff;border:1px #333 dotted'

Both are valid CSS, even though s2 lacks a semicolon at the end of the string. Splitting the strings gives the following:

>>> s1.split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted', '']
>>> s2.split(';')
['color:#000', 'background-color:#fff', 'border:1px #333 dotted']

That trailing semicolon creates a blank item in the list. If I want to manipulate the result further, I need to test the beginning and end of each list and remove those items if they are blank. That is not too bad, but it seems avoidable.

Question:

Is there a method that is essentially the same as split but does not include trailing blank items? Or is there simply a way to remove them, just as a string has strip to remove trailing whitespace?

Simply remove the empty items by passing None as the filter function:

list(filter(None, s1.split(';')))  # list() is needed on Python 3, where filter() returns an iterator

Demo:

>>> s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> list(filter(None, s1.split(';')))
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']

Calling filter() with None removes all 'empty' or numeric 0 items, that is, anything that would evaluate to false in a boolean context.
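To make the "false in a boolean context" point concrete, here is a small sketch. The sample list is made up for illustration; it shows that filter(None, ...) drops more than empty strings, which matters if the list can hold values such as 0 or None that should be kept.

```python
# Mixed sample values (hypothetical); only the truthy ones survive filter(None, ...).
items = ['color:#fff', '', 0, None, '0', [], 'border:0']

# filter(None, iterable) keeps the items for which bool(item) is True.
truthy = list(filter(None, items))
print(truthy)     # ['color:#fff', '0', 'border:0']

# To drop only empty strings and keep other falsy values, test explicitly.
non_empty = [item for item in items if item != '']
print(non_empty)  # ['color:#fff', 0, None, '0', [], 'border:0']
```

For the output of str.split() this difference does not matter, since every item is a string and only '' is falsy.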

filter(None, ....) eats list comprehensions for breakfast (the timings below are from Python 2, where filter() returns a list):

>>> import timeit
>>> timeit.timeit('filter(None, a)', "a = [1, 2, 3, None, 4, 'five', ''] * 100")
9.410392045974731
>>> timeit.timeit('[i for i in a if i]', "a = [1, 2, 3, None, 4, 'five', ''] * 100")
44.9318630695343

You can use a list comprehension to filter out the empty strings, as an empty string is considered False:

>>> s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> [i for i in s1.split(';') if i]
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']
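If the split result is going to be processed further, the comprehension can be wrapped in a small helper. This is only a sketch: split_nonempty and parse_declarations are hypothetical names, and trimming whitespace around each piece is an extra assumption that the question does not ask for.

```python
def split_nonempty(text, sep=';'):
    """Split text on sep, trimming whitespace and dropping empty pieces."""
    pieces = (piece.strip() for piece in text.split(sep))
    return [piece for piece in pieces if piece]


def parse_declarations(css):
    """Turn 'prop:value;prop:value' into a dict (illustrative only)."""
    result = {}
    for declaration in split_nonempty(css):
        # partition() splits on the first ':' only, so values may contain ':'.
        name, _, value = declaration.partition(':')
        result[name.strip()] = value.strip()
    return result


s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
s2 = 'color:#000;background-color:#fff;border:1px #333 dotted'

print(split_nonempty(s1))
# ['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']
print(parse_declarations(s2))
# {'color': '#000', 'background-color': '#fff', 'border': '1px #333 dotted'}
```

Both strings from the question give three declarations, with or without the trailing semicolon.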

Alternatively, you can rstrip() the semicolon first:

>>> s1.rstrip(';').split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']
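A quick sketch of what rstrip(';') does and does not cover, using a made-up string with a leading semicolon and several trailing ones:

```python
s = ';color:#000;border:0;;;'

# rstrip(';') removes every trailing ';', not just one.
trimmed = s.rstrip(';')
print(trimmed)  # ;color:#000;border:0

# The leading ';' is untouched, so split still yields a leading blank.
parts = trimmed.split(';')
print(parts)    # ['', 'color:#000', 'border:0']
```

So rstrip() answers the "trailing blank items" part of the question, but a leading separator still needs separate handling.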

Apply str.strip to the string before doing the split:

>>> s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> s1.strip(';').split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']

Works for both leading and trailing ';':

>>> s1 = ';background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> s1.strip(';').split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']
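One caveat worth checking before relying on strip(): it only touches the ends of the string. The sketch below uses made-up inputs to show two cases where strip-then-split still leaves a blank item, and how filtering after the split handles both.

```python
# Interior empty fields are not touched by strip().
doubled = 'color:#000;;border:0;'
stripped_parts = doubled.strip(';').split(';')
print(stripped_parts)   # ['color:#000', '', 'border:0']

# An all-semicolon (or empty) input yields [''] rather than [].
only_sep = ';'.strip(';').split(';')
print(only_sep)         # ['']

# Filtering after the split handles both cases.
filtered_parts = [p for p in doubled.split(';') if p]
filtered_empty = [p for p in ';'.split(';') if p]
print(filtered_parts)   # ['color:#000', 'border:0']
print(filtered_empty)   # []
```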

I am not sure why you would want to avoid this, as a strip before split is going to be faster than both the list comprehension and filter:

>>> s1 = ';background-color:#000;color:#fff;border:1px #ccc dotted;'*1000
>>> %timeit filter(None, s1.split(';'))
1000 loops, best of 3: 638 us per loop
>>> %timeit s1.strip(';').split(';')
1000 loops, best of 3: 570 us per loop
>>> %timeit [i for i in s1.split(';') if i]
100 loops, best of 3: 931 us per loop
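The %timeit lines above need IPython. A rough equivalent with the standard timeit module might look like the sketch below; the function names and the number=200 setting are arbitrary choices, and filter() is wrapped in list() so that Python 3 actually does the filtering. Results will vary by machine and Python version, so no timings are claimed here.

```python
import timeit

s1 = ';background-color:#000;color:#fff;border:1px #ccc dotted;' * 1000


def with_filter():
    return list(filter(None, s1.split(';')))


def with_strip():
    return s1.strip(';').split(';')


def with_comprehension():
    return [i for i in s1.split(';') if i]


runs = 200  # arbitrary; raise it for steadier numbers
for func in (with_filter, with_strip, with_comprehension):
    seconds = timeit.timeit(func, number=runs)
    print('%s: %.1f us per call' % (func.__name__, seconds / runs * 1e6))

# The three approaches do not return the same list for this input:
# repeating the string creates ';;' joins that strip() leaves in place.
print(len(with_strip()), len(with_filter()))
```

Because the test string contains interior ';;' pairs, the strip() variant returns extra blank items here, so the comparison is not quite like for like.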
