python re - split a string before a character
How do I split a string at the positions just before a given character?
The obvious way doesn't work:
>>> h=re.compile("(?=a)")
>>> h.split("fffagggahhh")
['fffagggahhh']
>>>
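The session above reflects Python versions before 3.7, where re.split does not split on an empty match such as a bare lookahead. Since Python 3.7, re.split does split on zero-width matches, so the obvious approach works as written. A minimal sketch, assuming Python 3.7 or later:

```python
import re

# Requires Python 3.7+: older versions do not split on empty matches,
# so the lookahead (?=a) alone cannot split the string there.
parts = re.split(r"(?=a)", "fffagggahhh")
print(parts)  # ['fff', 'aggg', 'ahhh']

# A separator at position 0 produces an empty first piece.
print(re.split(r"(?=a)", "afff"))  # ['', 'afff']
```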
OK, this is not exactly the solution you asked for, but I thought it would be a useful addition here.
Solution without re:
>>> x = "fffagggahhh"
>>> k = x.split('a')
>>> j = [k[0]] + ['a'+l for l in k[1:]]
>>> j
['fff', 'aggg', 'ahhh']
>>>
>>> rx = re.compile("(?:a|^)[^a]*")
>>> rx.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']
>>> rx.findall("aaa")
['a', 'a', 'a']
>>> rx.findall("fgh")
['fgh']
>>> rx.findall("")
['']
>>> r=re.compile("(a?[^a]+)")
>>> r.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']
EDIT:
This won't correctly handle a doubled a in the string; one of the two a's is silently dropped:
>>> r.findall("fffagggaahhh")
['fff', 'aggg', 'ahhh']
KennyTM's regex seems better suited.
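A quick way to catch this kind of bug is to check that the pieces join back into the original string. A sketch, using a hypothetical helper is_lossless and comparing the (a?[^a]+) pattern with the (?:a|^)[^a]* pattern from above:

```python
import re

def is_lossless(parts, text):
    # A correct split must reassemble to exactly the original string.
    return "".join(parts) == text

text = "fffagggaahhh"

lossy = re.findall(r"(a?[^a]+)", text)
print(lossy, is_lossless(lossy, text))    # ['fff', 'aggg', 'ahhh'] False

better = re.findall(r"(?:a|^)[^a]*", text)
print(better, is_lossless(better, text))  # ['fff', 'aggg', 'a', 'ahhh'] True
```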
import re

def split_before(pattern, text):
    # Cut the text at the start of every match, keeping the match
    # at the beginning of the following piece.
    prev = 0
    for m in re.finditer(pattern, text):
        yield text[prev:m.start()]
        prev = m.start()
    yield text[prev:]

if __name__ == '__main__':
    print(list(split_before("a", "fffagggahhh")))
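re.split treats the pattern as a delimiter, and before Python 3.7 it never split on an empty match, which is why the lookahead (?=a) in the question does not split the string there. The generator above instead cuts the string at the start of every match: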
>>> print(list(split_before("a", "afffagggahhhaab")))
['', 'afff', 'aggg', 'ahhh', 'a', 'ab']
>>> print(list(split_before("a", "ffaabcaaa")))
['ff', 'a', 'abc', 'a', 'a', 'a']
>>> print(list(split_before("a", "aaaaa")))
['', 'a', 'a', 'a', 'a', 'a']
>>> print(list(split_before("a", "bbbb")))
['bbbb']
>>> print(list(split_before("a", "")))
['']
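Because split_before only uses m.start(), the pattern does not have to be a single character. A self-contained sketch (the generator is repeated so the block runs on its own) using a character class and a multi-character pattern:

```python
import re

def split_before(pattern, text):
    # Same generator as above: cut the text at the start of every match.
    prev = 0
    for m in re.finditer(pattern, text):
        yield text[prev:m.start()]
        prev = m.start()
    yield text[prev:]

# Split before every capital letter; the match at position 0 yields ''.
print(list(split_before(r"[A-Z]", "HelloWorldFoo")))  # ['', 'Hello', 'World', 'Foo']

# Split before every occurrence of a two-character string.
print(list(split_before("ab", "xxabyyabzz")))  # ['xx', 'abyy', 'abzz']
```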
This one works on repeated a's:
>>> re.findall("a[^a]*|^[^a]*", "aaaaa")
['a', 'a', 'a', 'a', 'a']
>>> re.findall("a[^a]*|[^a]+", "ffaabcaaa")
['ff', 'a', 'abc', 'a', 'a', 'a']
Approach: the main chunks you are looking for are an a followed by zero or more non-a characters. That covers every possibility except a run of zero or more non-a characters that is not preceded by an a, and that can happen only at the start of the input string.
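The two cases above map directly onto the two alternatives of the first pattern. Written with re.VERBOSE, each branch can carry its own comment; this is the same a[^a]*|^[^a]* pattern, just laid out differently:

```python
import re

chunk = re.compile(r"""
    a [^a]*      # an 'a' followed by zero or more non-'a' characters
  | ^ [^a]*      # or, only at the very start, a run of non-'a' characters
""", re.VERBOSE)

print(chunk.findall("ffaabcaaa"))  # ['ff', 'a', 'abc', 'a', 'a', 'a']
print(chunk.findall("aaaaa"))      # ['a', 'a', 'a', 'a', 'a']
```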
>>> foo = "abbcaaaabbbbcaaab"
>>> bar = foo.split("c")
>>> baz = [bar[0]] + ["c"+x for x in bar[1:]]
>>> baz
['abb', 'caaaabbbb', 'caaab']
Because str.split with an explicit separator always returns at least one element, and slicing bar[1:] on a one-element list simply gives an empty list, this works properly even if there are no occurrences of c in foo.
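The same str.split idea wrapped in a function, exercising the edge cases the slicing argument covers. The name split_before_str is hypothetical; sep must be non-empty, because str.split raises ValueError for an empty separator:

```python
def split_before_str(text, sep):
    """Split text before each occurrence of sep (hypothetical helper)."""
    pieces = text.split(sep)
    # pieces always has at least one element, so pieces[0] is safe;
    # pieces[1:] is just [] when sep does not occur at all.
    return [pieces[0]] + [sep + p for p in pieces[1:]]

print(split_before_str("abbcaaaabbbbcaaab", "c"))  # ['abb', 'caaaabbbb', 'caaab']
print(split_before_str("abb", "c"))                # ['abb']
print(split_before_str("", "c"))                   # ['']
print(split_before_str("xxabyy", "ab"))            # ['xx', 'abyy']
```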
str.split() takes an argument for the character to split on, but the separator itself is dropped from the result:
>>> "fffagggahhh".split('a')
['fff', 'ggg', 'hhh']
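One way to keep the separator is re.split with a capturing group: as the re documentation describes, the text of captured groups is included in the result, so the list alternates piece, separator, piece. Gluing each separator onto the piece after it gives the split-before result. A sketch; split_before_keep is a hypothetical name:

```python
import re

def split_before_keep(text, sep):
    """Split text before each occurrence of sep (hypothetical helper).

    With a group in the pattern, re.split returns
    [piece, sep, piece, sep, ..., piece], which always has odd length.
    """
    parts = re.split(f"({re.escape(sep)})", text)
    # Attach every separator to the piece that follows it.
    return [parts[0]] + [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]

print(split_before_keep("fffagggahhh", "a"))  # ['fff', 'aggg', 'ahhh']
print(split_before_keep("ffaabcaaa", "a"))    # ['ff', 'a', 'abc', 'a', 'a', 'a']
```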