Split a string around characters in Python
Hi all. I have seen answers here on how to split a string at a specified character, and that is pretty simple. What I need to know is how to split a string between two characters,
i.e. splitting out the substrings that begin with M and end with Z, so that
RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN
becomes
RERTCRPVN MVRNSRRTNSKSRSRHRZ GRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTI MSLLNTZ LN
and later keeping only the desired pieces.
I might be able to write some kind of weird loop to do this, like
NET=Aminos.split('M')
LIST=[]
rock= int(0)
while LIST[rock]!= 'M' and LIST[rock]!= '':
    LIST.append('M' + NET[rock])
    rock=rock + 1
    other=other+1
print(LIST)
but with this example I get an "index out of range" error.
This approach also seems rather tedious, because I would then have to break LIST apart after each Z with another split and concatenate 'Z' back onto the end of each piece.
Does anyone know a more efficient way of doing this?
You can use regular expressions to extract all substrings beginning with M and ending with Z from a string:
>>> import re
>>> re.findall('M.*?Z', "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN")
['MVRNSRRTNSKSRSRHRZ', 'MSLLNTZ']
Or, if you want to keep the strings in between as well:
>>> re.split('(M.*?Z)', "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN")
['RERTCRPVN', 'MVRNSRRTNSKSRSRHRZ', 'GRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTI', 'MSLLNTZ', 'LN']
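To then keep only the desired pieces, as the question asks, one option is to filter the result of re.split. The following is a minimal sketch rather than part of the original answer; the names split_segments and keep_m_to_z are hypothetical helpers introduced for illustration.

```python
import re

# Non-greedy: match from an 'M' to the nearest following 'Z'.
# The capturing group makes split() keep the matched segments.
M_TO_Z = re.compile(r"(M.*?Z)")


def split_segments(sequence):
    """Split `sequence` into M...Z segments and the text between them."""
    # split() yields empty strings when a match sits at the very start
    # or end, or when two matches are adjacent; drop those.
    return [part for part in M_TO_Z.split(sequence) if part]


def keep_m_to_z(parts):
    """Keep only the parts that start with 'M' and end with 'Z'."""
    return [p for p in parts if p.startswith("M") and p.endswith("Z")]


aminos = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
parts = split_segments(aminos)
wanted = keep_m_to_z(parts)
print(parts)
print(wanted)
```

Because the pieces are produced by a split, joining parts back together reproduces the original string, which is a handy sanity check.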
This sounds like a job for re.split, i.e.:
import re
ex = re.compile("M.*Z")
splitted = re.split(ex, <some input string>)
Edit: Updated per Tim Heap, as I misinterpreted "beginning with M and ending in Z" as occurring at word boundaries.
Edit 2: After @Cairnarvon's feedback, here is an example that works. Note that re.split takes the pattern first and the string second; the first call below fails because its arguments are swapped, not because re.split rejects the result of re.compile:
>>> s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
>>> ex = re.compile("(M.*?Z)")
>>> re.split(s, ex)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 167, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or buffer
>>> re.split("M.*Z", s)
['RERTCRPVN', 'LN']
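For comparison, a compiled pattern can also be used directly: re.split takes the pattern as its first argument and the string as its second, and a compiled pattern object has its own split method. The following is a minimal sketch under those assumptions, using the non-greedy pattern with a capturing group so that the M...Z segments are kept; the variable names are illustrative only.

```python
import re

s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"

# Non-greedy match with a capturing group, compiled once for reuse.
ex = re.compile(r"(M.*?Z)")

# Two equivalent ways to split with a compiled pattern:
via_method = ex.split(s)        # method on the compiled pattern
via_function = re.split(ex, s)  # module function: pattern first, then string

print(via_method)
print(via_function == via_method)
```

The greedy pattern "M.*Z" instead runs from the first M to the last Z in the string, which is why the call above returns only the two outer pieces.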