

Split a string around characters in Python

Hey guys, I have seen answers on here about how to split strings at a specified character, and that's pretty simple. What I need to know is how to split strings between two characters.

That is, I want to split out the substrings that begin with M and end with Z, so that

RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN

is split into

RERTCRPVN MVRNSRRTNSKSRSRHRZ GRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTI MSLLNTZ LN

and then keep only the pieces I want.

I might be able to write some kind of convoluted loop to do this, like:

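Aminos = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
NET = Aminos.split('M')   # pieces between the M characters (the M's themselves are dropped)
LIST = []
rock = 0
# LIST is still empty here, so LIST[rock] raises IndexError on the very first check
while LIST[rock] != 'M' and LIST[rock] != '':
    LIST.append('M' + NET[rock])   # put the M back on the front of each piece
    rock = rock + 1
    other = other + 1   # `other` is never defined
print(LIST)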

but in this example I get an index out of range error.

This approach also seems rather tedious, because I would then have to split each item of LIST again after every Z and concatenate 'Z' back onto the end of each piece.
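For comparison, the loop-based idea in the question can be made to work without regular expressions by scanning with `str.find`. This is a minimal sketch, not the asker's code: the function name `split_between` and its `start`/`end` parameters are hypothetical, and unlike `re.split` it does not emit empty strings at the edges of the input.

```python
def split_between(seq, start="M", end="Z"):
    """Split seq into pieces, isolating each run that begins with `start`
    and ends at the next `end`. Text outside such runs is kept as-is.
    (Hypothetical helper for illustration.)"""
    pieces = []
    i = 0
    while i < len(seq):
        begin = seq.find(start, i)
        if begin == -1:
            # No more start characters: the rest is an "in between" piece.
            pieces.append(seq[i:])
            break
        finish = seq.find(end, begin + 1)
        if finish == -1:
            # A start with no matching end after it: keep the remainder untouched.
            pieces.append(seq[i:])
            break
        if begin > i:
            pieces.append(seq[i:begin])        # text before the M
        pieces.append(seq[begin:finish + 1])   # the M...Z piece itself
        i = finish + 1
    return pieces


s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
pieces = split_between(s)
print(pieces)

# "Keeping only those desired": the M...Z pieces are the ones that
# both start with M and end with Z.
wanted = [p for p in pieces if p.startswith("M") and p.endswith("Z")]
print(wanted)  # ['MVRNSRRTNSKSRSRHRZ', 'MSLLNTZ']
```

Each M...Z piece ends at the first Z after its M, which is the same behaviour as the non-greedy regex `M.*?Z` used in the answers.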

Does anyone know a more efficient way of doing this?

You can use regular expressions to extract all substrings that begin with M and end with Z from a string. The non-greedy `.*?` makes each match stop at the first Z after its M:

>>> import re
>>> re.findall('M.*?Z', "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN")
['MVRNSRRTNSKSRSRHRZ', 'MSLLNTZ']
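If you also need to know where each piece sits in the original sequence, `re.finditer` yields match objects instead of bare strings, so the offsets come along with the text. A small sketch using the same pattern; the variable name `spans` is just for illustration:

```python
import re

s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"

# Each match object carries its 0-based start offset, its exclusive end
# offset, and the matched M...Z text.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer("M.*?Z", s)]

for start, end, piece in spans:
    print(start, end, piece)
```

Because `s[start:end]` is exactly the matched piece, these offsets can be used later to slice the original string or to filter pieces by position.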

Or, if you also want to keep the strings in between, use re.split with a capturing group, so that the matched pieces are included in the result:

>>> re.split('(M.*?Z)', "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN")
['RERTCRPVN', 'MVRNSRRTNSKSRSRHRZ', 'GRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTI', 'MSLLNTZ', 'LN']
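Since the pattern has exactly one capturing group, the list returned by `re.split` alternates between in-between text and matches, always starting and ending with in-between text (an empty string if the input starts or ends with a match). A short sketch of separating the two with slicing:

```python
import re

s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
parts = re.split("(M.*?Z)", s)

# Layout: [text, match, text, match, ..., text]
matches = parts[1::2]   # captured M...Z pieces sit at the odd indices
between = parts[0::2]   # the text between them sits at the even indices

print(matches)  # ['MVRNSRRTNSKSRSRHRZ', 'MSLLNTZ']
print(between)  # ['RERTCRPVN', 'GRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTI', 'LN']
```

Joining `parts` back together reproduces the original string, since nothing is discarded.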

This sounds like a job for re.split, for example:

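import re

s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
# Non-greedy .*? stops at the first Z after each M; the capturing
# group makes re.split keep the M...Z pieces in its result.
ex = re.compile("(M.*?Z)")
splitted = re.split(ex, s)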

Edit: Updated per Tim Heap, as I had misinterpreted "beginning with M and ending in Z" as occurring at word boundaries.
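The distinction this edit mentions is easiest to see on a small made-up string with spaces in it (the sample text below is hypothetical; the question's sequence has no word breaks, so only the plain pattern applies there):

```python
import re

text = "MAZ XMYZ MQZQ"   # hypothetical sample with word breaks

# Plain pattern: any M up to the nearest following Z, anywhere in the text.
plain = re.findall(r"M.*?Z", text)
print(plain)    # ['MAZ', 'MYZ', 'MQZ']

# Word-boundary pattern: the whole word must start with M and end with Z.
bounded = re.findall(r"\bM\w*?Z\b", text)
print(bounded)  # ['MAZ']
```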

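Edit 2: After @Cairnarvon's feedback, here is a session that shows what goes wrong. The TypeError below does not mean that re.split rejects the result of re.compile (a compiled pattern is accepted as the first argument); it happens because the arguments are swapped: re.split expects the pattern first and the string second. Note also that the last call uses the greedy M.*Z, which matches from the first M all the way to the last Z and therefore swallows everything in between: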

>>> import re
>>> s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
>>> ex = re.compile("(M.*?Z)")
>>> re.split(s, ex)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 167, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or buffer
>>> re.split("M.*Z", s)
['RERTCRPVN', 'LN']
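Following on from the transcript above, a short sketch (Python 3 assumed) of calls that do work with the compiled, non-greedy pattern: pass the pattern first and the string second, or call the compiled pattern's own `split` method, which takes only the string.

```python
import re

s = "RERTCRPVNMVRNSRRTNSKSRSRHRZGRCRCGRHWVRNFDNPFISRYRRSZTSFFIFTVKFLSSYGLKKRKIKRTTVKVQGSTIMSLLNTZLN"
ex = re.compile("(M.*?Z)")

# re.split(pattern, string): a compiled pattern is fine as the first argument.
print(re.split(ex, s))

# Equivalent: the compiled pattern's split method takes just the string.
print(ex.split(s))
```

Both calls return the same five-element list, with the M...Z pieces kept because of the capturing group.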

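Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.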

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM