How to remove sub-string starting and ending with something?
How can I remove every substring that starts and ends with a particular combination of characters from a string? For example:
' bla <span class=""latex""> ... This can be different1 ... </span> blub <span class=""latex""> ... This can be different2 ... </span> bleb'
The result I want:
'bla blub bleb'
I tried something like this:
string.replace('<span class=""latex"">' * '</span>', '')
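but this does not work.
Is there a way to achieve this?
str.replace only matches literal text, so it cannot express "anything between these two markers". Read up on the re.sub function instead.
A simple example: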
import re
s = ' cvbcx cvbcx <span class=""latex""> ... This can be different ... </span>vcvbcxbvxc'
# .+? is non-greedy, so each match ends at the nearest </span> rather than the last one
result = re.sub(r'<span class=""latex"">.+?</span>', '<span class=""latex""></span>', s)
print(repr(result))
# ' cvbcx cvbcx <span class=""latex""></span>vcvbcxbvxc'
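Applied to the string from the question, the same idea might look like the sketch below. It assumes the markers are exactly `<span class=""latex"">` and `</span>` as written in the question, that spans are not nested, and that collapsing runs of whitespace afterwards is acceptable; the names `pattern`, `without_spans` and `result` are only illustrative.

```python
import re

s = (' bla <span class=""latex""> ... This can be different1 ... </span>'
     ' blub <span class=""latex""> ... This can be different2 ... </span> bleb')

# .*? is non-greedy: each match stops at the nearest </span>.
# A greedy .* would run from the first opening tag to the last closing tag
# and swallow ' blub ' as well.
# re.DOTALL lets the span content contain newlines (an assumption about the data).
pattern = re.compile(r'<span class=""latex"">.*?</span>', re.DOTALL)

without_spans = pattern.sub('', s)

# Removing the spans leaves doubled spaces behind, so normalise whitespace.
result = ' '.join(without_spans.split())
print(result)  # bla blub bleb
```

If the surrounding whitespace has to be preserved exactly, the last step can be dropped and `without_spans` used directly.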
If you want to keep some parts of the match and not others, you need to use groups.
import re
s = ' cvbcx cvbcx <span class=""latex""> ... This can be different ... </span>vcvbcxbvxc'
r = re.search( r'(<span class=""latex"">)(.+)(</span>)', s)
print(s)
# cvbcx cvbcx <span class=""latex""> ... This can be different ... </span>vcvbcxbvxc
# print(r)
# <re.Match object; span=(13, 73), match='<span class=""latex""> ... This can be different >
print(r.group(1), r.group(3))
# <span class=""latex""> </span>
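When the string holds several spans, the groups can also be used with `finditer` and with backreferences in `re.sub`. The following is a minimal sketch under the assumption that the spans are not nested; the sample string and the names `inner` and `emptied` are made up for illustration.

```python
import re

s = ' bla <span class=""latex""> a </span> blub <span class=""latex""> b </span> bleb'

# Group 1: opening tag, group 2: content (non-greedy), group 3: closing tag.
pattern = re.compile(r'(<span class=""latex"">)(.*?)(</span>)')

# Collect the content of every span.
inner = [m.group(2).strip() for m in pattern.finditer(s)]
print(inner)  # ['a', 'b']

# Keep both tags but drop what is between them, using backreferences.
emptied = pattern.sub(r'\1\3', s)
print(emptied)
```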
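If you want to keep the data in between and remove only the tags:
>>> import re
>>> x = '<span class=""latex""> ... This can be different ... </span>'
>>> # raw string avoids the invalid '\ ' escape; .*? keeps the attribute match short
>>> d = re.sub(r'</?span( class="".*?"")?>', '', x)
>>> d
' ... This can be different ... '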
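An alternative way to unwrap the spans is to capture the content and substitute it back with a backreference. This is only a sketch, assuming the exact tag text from the question and no nested spans; the sample string is hypothetical.

```python
import re

s = ' bla <span class=""latex""> x + y </span> blub'

# The single capture group holds the span content; r'\1' puts it back
# in place of the whole match, so only the tags disappear.
unwrapped = re.sub(r'<span class=""latex"">(.*?)</span>', r'\1', s)
print(repr(unwrapped))
```

Compared with matching the opening and closing tags separately, this only touches tags that appear as a matching pair.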
If you want to keep the tags and replace the content:
>>> import re
>>> x = '<span class=""latex""> ... This can be different ... </span>'
>>> new_data = 'abc 123 456'
>>> d = re.sub(r'">.*?</', '">{}</'.format(new_data), x)
>>> d
'<span class=""latex"">abc 123 456</span>'
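One caveat when the new content is put into the replacement string: `re.sub` processes backslash escapes in a string replacement, which matters if the new text contains backslashes (LaTeX often does). A possible workaround, sketched here with a hypothetical `new_data` value, is to pass a function as the replacement, since its return value is inserted literally.

```python
import re

x = '<span class=""latex""> ... This can be different ... </span>'
new_data = r'\delta + 1'  # hypothetical content containing a backslash

# Capture the tags so they can be reused; match the old content non-greedily.
pattern = re.compile(r'(<span class=""latex"">).*?(</span>)', re.DOTALL)

# A function replacement is not scanned for escapes such as \d or \1.
d = pattern.sub(lambda m: m.group(1) + new_data + m.group(2), x)
print(d)
```

Passing `new_data` inside a plain replacement string here would raise an error, because `\d` is not a valid escape in a replacement template.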