
python re.sub, only replace part of match

I am very new to Python.

I need to match all cases with a single regular expression and do a replacement. Here is a sample substring and the desired result:

<cross_sell id="123" sell_type="456"> --> <cross_sell>

I am trying to do this in my code:

myString = re.sub(r'\<[A-Za-z0-9_]+(\s[A-Za-z0-9_="\s]+)', "", myString)

Instead of removing only what follows <cross_sell, it removes the entire match and leaves just '>'.

Is there a way for re.sub to replace only the capturing group instead of the entire pattern?

You can use a backreference to a capturing group in the replacement string:

>>> my_string = '<cross_sell id="123" sell_type="456"> --> <cross_sell>'
>>> re.sub(r'(\<[A-Za-z0-9_]+)(\s[A-Za-z0-9_="\s]+)', r"\1", my_string)
'<cross_sell> --> <cross_sell>'

Notice that I put the first group (the one you want to keep) in parentheses and then kept it in the output by using the r"\1" backreference (first group) in the replacement string. re.sub always replaces the whole match, so anything you want to keep has to be captured and written back.
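As a minimal sketch of the same idea, the backreference can also be written as \g<1>, or the replacement can be a function that returns only the group to keep. The names TAG and strip_attrs below are hypothetical and are not part of the original answer.

```python
import re

# Group 1 is the tag name to keep; group 2 is the attribute text to drop.
TAG = re.compile(r'(<[A-Za-z0-9_]+)(\s[A-Za-z0-9_="\s]+)')

def strip_attrs(text):
    # Hypothetical helper: re.sub calls the lambda once per match and
    # substitutes whatever it returns, here only the first group.
    return TAG.sub(lambda m: m.group(1), text)

sample = '<cross_sell id="123" sell_type="456"> --> <cross_sell>'
print(strip_attrs(sample))
print(TAG.sub(r'\g<1>', sample))  # same result with an explicit group reference
```

The function form is mostly useful when the kept text needs further processing; for a plain "keep group 1" the string form is shorter.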

You can use a capturing group to match the first word and a negated character class to match the rest of the string between < and >:

>>> s='<cross_sell id="123" sell_type="456">'
>>> re.sub(r'(\w+)[^>]+',r'\1',s)
'<cross_sell>'

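\w is equivalent to [A-Za-z0-9_] for ASCII text. In Python 3, \w in a str pattern also matches Unicode word characters unless the re.ASCII flag is set.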
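One caveat worth checking before relying on that pattern: [^>]+ needs at least one character, so on a tag that has no attributes \w+ gives back its last letter and the tag name gets truncated. The sketch below shows the effect and an assumed variant (anchored on "<" and using "*") that avoids it; this variant is an illustration, not the original answer's code.

```python
import re

no_attrs = '<cross_sell>'
with_attrs = '<cross_sell id="123" sell_type="456">'

# With "+", \w+ has to give back one character so that [^>]+ can match it.
truncated = re.sub(r'(\w+)[^>]+', r'\1', no_attrs)
print(truncated)  # <cross_sel>

# Assumed variant: anchor on "<" and allow zero trailing characters with "*".
pattern = re.compile(r'<(\w+)[^>]*>')
print(pattern.sub(r'<\1>', no_attrs))    # <cross_sell>
print(pattern.sub(r'<\1>', with_attrs))  # <cross_sell>
```

Note that this variant would also turn a self-closing tag such as <a b="1"/> into <a>, so it only suits input where that does not matter.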

Since the input data is XML, you would be better off parsing it with an XML parser.

The built-in xml.etree.ElementTree is one option:

>>> import xml.etree.ElementTree as ET
>>> data = '<cross_sell id="123" sell_type="456"></cross_sell>'
>>> cross_sell = ET.fromstring(data)
>>> cross_sell.attrib = {}
>>> ET.tostring(cross_sell, encoding="unicode")  # returns str rather than bytes on Python 3
'<cross_sell />'
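If the document contains nested elements, the same approach can be applied to each of them. The following is a sketch under the assumption that every attribute on every element should be dropped; strip_all_attributes and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def strip_all_attributes(xml_text):
    # Hypothetical helper: parse, clear attributes on every element, serialize.
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() yields the root and all descendants
        element.attrib.clear()
    # encoding="unicode" makes tostring return str instead of bytes on Python 3
    return ET.tostring(root, encoding="unicode")

data = '<cross_sell id="123" sell_type="456"><item sku="9">A</item></cross_sell>'
print(strip_all_attributes(data))
```

Unlike the regex versions, this only works on well-formed XML: a fragment with an unclosed tag raises a ParseError.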

lxml.etree is another option.

The code below was tested under Python 3.6 and does not use groups:

import re

test = '<cross_sell id="123" sell_type="456">'
# \s+ also removes the whitespace before each attribute, so no stray spaces remain
resp = re.sub(r'\s+\w+="\w+"', '', test)
print(resp)

<cross_sell>
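The \w+="\w+" form only matches double-quoted values made of word characters. As a rough sketch, assuming attribute values may contain other characters or use single quotes, a looser pattern could look like this (ATTR and the sample input are hypothetical):

```python
import re

# Assumed pattern: whitespace, an attribute name, "=", then a double- or
# single-quoted value with any content except the closing quote.
ATTR = re.compile(r'''\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*')''')

test = '<cross_sell id="a-1" sell_type=\'up sell\'>'
print(ATTR.sub('', test))
```

Because nothing ties the pattern to the inside of a tag, it would also remove attribute-like text that appears between tags, so it is best applied to individual tags rather than whole documents.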
