
Python Regex for Searching pattern in text file

Tags in Sample.txt:

<ServiceRQ>want everything between...</ServiceRQ>

<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ> ..

Can someone please help me write a regex that extracts the expected output from a text file? I want a regex that finds the tags above. This is what I have tried: re.search(r"<(.*?)RQ(.*?)>(.*?)</(.*?)RQ>", line), but it does not work properly. I want to search based on the word RQ in the text file.

The expected output should be

1. <ServiceRQ>want everything between</ServiceRQ>
2. <ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ>

Try this pattern

import re

regex = r'<\w+RQ.*?>.*?</\w+RQ>'
data = re.findall(regex, line)

Applied to the sample text, the above regex gives output like:

['<ServiceRQ>want everything between...</ServiceRQ>', '<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>want everything between</ServiceRQ>']
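For reference, here is a minimal, self-contained sketch of the same pattern applied to the whole text rather than to one line at a time. The function name find_rq_elements and the inline sample string are illustrative assumptions, not part of the original post. re.DOTALL is also an addition here: it lets "." match newlines, on the assumption that an element body may span more than one line in the file.

```python
import re

# Opening tag whose name ends in "RQ" (with optional attributes), the body,
# then a closing tag whose name also ends in "RQ".
# re.DOTALL is an assumption: it allows the body to contain line breaks.
RQ_ELEMENT = re.compile(r'<\w+RQ.*?>.*?</\w+RQ>', re.DOTALL)


def find_rq_elements(text):
    """Return every <...RQ ...>...</...RQ> element found in text."""
    return RQ_ELEMENT.findall(text)


# Stand-in for the contents of Sample.txt. With a real file, the text could
# be read with open("Sample.txt", encoding="utf-8").read() instead.
sample = (
    '<ServiceRQ>want everything between...</ServiceRQ>\n'
    '<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>'
    'want everything\nbetween</ServiceRQ> ..\n'
)

for element in find_rq_elements(sample):
    print(element)
```

Reading the whole file at once matters if a tag can open on one line and close on another; matching line by line would miss those elements.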

As Ashish has mentioned, this one gives the tag including the contents.

regex = r'<\w+RQ.*?>.*?</\w+RQ>'
data = re.findall(regex, line)

To retrieve just the contents within the tags, change the .*? between the tags to (.*?). When the pattern contains a capturing group, re.findall returns only the captured text.

# sample holds the text to search
regex = r'<\w+RQ.*?>(.*?)</\w+RQ>'
data = re.findall(regex, sample)

This would result in the following output:

['want everything between...', 'want everything between']
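If you need both the tag name and the contents, one possible variation (not from the answers above) is to use named groups and a backreference, so that the closing tag must carry the same name as the opening tag. The group names name and body are my own choice, and [^>]* assumes that no attribute value contains a ">" character.

```python
import re

# "name" captures the tag name ending in RQ; "body" captures the text between
# the tags. (?P=name) requires the closing tag to repeat the opening name.
PATTERN = re.compile(
    r'<(?P<name>\w+RQ)\b[^>]*>(?P<body>.*?)</(?P=name)>',
    re.DOTALL,
)

# The two sample lines from the question, joined into one string.
sample = (
    '<ServiceRQ>want everything between...</ServiceRQ>\n'
    '<ServiceRQ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance>'
    'want everything between</ServiceRQ> ..\n'
)

pairs = [(m.group('name'), m.group('body')) for m in PATTERN.finditer(sample)]
print(pairs)
```

For real XML with nesting, comments or CDATA sections, a parser such as xml.etree.ElementTree is usually more robust than a regex, provided the input is well formed.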
