I have the following name spaces coming from a certain service
<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ xmlns:soap=http://www.4cgroup.co.za/soapauth xmlns:gen=http://www.4cgroup.co.za/genericsoap>
Trying to parse this request I receive the following error
xml.etree.ElementTree.ParseError: not well-formed
I noticed there is no ""
on namespace value. How can I add them with regular expression
Proper format
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://www.4cgroup.co.za/soapauth" xmlns:gen="http://www.4cgroup.co.za/genericsoap">
Note double quotes
Using regex:
import re
namespace = "<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ xmlns:soap=http://www.4cgroup.co.za/soapauth xmlns:gen=http://www.4cgroup.co.za/genericsoap>"
FIND_URL = re.compile(r"((?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+)")
print(FIND_URL.sub(r'"\1"', namespace))
Output:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://www.4cgroup.co.za/soapauth" xmlns:gen="http://www.4cgroup.co.za/genericsoap">
Note that the regex isn't perfect. It works for this case but if the urls become more "unique" it may fail.
Credit to this answer
This regex seems to do the trick:
import re
nsmap = "<soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ xmlns:soap=http://www.4cgroup.co.za/soapauth xmlns:gen=http://www.4cgroup.co.za/genericsoap>"
nsmap = re.sub(r"(https?://.*?)(?=\sxmlns|>)", r'"\1"', nsmap)
print(nsmap)
Output:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://www.4cgroup.co.za/soapauth" xmlns:gen="http://www.4cgroup.co.za/genericsoap">
Check it out online here .
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.