
Parsing XML in Python finding element by id-tag

I'm trying to use a regex to parse an XML file (in my case this seems to be the right way).

My XML looks like this:

line='<form id="main">\n<input {disable}  style="display:none" id="CALLERID" value="58713780">\n<input {disable}  style="display:none" id="GR_BUS" value="VGH1"\n<td><input id="label" {disable} style="font-size:9px;width:100%;margin:0;padding:1;" type=text></td>\n</form>>'

To access the text, I'm using:

attr = re.search('[@id = (CALLERID|GR_BUS|label)]', line)

I want the parsed result in this format:

<CALLERID>58713780</CALLERID><GR_BUS>VGH1</GR_BUS><label></label>

but nothing is being returned.

Can someone point out what I'm doing wrong? Thanks.

Here is a solution using BeautifulSoup:

line = '''<form id="main">\n
<input {disable}  style="display:none" id="CALLERID" value = "58713780" >\n
<input{disable} style = "display:none" id = "GR_BUS" value = "VGH1"\n >
< td >< inputid = "label"{disable}style = "font-size: 9px;width: 100 %;margin: 0;padding: 1;" type=text></td>
</form>>'''


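from bs4 import BeautifulSoup

soup = BeautifulSoup(line, "lxml")
for tag in soup.find_all("input"):
    # .get() returns None instead of raising KeyError when an attribute is missing
    input_id = tag.get("id")
    value = tag.get("value")
    # Print a tuple so the output has the same shape on Python 2 and Python 3
    print((input_id, value))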

Output:

('CALLERID', '58713780')
('GR_BUS', 'VGH1')
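
If installing bs4 and lxml is not an option, a similar result can be sketched with the standard library's html.parser module instead (a different parser from the one used above). This is only a minimal sketch under stated assumptions: the class name InputCollector is made up for illustration, the sample markup is a cleaned-up copy of the question's string with the GR_BUS tag properly closed, and inputs without a value attribute are assumed to map to an empty element.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Hypothetical helper: collects (id, value) pairs from <input> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        if "id" in attr_map:
            # Inputs without a value attribute get an empty string
            self.pairs.append((attr_map["id"], attr_map.get("value") or ""))


# Cleaned-up copy of the question's markup (assumption: GR_BUS tag is closed)
html = (
    '<form id="main">\n'
    '<input {disable} style="display:none" id="CALLERID" value="58713780">\n'
    '<input {disable} style="display:none" id="GR_BUS" value="VGH1">\n'
    '<td><input id="label" {disable} '
    'style="font-size:9px;width:100%;margin:0;padding:1;" type=text></td>\n'
    '</form>'
)

collector = InputCollector()
collector.feed(html)
collector.close()

# Build the <ID>value</ID> format requested in the question
result = "".join("<{0}>{1}</{0}>".format(i, v) for i, v in collector.pairs)
print(result)
```

Because the parser treats {disable} as a valueless attribute, the template placeholders do not need to be stripped first.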

First, what you have in your example is not valid XML but HTML, more likely an HTML template, considering the {disable} directives in your string.

Second, your regex does not do what you expect: the square brackets turn the whole pattern into a character class that matches a single character, and the pattern does not take into account the quotes around the id attribute. I also suppose you need a capturing group for the value attribute in order to build your final result, and the pattern must allow for the value not always being present (i.e. in the case of the label id).
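
To see why the pattern from the question cannot pick out the ids, it may help to check what it actually matches. In Python's re module, square brackets delimit a character class, so the XPath-like expression is read as "any one of these characters". A minimal sketch, using a shortened copy of the question's string:

```python
import re

# Shortened copy of the string from the question
line = '<form id="main">\n<input {disable}  style="display:none" id="CALLERID" value="58713780">'

# The brackets make this a character class: it matches exactly one character
# taken from the set of characters written between [ and ]
match = re.search('[@id = (CALLERID|GR_BUS|label)]', line)
matched_text = match.group()

print(len(matched_text))  # length of the matched text
```

The alternation (CALLERID|GR_BUS|label) only works as intended outside of square brackets.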

A regex that handles both the quotes and the optional value attribute is id=\"(CALLERID|GR_BUS|label)\"(\s*value=\"(\S*)\")? where, for each match, the first capturing group contains the value of the id attribute and the third group (if present) contains the value of the value attribute.

You can test it at https://regex101.com by selecting Python as the language.
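
As a rough sketch of how the regex above could be used to build the format requested in the question: the helper name to_tags is hypothetical, the pattern is written as a raw string (so the quotes need no backslashes), and a missing value is assumed to produce an empty element.

```python
import re

# The string from the question; the \n escapes become real newlines
line = (
    '<form id="main">\n'
    '<input {disable}  style="display:none" id="CALLERID" value="58713780">\n'
    '<input {disable}  style="display:none" id="GR_BUS" value="VGH1"\n'
    '<td><input id="label" {disable} '
    'style="font-size:9px;width:100%;margin:0;padding:1;" type=text></td>\n'
    '</form>>'
)

# Group 1: the id; group 3: the value, when a value attribute follows the id
PATTERN = re.compile(r'id="(CALLERID|GR_BUS|label)"(\s*value="(\S*)")?')


def to_tags(text):
    """Hypothetical helper: turn each id/value match into <ID>value</ID>."""
    parts = []
    for match in PATTERN.finditer(text):
        name = match.group(1)
        value = match.group(3) or ""  # group 3 is None when value is absent
        parts.append("<{0}>{1}</{0}>".format(name, value))
    return "".join(parts)


result = to_tags(line)
print(result)
# <CALLERID>58713780</CALLERID><GR_BUS>VGH1</GR_BUS><label></label>
```

Note that this only works when the value attribute directly follows the id attribute, as it does in the sample; for markup where the attribute order varies, an HTML parser is the more robust choice.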
