
Reading an XML file leaves unwanted newlines in element text

I am using ElementTree to read an .xml file and save the output to a .csv file. I loop over all of the elements in the XML file and save each tag/text pair to a list of dictionaries.

import xml.etree.ElementTree as ET

savedParameters = []

tree = ET.parse(work_dir + input_name)
root = tree.getroot()

for child in root:
    savedParameters.append({'parameterName': child.tag, 'Value': child.text})
    for gchild in child:
        savedParameters.append({'parameterName': gchild.tag, 'Value': gchild.text})
        for ggchild in gchild:
            ...  # and so on for deeper nesting levels

I then loop over savedParameters and write them to a CSV file. This all works fine except in one situation; take the following XML as an example.

<VehicleId>123456789</VehicleId>
<VRMs>
    <ForAppointment>X111XXX</ForAppointment>
    <Alternate>X111XXX</Alternate>
</VRMs>
<Vin>123456</Vin>

In this case everything is stored as expected except the <VRMs> field. This field should be empty; however, when I access child.text it contains a whitespace string with a newline, i.e. the newline and indentation between <VRMs> and its first child <ForAppointment>. Therefore, when I write out to the CSV file, the newline is written out as well.
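To see where that whitespace comes from, the fragment can be parsed directly. This sketch wraps it in a hypothetical <Vehicle> root element (an assumption, since the question only shows a fragment and ElementTree needs a single root) and inspects the text of <VRMs>:

```python
import xml.etree.ElementTree as ET

# Hypothetical <Vehicle> root: the question only shows a fragment,
# and ElementTree needs a single root element to parse it.
sample = """<Vehicle>
    <VehicleId>123456789</VehicleId>
    <VRMs>
        <ForAppointment>X111XXX</ForAppointment>
        <Alternate>X111XXX</Alternate>
    </VRMs>
    <Vin>123456</Vin>
</Vehicle>"""

root = ET.fromstring(sample)
vrms = root.find('VRMs')

# .text is only the character data between <VRMs> and its first child:
# here a newline plus the indentation before <ForAppointment>
print(repr(vrms.text))

# The whitespace after each child's closing tag is stored on that child's .tail
print(repr(vrms[0].tail))

# An element with no character data at all has .text set to None, not ''
print(ET.fromstring('<Empty/>').text)  # None
```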

I have tried replace(" ", "") and replace("\\n","") but neither solves my problem. Does anyone know a way around this?
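One possible reason the replace() attempts fail, assuming each call was tried on its own: the text is a mix of a newline and spaces, so removing only one kind leaves the other behind. In addition, if the code literally contains "\\n" (not just the post), that is a backslash followed by n rather than a newline character. The string below is a made-up stand-in for the <VRMs> text:

```python
# Stand-in for the <VRMs> text: a newline followed by indentation
text = "\n        "

# "\\n" is a two-character string (backslash, 'n'), not a newline,
# so this replace() finds nothing and returns the string unchanged
print(repr(text.replace("\\n", "")))

# Removing only the spaces leaves the newline behind...
print(repr(text.replace(" ", "")))  # '\n'

# ...and removing only the newline leaves the spaces behind
print(repr(text.replace("\n", "")))

# Removing both (or simply calling strip()) gives an empty string
print(repr(text.replace("\n", "").replace(" ", "")))  # ''
print(repr(text.strip()))  # ''
```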

You should be able to strip out the newlines (from the start and end of a string) using str.strip() called without any arguments, which removes all leading and trailing whitespace, including both spaces and newlines.

Example:

>>> s = "\n    \n asd \n    \n \n \n\n    "
>>> s.strip()
'asd'
>>> s = "\n    \n \n    \n \n \n\n    "
>>> s.strip()
''

As shown above, str.strip() returns an empty string if the string contains only whitespace, which appears to be the case for your child.text. So you should be able to call child.text.strip() before storing the value in the dictionary. Example:

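for child in root:
    # .text is None when an element has no text at all, so fall back to ''
    savedParameters.append({'parameterName': child.tag, 'Value': (child.text or '').strip()})
    for gchild in child:
        savedParameters.append({'parameterName': gchild.tag, 'Value': (gchild.text or '').strip()})
        for ggchild in gchild:
            ...  # and so on for deeper nesting levels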
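The hand-written nested loops only reach as deep as the number of loops written out. A minimal sketch (a swapped-in approach, not the answer's original code) walks every element at any depth with Element.iter(), which yields the element itself first and then all descendants in document order, applies the same strip() fix with a fallback for None text, and writes the pairs with csv.DictWriter. The function names collect_parameters and write_parameters and the <Vehicle> root in the sample are hypothetical:

```python
import csv
import io
import xml.etree.ElementTree as ET


def collect_parameters(root):
    """Return one {'parameterName', 'Value'} dict per descendant of root,
    in document order, with surrounding whitespace stripped from the text."""
    saved = []
    for elem in root.iter():
        if elem is root:
            continue  # the original loops start at root's children
        # .text is None for elements with no text, so fall back to ''
        saved.append({'parameterName': elem.tag, 'Value': (elem.text or '').strip()})
    return saved


def write_parameters(rows, out):
    """Write the collected rows to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=['parameterName', 'Value'])
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample document: the <Vehicle> root is an assumption,
# since the question only shows a fragment.
sample = """<Vehicle>
    <VehicleId>123456789</VehicleId>
    <VRMs>
        <ForAppointment>X111XXX</ForAppointment>
        <Alternate>X111XXX</Alternate>
    </VRMs>
    <Vin>123456</Vin>
</Vehicle>"""

rows = collect_parameters(ET.fromstring(sample))
buf = io.StringIO()
write_parameters(rows, buf)
print(buf.getvalue())
```

To write to a real file instead of a StringIO buffer, open it with open(path, 'w', newline='') as the csv module documentation recommends, and pass the file object to write_parameters.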

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM