
Parse XML using Python

<?xml version="1.0" encoding="UTF-8"?>
<WindowElement xmlns="http://windows.lbl.gov">
    <WindowElementType>System</WindowElementType>
    <Optical>
        <WavelengthData>
            <LayerNumber>System</LayerNumber>
            <Wavelength unit="Integral">Visible</Wavelength>
            <SourceSpectrum>CIE Illuminant D65 1nm.ssp</SourceSpectrum>
            <DetectorSpectrum>ASTM E308 1931 Y.dsp</DetectorSpectrum>
            <WavelengthDataBlock>
                <WavelengthDataDirection>Transmission Front</WavelengthDataDirection>
                <ColumnAngleBasis>LBNL/Klems Full</ColumnAngleBasis>
                <RowAngleBasis>LBNL/Klems Full</RowAngleBasis>
                <ScatteringDataType>BTDF</ScatteringDataType>
                <ScatteringData> 1, 2, 3, 3 
                             </ScatteringData>
            </WavelengthDataBlock>
        </WavelengthData>
    <WavelengthData>
        <LayerNumber>System</LayerNumber>
        <Wavelength unit="Integral">Visible</Wavelength>
        <SourceSpectrum>CIE Illuminant D65 1nm.ssp</SourceSpectrum>
        <DetectorSpectrum>ASTM E308 1931 Y.dsp</DetectorSpectrum>
        <WavelengthDataBlock>
            <WavelengthDataDirection>Transmission Back</WavelengthDataDirection>
            <ColumnAngleBasis>LBNL/Klems Full</ColumnAngleBasis>
            <RowAngleBasis>LBNL/Klems Full</RowAngleBasis>
            <ScatteringDataType>BTDF</ScatteringDataType>
            <ScatteringData> 555, 555
.......

How can I use Python to read the value 1, 2, 3, 3 in the ScatteringData element and change it to 5, 8, 8?

There are two elements called ScatteringData, and only the first one should be changed.

Thank you!

You should look at the libraries that are available for working with XML in Python. You could start here: http://wiki.python.org/moin/PythonXml
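As one possible starting point, here is a minimal sketch using xml.etree.ElementTree from the standard library. The XML string is a shortened, well-formed stand-in for the document in the question, and the prefix `w` is an arbitrary label chosen for this example. The document declares a default namespace, so lookups need a prefix mapping.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the document in the question.
XML = """<WindowElement xmlns="http://windows.lbl.gov">
  <Optical>
    <WavelengthData>
      <WavelengthDataBlock>
        <WavelengthDataDirection>Transmission Front</WavelengthDataDirection>
        <ScatteringData> 1, 2, 3, 3 </ScatteringData>
      </WavelengthDataBlock>
    </WavelengthData>
    <WavelengthData>
      <WavelengthDataBlock>
        <WavelengthDataDirection>Transmission Back</WavelengthDataDirection>
        <ScatteringData> 555, 555 </ScatteringData>
      </WavelengthDataBlock>
    </WavelengthData>
  </Optical>
</WindowElement>"""

# "w" is an arbitrary prefix used only for lookups in this script.
NS = {"w": "http://windows.lbl.gov"}
# Serialize the default namespace without a prefix, as in the input.
ET.register_namespace("", NS["w"])

root = ET.fromstring(XML)

# find() returns the first match in document order, or None.
first = root.find(".//w:ScatteringData", NS)
original_text = first.text.strip()   # read the current value
first.text = "5, 8, 8"               # change only the first element

all_texts = [e.text.strip() for e in root.findall(".//w:ScatteringData", NS)]
result = ET.tostring(root, encoding="unicode")
print(result)
```

For a file on disk, ET.parse(path) followed by tree.write(path) would take the place of fromstring and tostring.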

If you have to deal with XML, I suggest you take a look at lxml.

lxml is described as the most feature-rich and easy-to-use library for processing XML and HTML in Python, and as faster and more robust than its alternatives. Also search Stack Overflow for lxml and the other libraries, because previous questions contain plenty of suggestions about which one to use.

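from lxml import etree as ET

In [14]: root = ET.fromstring(datafragment)

# XML names are case-sensitive and the document declares a default namespace,
# so the XPath needs the exact element name and a prefix mapping.
In [15]: root.xpath('.//w:ScatteringData', namespaces={'w': 'http://windows.lbl.gov'})[0].text = 'blah'

In [16]: print(ET.tostring(root, pretty_print=True, encoding='unicode'))
...
<ScatteringData>blah</ScatteringData>
...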

If you have to make changes in more than one place, use a loop:

for elem in root.xpath('.//w:ScatteringData', namespaces={'w': 'http://windows.lbl.gov'}):
    elem.text = 'smth'
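Since the two ScatteringData elements differ in their sibling WavelengthDataDirection, another option is to pick the element by direction rather than by position. The following is a sketch under that assumption. The helper name set_scattering_data and the sample document are made up for illustration, and the code uses only calls that lxml.etree shares with the standard library's ElementTree, so it falls back to the latter when lxml is not installed.

```python
try:
    from lxml import etree as ET  # used when lxml is installed
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback; same calls below

URI = "http://windows.lbl.gov"
NS = {"w": URI}  # "w" is an arbitrary prefix for lookups


def set_scattering_data(root, direction, new_text):
    """Set ScatteringData text in every block with the given direction.

    Returns the number of elements changed.
    """
    changed = 0
    for block in root.iter("{%s}WavelengthDataBlock" % URI):
        found = block.findtext("w:WavelengthDataDirection", namespaces=NS)
        if (found or "").strip() != direction:
            continue
        data = block.find("w:ScatteringData", namespaces=NS)
        if data is not None:
            data.text = new_text
            changed += 1
    return changed


# Shortened, well-formed stand-in for the document in the question.
SAMPLE = b"""<WindowElement xmlns="http://windows.lbl.gov">
  <Optical>
    <WavelengthData>
      <WavelengthDataBlock>
        <WavelengthDataDirection>Transmission Front</WavelengthDataDirection>
        <ScatteringData> 1, 2, 3, 3 </ScatteringData>
      </WavelengthDataBlock>
    </WavelengthData>
    <WavelengthData>
      <WavelengthDataBlock>
        <WavelengthDataDirection>Transmission Back</WavelengthDataDirection>
        <ScatteringData> 555, 555 </ScatteringData>
      </WavelengthDataBlock>
    </WavelengthData>
  </Optical>
</WindowElement>"""

root = ET.fromstring(SAMPLE)
count = set_scattering_data(root, "Transmission Front", "5, 8, 8")
texts = [e.text.strip() for e in root.iter("{%s}ScatteringData" % URI)]
print(count, texts)
```

Matching on the direction keeps the edit correct even if the order of the WavelengthData blocks changes.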

Here's a solution using Beautiful Soup. It lets you walk down to the data and modify it as you see fit.

from bs4 import BeautifulSoup

# html.parser lowercases tag names, hence the lowercase lookup below
with open("waves.xml") as f:
    soup = BeautifulSoup(f, "html.parser")
soup.scatteringdata.string = "5, 8, 8"
print(soup.prettify())

Which outputs:

  ...
  <scatteringdatatype>
    BTDF
   </scatteringdatatype>
   <scatteringdata>
    5, 8, 8
   </scatteringdata>
  </wavelengthdatablock>
  ...

If you want to take a look at the data first, you can use:

originalData = soup.scatteringdata.string 

and then process it however you like.
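As a sketch of what that processing might look like, assuming the element text is always a comma-separated list of numbers as in the question, the string can be split into floats and joined back into text. The helper names parse_values and format_values are made up for this example.

```python
def parse_values(text):
    """Split comma-separated text into floats, skipping empty pieces."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def format_values(values):
    """Join numbers back into the 'a, b, c' form used in the XML."""
    return ", ".join(f"{v:g}" for v in values)


# The raw element text, including the stray whitespace from the file.
original = " 1, 2, 3, 3 \n                             "
values = parse_values(original)

# Any numeric processing can happen here; doubling is just an example.
doubled = format_values(v * 2 for v in values)

# The replacement asked for in the question, written back as text.
replacement = format_values([5, 8, 8])
print(values, doubled, replacement)
```

The resulting string can then be assigned back to the element's text with whichever library was used to read it.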
