XML to dict in Python with repeated tags that have varying key-values

There are a lot of proposed solutions for converting XML to a dict, but I can't get any of them to work for my particular use case.

My XML has many elements with the same tag, but each of them can carry several key-values (attributes), and not all of them have the same number of key-values. This is what makes it challenging.

For example:

<?xml version="1.0" encoding="UTF-8"?>
<mxfile host="xxx.xxx.com" modified="2021-06-14T07:52:04.437Z" agent="xxx" version="12.4.8" etag="o-cccc" type="device">
  <diagram id="asdfsdf">
    <mxGraphModel dx="1213" dy="2767" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
      <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="2" value="label_1" style="points=[[0,0],[0.25,0],[0.5,0],[0.75,0],[1,0],[1,0.25],[1,0.5],[1,0.75],[1,1],[0.75,1],[0.5,1],[0.25,1],[0,1],[0,0.75],[0,0.5],[0,0.25]];outlineConnect=0;gradientColor=none;html=1;whiteSpace=wrap;fontSize=17;fontStyle=0;shape=shape_1;grIcon=icon_1;strokeColor=#232F3E;fillColor=none;verticalAlign=top;align=left;spacingLeft=30;fontColor=#232F3E;dashed=0;" vertex="1" parent="1">
          <mxGeometry x="110" y="-50" width="1170" height="840" as="geometry"/>
        </mxCell>
        <mxCell id="3" value="Region" style="points=[[0,0],[0.25,0],[0.5,0],[0.75,0],[1,0],[1,0.25],[1,0.5],[1,0.75],[1,1],[0.75,1],[0.5,1],[0.25,1],[0,1],[0,0.75],[0,0.5],[0,0.25]];outlineConnect=0;gradientColor=none;html=1;whiteSpace=wrap;fontSize=17;fontStyle=0;shape=shape_1;grIcon=icon_2;strokeColor=#147EBA;fillColor=none;verticalAlign=top;align=left;spacingLeft=30;fontColor=#147EBA;dashed=0;" vertex="1" parent="1">
          <mxGeometry x="290" y="190" width="960" height="580" as="geometry"/>
        </mxCell>
        <mxCell id="4" value="Area 1" style="fillColor=none;strokeColor=#147EBA;dashed=1;verticalAlign=top;fontStyle=0;fontColor=#147EBA;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="750" y="340" width="320" height="420" as="geometry"/>
        </mxCell>
        <mxCell id="5" value="Area 1" style="fillColor=none;strokeColor=#147EBA;dashed=1;verticalAlign=top;fontStyle=0;fontColor=#147EBA;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="326" y="340" width="364" height="420" as="geometry"/>
        </mxCell>
        <mxCell id="6" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;" edge="1" source="7" target="9" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="7" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;fillColor=#232F3E;strokeColor=none;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;pointerEvents=1;shape=shape_3;" vertex="1" parent="1">
          <mxGeometry x="698.43" y="-110" width="34" height="34" as="geometry"/>
        </mxCell>
        <mxCell id="8" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;" edge="1" source="9" target="35" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="9" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=#945DF2;gradientDirection=north;fillColor=#5A30B5;strokeColor=#ffffff;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;shape=shape_1;resIcon=service_2;" vertex="1" parent="1">
          <mxGeometry x="675.43" y="-30" width="80" height="80" as="geometry"/>
        </mxCell>
        <mxCell id="24" value="&lt;font style=&quot;font-size: 15px&quot;&gt;Service name 1&lt;/font&gt;" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;strokeColor=#ffffff;fillColor=#232F3E;dashed=0;verticalLabelPosition=middle;verticalAlign=bottom;align=center;html=1;whiteSpace=wrap;fontSize=17;fontStyle=1;spacing=3;shape=shape_2;prIcon=service_2;" vertex="1" parent="1">
          <mxGeometry x="159" width="62" height="100" as="geometry"/>
        </mxCell>
        <mxCell id="25" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;fillColor=#D86613;strokeColor=none;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;pointerEvents=1;shape=shape_3;" vertex="1" parent="1">
          <mxGeometry x="817.1399999999999" y="383" width="64" height="64" as="geometry"/>
        </mxCell>
        <mxCell id="26" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;fontSize=17;" edge="1" source="27" target="28" parent="1">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="27" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=#4D72F3;gradientDirection=north;fillColor=#3334B9;strokeColor=#ffffff;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;shape=shape_4;resIcon=service_3;" vertex="1" parent="1">
          <mxGeometry x="473" y="640" width="64" height="64" as="geometry"/>
        </mxCell>
        <mxCell id="28" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=#4D72F3;gradientDirection=north;fillColor=#3334B9;strokeColor=#ffffff;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;shape=shape_4;resIcon=service_2;" vertex="1" parent="1">
          <mxGeometry x="885.2899999999998" y="639" width="64" height="64" as="geometry"/>
        </mxCell>
        <mxCell id="29" value="Primary&lt;br style=&quot;font-size: 17px;&quot;&gt;(Multi-area)" style="text;html=1;resizable=0;autosize=1;align=center;verticalAlign=middle;points=[];fillColor=none;strokeColor=none;rounded=0;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="458" y="701" width="90" height="50" as="geometry"/>
        </mxCell>
        <mxCell id="30" value="" style="outlineConnect=0;fontColor=#232F3E;gradientColor=none;fillColor=#D86613;strokeColor=none;dashed=0;verticalLabelPosition=bottom;verticalAlign=top;align=center;html=1;fontSize=17;fontStyle=0;aspect=fixed;pointerEvents=1;shape=shape_3;" vertex="1" parent="1">
          <mxGeometry x="385" y="380" width="68" height="68" as="geometry"/>
        </mxCell>
        <mxCell id="31" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;endArrow=classic;endFill=1;fontSize=17;" edge="1" source="32" target="12" parent="1">
          <mxGeometry relative="1" as="geometry">
            <Array as="points">
              <mxPoint x="503" y="550"/>
              <mxPoint x="1160" y="550"/>
            </Array>
          </mxGeometry>
        </mxCell>
        <mxCell id="32" value="&lt;font style=&quot;font-size: 17px&quot;&gt;web component&lt;br style=&quot;font-size: 17px&quot;&gt;(&lt;b&gt;Service name 1&lt;br&gt;WordPress Instance&lt;/b&gt;)&lt;/font&gt;" style="text;html=1;resizable=0;autosize=1;align=center;verticalAlign=middle;points=[];fillColor=none;strokeColor=none;rounded=0;fontSize=17;" vertex="1" parent="1">
          <mxGeometry x="413" y="452" width="180" height="70" as="geometry"/>
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>

Indicators: an arrow is marked by "edgeStyle=orthogonalEdgeStyle"; an object (service) is marked by "resIcon=service_2" (or service_1, etc.).
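Because both indicators live inside the `;`-separated `style` attribute, one way to make them easy to test is to turn that string into a dict first. This is a minimal sketch: `parse_style` and `classify` are hypothetical helper names, and mapping `edgeStyle` to "arrow" and `resIcon` to "service" simply follows the indicators listed above.

```python
def parse_style(style):
    """Turn 'a=1;b=2;text;' into {'a': '1', 'b': '2'}; bare flags like 'text' are skipped."""
    result = {}
    for part in style.split(';'):
        if '=' in part:
            # Split on the first '=' only, in case a value itself contains '='.
            key, value = part.split('=', 1)
            result[key] = value
    return result


def classify(style):
    """Return 'arrow', 'service', or None based on the indicator keys."""
    parsed = parse_style(style)
    if 'edgeStyle' in parsed:
        return 'arrow'
    if 'resIcon' in parsed:
        return 'service'
    return None


edge_style = "edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;fontSize=17;"
icon_style = "outlineConnect=0;aspect=fixed;shape=shape_1;resIcon=service_2;"
print(parse_style(icon_style)['resIcon'])  # service_2
print(classify(edge_style), classify(icon_style))  # arrow service
```

Once the style is a dict, checking for an optional key is a plain `in` test, so cells with different numbers of style entries need no special handling.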

What I have done so far: using xml.etree.ElementTree, I loop over the elements, read each tag and its attributes, and extract the values that match the keywords I want.
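ElementTree already exposes each element's attributes as a dict (`element.attrib`), so elements with different numbers of attributes can each become their own dict without any padding. This is a minimal sketch on a trimmed, made-up copy of the XML above; the name `cells` is only for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """<mxfile><diagram><mxGraphModel><root>
  <mxCell id="0"/>
  <mxCell id="1" parent="0"/>
  <mxCell id="6" value="" style="edgeStyle=orthogonalEdgeStyle;html=1;" edge="1" source="7" target="9" parent="1"/>
  <mxCell id="9" value="" style="shape=shape_1;resIcon=service_2;" vertex="1" parent="1"/>
</root></mxGraphModel></diagram></mxfile>"""

root = ET.fromstring(xml_text)
# One dict per <mxCell>, keyed by its id; attributes a cell lacks are simply absent.
cells = {cell.get('id'): dict(cell.attrib) for cell in root.iter('mxCell')}

for cell_id, attrs in cells.items():
    print(cell_id, sorted(attrs))
```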

The results are stored in lists:

id = []
attrb = []
objects_found = []
arrows_found = []

and finally I want to transform them into a dict like this:

{
   'id': '1',
   'attrb': 'service',
   'object': 'service_a',
   'arrows': True,
   'arrow_start': {coordinate},
   'arrow_end': {coordinate}
}

If there are no arrows:

{
   'id': '1',
   'attrb': 'service',
   'object': 'service_a'
}
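One way to get that shape is to build one dict per service cell and add the arrow keys only when an arrow starts at that cell. This is a minimal sketch under several assumptions not stated in the question: only the first outgoing arrow of a service is used; `arrow_start`/`arrow_end` are taken as the top-left `x`/`y` of the `mxGeometry` of the edge's `source` and `target` cells (the edges in this file carry no coordinates of their own, only `relative="1"`); a missing `x` or `y` is treated as 0; and an endpoint with no matching cell (for example `target="35"` in the sample) becomes `None`. `build_records` and `geometry_xy` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def parse_style(style):
    """'a=1;b=2;flag;' -> {'a': '1', 'b': '2'}; bare flags without '=' are dropped."""
    return dict(part.split('=', 1) for part in style.split(';') if '=' in part)


def geometry_xy(cell):
    """(x, y) of a cell's <mxGeometry>, or None if the cell or geometry is missing."""
    if cell is None:
        return None
    geo = cell.find('mxGeometry')
    if geo is None:
        return None
    # Assumption: an omitted x or y means 0.
    return (float(geo.get('x', 0)), float(geo.get('y', 0)))


def build_records(root):
    """One dict per service cell; arrow keys are added only if an arrow starts there."""
    cells = {c.get('id'): c for c in root.iter('mxCell')}
    # Source id -> target id of its first outgoing arrow (edgeStyle indicator).
    first_target = {}
    for c in cells.values():
        if 'edgeStyle' in parse_style(c.get('style', '')) and c.get('source'):
            first_target.setdefault(c.get('source'), c.get('target'))
    records = []
    for cell_id, c in cells.items():
        style = parse_style(c.get('style', ''))
        if 'resIcon' not in style:
            continue
        record = {'id': cell_id, 'attrb': 'service', 'object': style['resIcon']}
        if cell_id in first_target:
            record['arrows'] = True
            record['arrow_start'] = geometry_xy(c)
            # None when the target id has no matching cell.
            record['arrow_end'] = geometry_xy(cells.get(first_target[cell_id]))
        records.append(record)
    return records


xml_text = """<root>
  <mxCell id="26" style="edgeStyle=orthogonalEdgeStyle;html=1;" edge="1" source="27" target="28" parent="1">
    <mxGeometry relative="1" as="geometry"/>
  </mxCell>
  <mxCell id="27" style="shape=shape_4;resIcon=service_3;" vertex="1" parent="1">
    <mxGeometry x="473" y="640" width="64" height="64" as="geometry"/>
  </mxCell>
  <mxCell id="28" style="shape=shape_4;resIcon=service_2;" vertex="1" parent="1">
    <mxGeometry x="885.29" y="639" width="64" height="64" as="geometry"/>
  </mxCell>
</root>"""

for record in build_records(ET.fromstring(xml_text)):
    print(record)
```

Cell 27 gets the optional arrow keys because edge 26 starts there; cell 28 keeps only the three base keys.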

My code:

import xml.etree.ElementTree as ET

tree = ET.parse('diagram.xml')  # placeholder path to the draw.io file shown above

for item in tree.iter('mxCell'):
    # Use a separate name for the current id so the `id` list is not overwritten.
    cell_id = item.attrib['id']
    # Split the long ';'-separated 'style' attribute; most of the useful info is in there.
    # Some cells (e.g. id="0") have no style at all, hence the '' default.
    style_list = item.attrib.get('style', '').split(';')
    for style in style_list:
        if '=' in style:
            # Split on the first '=' only, in case a value contains '='.
            style_key, style_value = style.split('=', 1)
            if style_key == 'shape' and style_value != 'icon' and 'keyword-a' in style_value:
                service_icon = style_value
                id.append(cell_id)
                attrb.append("service_name")
                objects_found.append(service_icon)
            elif style_key == 'resIcon':
                service_icon = style_value
                id.append(cell_id)
                attrb.append("service_name")
                objects_found.append(service_icon)
            elif style_key == 'edgeStyle':
                arrow_style = style_value
                id.append(cell_id)
                attrb.append("arrows")
                arrows_found.append(arrow_style)

I tried:

  1. dict(zip(...)), but the problem is that some optional keys don't exist for every id.
  2. A pandas DataFrame (not ideal, since I want a dict in the end). I also tried going through CSV to get the data into table form, but the lists don't work there either: once the values are put into separate lists, the relationship between each id and its key-values is lost, and lists of different lengths can't be combined into one DataFrame.
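Both problems disappear if each id keeps its own dict instead of having its values spread over parallel lists: an optional key is simply missing from that id's dict. If a table is still wanted, the standard library's `csv.DictWriter` can fill the gaps with a default through its `restval` argument (`pandas.DataFrame.from_dict(records, orient='index')` would likewise fill them with NaN). This is a minimal sketch; the sample records are made up for illustration.

```python
import csv
import io

# One dict per id; the arrow key is optional and only present on some ids.
records = {
    '9': {'attrb': 'service', 'object': 'service_2', 'arrows': True},
    '28': {'attrb': 'service', 'object': 'service_2'},
}

# Collect every key that appears in any record, keeping first-seen order.
fieldnames = ['id']
for rec in records.values():
    for key in rec:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')  # missing keys -> ''
writer.writeheader()
for cell_id, rec in records.items():
    writer.writerow({'id': cell_id, **rec})

print(buffer.getvalue())
```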

Any suggestions for a good solution?

Finally found a simple way:

With my current logic I can already extract the data by keyword, so I simply add each identified key-value pair to a nested dict.

i.e.

start with dictObj = {}, and inside the for loop initialise a nested dict for each id: dictObj[id] = {}

then, for each key identified, add it with dictObj[id].update({'key': value})
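The nested-dict approach can be sketched as follows, assuming only the `resIcon` and `edgeStyle` checks from the question's loop. `dict.setdefault` creates the inner dict the first time an id gets a key, so ids with nothing of interest (like `id="0"`) never get an empty entry; that differs slightly from initialising `dictObj[id] = {}` for every id. The XML string is a made-up trimmed example, and storing `source`/`target` on arrows is an added illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """<root>
  <mxCell id="0"/>
  <mxCell id="6" style="edgeStyle=orthogonalEdgeStyle;html=1;" edge="1" source="7" target="9"/>
  <mxCell id="9" style="shape=shape_1;resIcon=service_2;" vertex="1"/>
</root>"""

dictObj = {}
for item in ET.fromstring(xml_text).iter('mxCell'):
    cell_id = item.get('id')
    for style in item.get('style', '').split(';'):
        if '=' not in style:
            continue
        key, value = style.split('=', 1)
        if key == 'resIcon':
            # Create the inner dict on first use, then add the identified keys.
            dictObj.setdefault(cell_id, {}).update({'attrb': 'service_name', 'object': value})
        elif key == 'edgeStyle':
            dictObj.setdefault(cell_id, {}).update({
                'attrb': 'arrows',
                'arrow_style': value,
                'source': item.get('source'),
                'target': item.get('target'),
            })

print(dictObj)
```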

Not sure if this is the most efficient way, but at least I get the output I want. If anyone has a better way of doing it, please share.
