
Parse generated XML from InDesign with PHP

I'm generating XML from InDesign and would like to parse the XML in PHP. Below is a sample of the XML that InDesign is generating:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
<page title="About Us">
  About Us
  <page>Overiew</page>
  <page>Where We Started</page>
  <page>Help</page>
</page>
<page>
  Automobiles
  <page>
     Cars
     <page>Small</page>
     <page>Medium</page>
     <page>Large</page>
  </page>
  <page>
     Trucks
     <page>Flatbet</page>
     <page>
        Pickup
        <page>Dodge</page>
        <page>Nissan</page>
     </page>
  </page>
</page>
</Root>

I'm using the following PHP code to parse the XML recursively.

header('Content-type: text/plain');

function parse_recursive(SimpleXMLElement $element, $level = 0)
{
        $indent     = str_repeat("\t", $level); // determine how much we'll indent

        $value      = trim((string) $element);  // get the value and trim any whitespace from the start and end
        $attributes = $element->attributes();   // get all attributes
        $children   = $element->children();     // get all children

        echo "{$indent}Parsing '{$element->getName()}'...".PHP_EOL;
        if(count($children) == 0 && !empty($value)) // only show value if there is any and if there aren't any children
        {
                echo "{$indent}Value: {$element}".PHP_EOL;
        }

        // only show attributes if there are any
        if(count($attributes) > 0)
        {
                echo $indent.'Has '.count($attributes).' attribute(s):'.PHP_EOL;
                foreach($attributes as $attribute)
                {
                        echo "{$indent}- {$attribute->getName()}: {$attribute}".PHP_EOL;
                }
        }

        // only show children if there are any
        if(count($children))
        {
                echo $indent.'Has '.count($children).' child(ren):'.PHP_EOL;
                foreach($children as $child)
                {
                        parse_recursive($child, $level+1); // recursion :)
                }
        }

        echo $indent.PHP_EOL; // just to make it "cleaner"
}

$xml = new SimpleXMLElement('data.xml', 0, true); // 0 = no libxml options; true = first argument is a file path

parse_recursive($xml);

The issue I'm having is that when I parse the XML, I don't get the text value of a page node unless the node contains nothing but text. So, for example, I have no way of reading "About Us" other than looking at the title attribute (if it exists). The same applies to "Automobiles", "Cars" and "Trucks".

Again, this is generated XML from InDesign. I could ask the designers to add attributes to nodes, etc., but I'm trying to minimize the amount of data entry.

I believe the XML is well formed. Any help would be greatly appreciated.

You are ignoring the text value of any node that has children. To change that, replace:

if(count($children) == 0 && !empty($value)) // only show value if there is any and if there aren't any children
{
  echo "{$indent}Value: {$element}".PHP_EOL;
}

with

if(!empty($value)) // show the value whenever there is one, even if the node has children
{
  echo "{$indent}Value: {$value}".PHP_EOL;
}

and then the result with the sample data is:

Parsing 'Root'...
Has 2 child(ren):
    Parsing 'page'...
    Value: About Us
    Has 1 attribute(s):
    - title: About Us
    Has 3 child(ren):
        Parsing 'page'...
        Value: Overiew

        Parsing 'page'...
        Value: Where We Started

        Parsing 'page'...
        Value: Help


    Parsing 'page'...
    Value: Automobiles
    Has 2 child(ren):
        Parsing 'page'...
        Value: Cars
        Has 3 child(ren):
            Parsing 'page'...
            Value: Small

            Parsing 'page'...
            Value: Medium

            Parsing 'page'...
            Value: Large


        Parsing 'page'...
        Value: Trucks
        Has 2 child(ren):
            Parsing 'page'...
            Value: Flatbet

            Parsing 'page'...
            Value: Pickup
            Has 2 child(ren):
                Parsing 'page'...
                Value: Dodge

                Parsing 'page'...
                Value: Nissan
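
The change works because casting a SimpleXMLElement to a string returns only the text that sits directly inside that element, not the text of its child elements. Below is a minimal sketch of that behaviour, assuming a trimmed inline copy of the sample XML instead of data.xml; the helper name page_titles is hypothetical and not part of the original code.

```php
<?php
// Trimmed inline copy of the sample (assumption: same shape as data.xml).
$xml = <<<XML
<Root>
<page>
  Automobiles
  <page>
     Cars
     <page>Small</page>
  </page>
  <page>Trucks</page>
</page>
</Root>
XML;

// Hypothetical helper: collect each page's own text, indented by depth.
function page_titles(SimpleXMLElement $element, int $level = 0): array
{
    $lines = [];
    foreach ($element->children() as $child) {
        // The string cast yields only the text directly inside $child,
        // so a parent page keeps its own title even when it has children.
        $title = trim((string) $child);
        if ($title !== '') {
            $lines[] = str_repeat('  ', $level) . $title;
        }
        // Recurse into nested pages and append their titles.
        $lines = array_merge($lines, page_titles($child, $level + 1));
    }
    return $lines;
}

$titles = page_titles(new SimpleXMLElement($xml));
echo implode(PHP_EOL, $titles), PHP_EOL;
```

The sketch compares against '' rather than using empty(), because empty('0') is true in PHP and a page titled "0" would otherwise be skipped.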

Of course, I struggled with this, but as soon as I asked the question I found the answer. Anyway, this approach worked (top answer):

How to get a specific node text using php DOM

I'm wondering if there's any other way, though.
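
One other way, sketched here with PHP's DOM extension, is to walk an element's child nodes and keep only the text nodes. This is a rough illustration under stated assumptions, not necessarily the method from the linked answer: the helper name own_text is hypothetical, and the XML is a trimmed inline sample rather than the real InDesign export.

```php
<?php
// Trimmed inline sample (assumption: same shape as the InDesign export).
$xml = '<Root><page title="About Us">
  About Us
  <page>Overview</page>
  <page>Help</page>
</page></Root>';

// Hypothetical helper: return only the text that sits directly inside $node.
function own_text(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        // Skip child elements; concatenate direct text nodes only.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

$doc = new DOMDocument();
$doc->loadXML($xml);

// getElementsByTagName() walks every <page> element in document order.
$results = [];
foreach ($doc->getElementsByTagName('page') as $page) {
    $results[] = own_text($page);
}
print_r($results);
```

Unlike DOMNode::$textContent, which also includes the text of all descendants, filtering on XML_TEXT_NODE keeps each page's title separate from the titles of its sub-pages.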
