How to Convert XML to JSON using PHP?

There are multiple threads about converting XML to JSON in PHP, and I already have the following code, which works pretty well:

function jsonPrepareXml(object $domNode): void
{
    foreach ($domNode->childNodes as $node) {
        if ($node->hasChildNodes()) {
            jsonPrepareXml($node);
        } else {
            if ($domNode->hasAttributes() && strlen($domNode->nodeValue) !== 0) {
                $domNode->setAttribute("nodeValue", $node->textContent);
                $node->nodeValue = "";
            }
        }
    }
}

$dom = new \DOMDocument();
$dom->loadXML(FileHelpers::fileGetContents($file), LIBXML_NOCDATA);
jsonPrepareXml($dom);
$xmlData = $dom->saveXML();

$sxml = \simplexml_load_string($xmlData);
$json = \json_decode(
    \json_encode($sxml, JSON_THROW_ON_ERROR),
    null,
    512,
    JSON_THROW_ON_ERROR
);

Now I have run into an issue: in some XML files, text inside CDATA sections is truncated. I could not find out what those files have in common. The truncation does not always happen after the same number of characters, and when I copied only the CDATA section into an otherwise empty XML file for debugging, the whole text was read.

So I thought I would drop the LIBXML_NOCDATA constant, since libxml reads the whole text when it parses the section as CDATA. But then the conversion to JSON fails, because the CDATA nodes are not converted. My next idea was to convert CDATA nodes into normal text nodes in the jsonPrepareXml() function, like this:

elseif ($node instanceof \DOMCdataSection) {
    $node = new \DOMText((string) $node->nodeValue);
}

But this does not change anything.

Does anyone have an idea how to fix this? Of course it would be great if the original function worked, but I was not able to fix it, even with different PHP or libxml versions, so I gave up on that. I am currently on PHP 8.0.11.

Update: Until now I could not publish an XML file that triggers the error, because the files contained a lot of personal data. But now I have one XML file that shows the error quite nicely: https://drive.google.com/file/d/10iyiH1O6oKG9Zbv91He1_KlCQlhdeZoO/view?usp=sharing If I load the file with the following code, the output ends with 'Majapahit Empire, the city' at day 4.

<?php declare(strict_types=1);

$dom = new \DOMDocument();
$dom->loadXML(FileHelpers::fileGetContents($file), LIBXML_NOCDATA);

header("Content-type: text/plain");
echo $dom->saveXML();

So this happens even without my function that prepares the attributes for the JSON conversion. As stated, I can remove LIBXML_NOCDATA, but then I get empty nodes when converting to JSON.

So I am looking for a fix, or at least a workaround that converts all the CDATA nodes into normal text nodes.

The main issue is really the CDATA nodes, not the jsonPrepareXml function; I just wanted to use that function for the workaround.

I have no idea whether this solves your CDATA/XML issue, but as I commented, the original function looked fishy to me. Here is my algorithm:

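function jsonPrepareNode(DOMNode $node): void
{
    // For a node that has attributes and non-empty content, copy its text
    // into a "nodeValue" attribute and then clear the node's own content.
    if ($node->hasAttributes() && strlen($node->nodeValue) !== 0) {
        $node->setAttribute("nodeValue", $node->textContent);
        $node->nodeValue = "";
    }

    // Recurse into whatever children remain.
    foreach ($node->childNodes as $child) {
        jsonPrepareNode($child);
    }
}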

If that does not yet fully solve your issue, read on for more options:
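One option is the workaround asked for in the question: turning CDATA sections into plain text nodes. The `elseif` attempt in the question has no effect because assigning a new `DOMText` to `$node` only rebinds the local variable; the tree itself is never touched. The following is a minimal sketch of doing the swap with `replaceChild()` instead. The helper name `cdataToText()` and the sample XML are made up for illustration, and I have not run it against the file linked in the question, so I cannot say whether it avoids the truncation.

```php
<?php declare(strict_types=1);

/**
 * Replace every CDATA section in the document with an equivalent text node.
 * Hypothetical helper for illustration; returns the number of replacements.
 */
function cdataToText(\DOMDocument $dom): int
{
    $xpath = new \DOMXPath($dom);
    $replaced = 0;

    // text() matches text nodes and CDATA sections alike. Copy the matches
    // into an array first so that changing the tree cannot disturb the loop.
    foreach (iterator_to_array($xpath->query('//text()')) as $node) {
        if ($node instanceof \DOMCdataSection && $node->parentNode !== null) {
            // replaceChild() changes the tree; assigning to $node would not.
            $node->parentNode->replaceChild($dom->createTextNode($node->data), $node);
            $replaced++;
        }
    }

    // Merge adjacent text nodes left behind by the replacement.
    $dom->normalize();

    return $replaced;
}

// Made-up sample input, loaded without LIBXML_NOCDATA.
$xml = '<tour><day><![CDATA[Some <b>text</b> & more]]></day></tour>';

$dom = new \DOMDocument();
$dom->loadXML($xml);

$count = cdataToText($dom);
$out   = $dom->saveXML();

// The document now contains only ordinary text nodes and can be handed on.
$json = json_encode(simplexml_import_dom($dom), JSON_THROW_ON_ERROR);
```

If you still need the attribute handling, calling `jsonPrepareNode($dom)` after `cdataToText($dom)` and before the SimpleXML step should slot in at the same place as in your original code.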


For more controlled JSON encoding of XML, including with SimpleXML, I have written a blog-post series that deals with common problem cases and shows how you can implement your own XML-to-JSON style in PHP.

As you use both DOMDocument and SimpleXML, using only SimpleXML might match your needs, too.

The later encoding examples in particular show how to integrate with the JsonSerializable interface. Alternatively, the same would be possible with DOMDocument and your own node class(es); compare DOMBlaze (see ref).
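To give a rough idea of the SimpleXML-only route with JsonSerializable, here is a minimal sketch. It is not the code from the blog posts: the class name `JsonXmlElement`, the `@` prefix for attributes and the `#text` key are all choices I made up for this example, and it ignores namespaces. It needs PHP 8.0 or later because of the `mixed` return type. Casting a SimpleXMLElement to string returns the text of a CDATA section as well, so this sketch does not need LIBXML_NOCDATA at all.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical SimpleXMLElement subclass that decides its own JSON shape.
 */
class JsonXmlElement extends \SimpleXMLElement implements \JsonSerializable
{
    public function jsonSerialize(): mixed
    {
        $result = [];

        // Attributes become "@name" keys.
        foreach ($this->attributes() as $name => $value) {
            $result['@' . $name] = (string) $value;
        }

        // Child elements are always collected into lists, so repeated and
        // single children have the same shape. json_encode() calls
        // jsonSerialize() on each child in turn.
        foreach ($this->children() as $name => $child) {
            $result[$name][] = $child;
        }

        // The string cast covers plain text and CDATA sections.
        $text = trim((string) $this);

        if ($result === []) {
            return $text; // leaf element without attributes
        }
        if ($text !== '') {
            $result['#text'] = $text;
        }

        return $result;
    }
}

// Made-up sample input with one CDATA section and one plain text node.
$xml  = '<tour><day n="4"><![CDATA[a & b]]></day><day n="5">c</day></tour>';
$sxml = simplexml_load_string($xml, JsonXmlElement::class);

$json = json_encode($sxml, JSON_THROW_ON_ERROR);
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
```

Always wrapping children in a list is a deliberate simplification here; it keeps consumers from having to check whether `day` is a single object or an array, at the cost of some extra nesting.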
