DOMDocument missing HTML tags

I play an online game called Tribalwars and am now trying to write a report parser. A typical report looks like this:

https://enp2.tribalwars.net/public_report/395cf3cc373a3b8873c20fa018f1aa07

I have two functions, adapted from php.net, that now look as follows:

function has_child($p)
{
    if ($p->hasChildNodes())
    {
        foreach ($p->childNodes as $c)
        {
            if ($c->nodeType == XML_ELEMENT_NODE)
            {
                return true;
            }
        }
    }
    return false;
}

function show_node($x)
{
    foreach ($x->childNodes as $p)
    {
        if ($this->has_child($p))
        {
            $this->show_node($p);
        }
        elseif ($p->nodeType == XML_ELEMENT_NODE)
        {
            if (trim($p->nodeValue) !== '')
            {
                $temp = explode("\n", $p->nodeValue);
                if (count($temp) == 1)
                {
                    $this->reportdata[] = trim($temp[0]);
                }
                else
                {
                    foreach ($temp as $k => $v)
                    {
                        if (trim($v) !== '')
                        {
                            $this->reportdata[] = trim($v);
                        }
                    }
                }
            }
        }
    }
}

It returns the result in the following format:

Array
(
    [0] => MASHAD (27000) attacks 40-014-Devil...
    [1] => May 11, 2016  19:27:12
    [2] => MASHAD has won
    [3] => Attacker's luck
    ...
    [76] => Espionage
    [77] => Resources scouted:
    [78] => Building
    ...
    [112] => Haul:
    [113] => .
    [114] => .
    [115] => .
    [116] => .
    [117] => .
    ...
    [120] => https://enp2.tribalwars.net/public_report/395...
)

For the most part this works, but some data gets lost in the parsing. If you look at the report at the link, you will see "Resources scouted" and "Haul" sections. Both of these sections contain <span> elements, incidentally. For some reason the values from those two sections are missing from the array that the functions return (see array item 77 and array items 113 - 118). Items 113 - 118 show only the "." of the strangely formatted number, and item 77 has only the label with nothing after it.
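The symptom can be reproduced in isolation. The sketch below assumes the number is marked up with its thousands separator in a nested <span>; that markup is a guess for illustration and the real report HTML may differ. It applies the same rule as has_child() and show_node(), recording nodeValue only for elements that have no element children:

```php
<?php
// Hypothetical markup: the "." sits in its own <span> inside the number.
$html = '<div><span class="nowrap">2<span class="grey">.</span>456</span></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

$outer = $doc->getElementsByTagName('span')->item(0);

// Same leaf-element rule as show_node(): keep nodeValue only for
// elements that have no element children.
$leaves = [];
foreach ($doc->getElementsByTagName('*') as $el) {
    $hasElementChild = false;
    foreach ($el->childNodes as $c) {
        if ($c->nodeType === XML_ELEMENT_NODE) {
            $hasElementChild = true;
            break;
        }
    }
    if (!$hasElementChild && trim($el->nodeValue) !== '') {
        $leaves[] = trim($el->nodeValue);
    }
}

print_r($leaves);                 // only the inner span's "." is collected
echo $outer->textContent, "\n";   // the whole number is still in the DOM
```

Under this assumption the digits are text nodes sitting next to an element child, so the leaf rule never reaches them, which would match the lone "." entries in the array.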

In the function where I call show_node(), I do a little bit of processing to throw out DOM code that is not needed:

$temp = explode('<h1>Publicized report</h1>', $report[0]['reportdata']);
$rep = $temp[1];
$temp = explode('For quick copy and paste', $rep);
$rep = '<report>' . $temp[0] . '</report>';
$x = new DOMDocument();
$x->loadHTML($rep);
$this->show_node($x->getElementsByTagName('report')->item(0));

If I output $rep before calling show_node(), the information I need for Haul and Resources scouted is present.
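A further check is to look at what DOMDocument holds after loadHTML(), which separates "the parser dropped it" from "the traversal skipped it". This is a sketch only: the $rep value here is a made-up stand-in, not the real report fragment.

```php
<?php
// Stand-in fragment; the real $rep comes from the report page.
$rep = '<report><div>Haul:</div>'
     . '<div><span>1<span class="grey">.</span>200</span></div></report>';

libxml_use_internal_errors(true);   // <report> is not an HTML tag, so libxml warns
$x = new DOMDocument();
$x->loadHTML($rep);
libxml_clear_errors();

$report = $x->getElementsByTagName('report')->item(0);

// If the figures show up here, the parser kept them and the loss
// happens later, during traversal.
echo $x->saveHTML($report), "\n";
echo $report->textContent, "\n";
```

If the values appear in this output but not in the array, DOMDocument itself is not discarding anything.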

What could be the problem?

It appears as if DOMDocument has a limit on how deep into the document it goes, or something like that. Either that, or the recursive code above is wrong. I therefore identified the piece of code that was not being parsed, saw that it was well-formed, and then removed the children I do not need with str_replace(), which ended up getting the values into my array. Anyway, this problem is now resolved.
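For anyone who would rather not strip markup with str_replace(), a traversal along the following lines might also work. It is a sketch, not the fix I actually used: has_own_text() and collect_text() are hypothetical names, and the sample markup is assumed. An element that has text of its own is recorded whole via textContent; otherwise its element children are visited.

```php
<?php
// True when $el has a direct text child that is not just whitespace.
function has_own_text(DOMNode $el): bool
{
    foreach ($el->childNodes as $c) {
        if ($c->nodeType === XML_TEXT_NODE && trim($c->nodeValue) !== '') {
            return true;
        }
    }
    return false;
}

// Hypothetical variant of show_node(): an element with its own text is
// recorded whole, nested markup included; other elements are descended into.
function collect_text(DOMNode $node, array &$out): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        if (has_own_text($child)) {
            foreach (explode("\n", $child->textContent) as $line) {
                if (trim($line) !== '') {
                    $out[] = trim($line);
                }
            }
        } else {
            collect_text($child, $out);
        }
    }
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Assumed markup, for illustration only.
$doc->loadHTML('<div><span>2<span class="grey">.</span>456</span></div>');

$out = [];
collect_text($doc->documentElement, $out);
print_r($out);
```

The trade-off is coarser entries: a cell that mixes a label with nested values comes back as one string, so it may need splitting afterwards.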
