
Scrape HTML page with multiple <table> tags and extract text from specific <a> tag descendants

I have this HTML source code in a database field. I would like to parse it, in particular certain fields of some tables, and print them on the screen. This is the code for the tables:

<table cellspacing="1" cellpadding="1" class="troop_details inReturn"
    >
        <thead>
            <tr>
                <td class="role">
                                            <a href="/karte.php?d=91628">01] #WorkInProgress</a>
                                    </td>
                <td colspan="11" class="troopHeadline">
                                                                <a href="/karte.php?d=91611">Return from 01-soldier</a>
                                    </td>
            </tr>
        </thead>
        <tbody class="units">
            <tr>
                <th class="coords">
                                            &#x202d;<span class="coordinates coordinatesWrapper coordinatesAligned coordinatesltr"><span class="coordinateX">(&#x202d;&minus;&#x202d;1&#x202c;&#x202c;</span><span class="coordinatePipe">|</span><span class="coordinateY">&#x202d;&minus;&#x202d;28&#x202c;&#x202c;)</span></span>&#x202c;                                    </th>
                                    <td class="uniticon">
                        <img class="unit u21" title="Phalanx: 1:12:51" alt="Phalanx" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u22" title="Swordsman: 1:25:00" alt="Swordsman" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u23" title="Pathfinder: 0:30:00" alt="Pathfinder" src="/img/x.gif" />                  </td>
                                    <td class="uniticon">
                        <img class="unit u24" title="Theutates Thunder: 0:26:51" alt="Theutates Thunder" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u25" title="Druidrider: 0:31:53" alt="Druidrider" src="/img/x.gif" />                  </td>
                                    <td class="uniticon">
                        <img class="unit u26" title="Haeduan: 0:39:14" alt="Haeduan" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u27" title="Ram: 2:07:30" alt="Ram" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u28" title="Trebuchet: 2:50:00" alt="Trebuchet" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u29" title="Chieftain: 1:42:00" alt="Chieftain" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u30" title="Settler: 1:42:00" alt="Settler" src="/img/x.gif" />                    </td>
                                                    <td class="uniticon last">
                        <img class="unit uhero" title="Hero" alt="Hero" src="/img/x.gif" />                 </td>
                            </tr>
        </tbody>

        <tbody class="units last">
            <tr>
                <th>Troops</th>
                                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit">
                                                    500                                         </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none last">
                                                    0                                           </td>
                            </tr>
        </tbody>

                    <tbody class="infos">
                <tr>
                    <th>Bounty</th>
                    <td colspan="11">
                        <div class="res">
                            <div class="inlineIconList resourceWrapper"><div class="inlineIcon resources" title="Lumber"><i class="r1"></i><span class="value ">6758</span></div><div class="inlineIcon resources" title="Clay"><i class="r2"></i><span class="value ">8093</span></div><div class="inlineIcon resources" title="Iron"><i class="r3"></i><span class="value ">6908</span></div><div class="inlineIcon resources" title="Crop"><i class="r4"></i><span class="value ">15741</span></div></div>                       </div>
                        <div class="carry">
                            <img class="carry full" title="carry"
                                 alt="carry"
                                 src="/img/x.gif"/> &#x202d;&#x202d;37500&#x202c;&nbsp;/&nbsp;&#x202d;37500&#x202c;&#x202c;                     </div>
                    </td>
                </tr>
            </tbody>
        
        <tbody class="infos">
            <tr>
                <th>Arrival</th>
                <td colspan="11">
                    <div class="in">in&nbsp;<span  class="timer" counting="down" value="85">0:01:25</span>&nbsp;hrs.</div>
                    <div class="at"><span>at&nbsp;00:43:10</span><span> </span></div>
                </td>
            </tr>
        </tbody>
    </table>
            <a name="at"></a>
    <table cellspacing="1" cellpadding="1" class="troop_details inReturn"
    >
        <thead>
            <tr>
                <td class="role">
                                            <a href="/karte.php?d=91628">01] #WorkInProgress</a>
                                    </td>
                <td colspan="11" class="troopHeadline">
                                                                <a href="/karte.php?d=94829">Return from 0-New Hulk</a>
                                    </td>
            </tr>
        </thead>
        <tbody class="units">
            <tr>
                <th class="coords">
                                            &#x202d;<span class="coordinates coordinatesWrapper coordinatesAligned coordinatesltr"><span class="coordinateX">(&#x202d;&minus;&#x202d;1&#x202c;&#x202c;</span><span class="coordinatePipe">|</span><span class="coordinateY">&#x202d;&minus;&#x202d;28&#x202c;&#x202c;)</span></span>&#x202c;                                    </th>
                                    <td class="uniticon">
                        <img class="unit u21" title="Phalanx: 0:45:33" alt="Phalanx" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u22" title="Swordsman: 0:53:09" alt="Swordsman" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u23" title="Pathfinder: 0:18:46" alt="Pathfinder" src="/img/x.gif" />                  </td>
                                    <td class="uniticon">
                        <img class="unit u24" title="Theutates Thunder: 0:16:47" alt="Theutates Thunder" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u25" title="Druidrider: 0:19:56" alt="Druidrider" src="/img/x.gif" />                  </td>
                                    <td class="uniticon">
                        <img class="unit u26" title="Haeduan: 0:24:32" alt="Haeduan" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u27" title="Ram: 1:19:44" alt="Ram" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u28" title="Trebuchet: 1:46:18" alt="Trebuchet" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u29" title="Chieftain: 1:03:47" alt="Chieftain" src="/img/x.gif" />                    </td>
                                    <td class="uniticon">
                        <img class="unit u30" title="Settler: 1:03:47" alt="Settler" src="/img/x.gif" />                    </td>
                                                    <td class="uniticon last">
                        <img class="unit uhero" title="Hero" alt="Hero" src="/img/x.gif" />                 </td>
                            </tr>
        </tbody>

        <tbody class="units last">
            <tr>
                <th>Troops</th>
                                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit">
                                                    400                                         </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none">
                                                    0                                           </td>
                                    <td class="unit none last">
                                                    0                                           </td>
                            </tr>
        </tbody>

                    <tbody class="infos">
                <tr>
                    <th>Bounty</th>
                    <td colspan="11">
                        <div class="res">
                            <div class="inlineIconList resourceWrapper"><div class="inlineIcon resources" title="Lumber"><i class="r1"></i><span class="value ">6130</span></div><div class="inlineIcon resources" title="Clay"><i class="r2"></i><span class="value ">5835</span></div><div class="inlineIcon resources" title="Iron"><i class="r3"></i><span class="value ">5638</span></div><div class="inlineIcon resources" title="Crop"><i class="r4"></i><span class="value ">12397</span></div></div>                       </div>
                        <div class="carry">
                            <img class="carry full" title="carry"
                                 alt="carry"
                                 src="/img/x.gif"/> &#x202d;&#x202d;30000&#x202c;&nbsp;/&nbsp;&#x202d;30000&#x202c;&#x202c;                     </div>
                    </td>
                </tr>
            </tbody>
        
        <tbody class="infos">
            <tr>
                <th>Arrival</th>
                <td colspan="11">
                    <div class="in">in&nbsp;<span  class="timer" counting="down" value="920">0:15:20</span>&nbsp;hrs.</div>
                    <div class="at"><span>at&nbsp;00:57:05</span><span> </span></div>
                </td>
            </tr>
        </tbody>
    </table>

The data I am interested in are the following:

  1. Return from 01-soldier 00:43:10
  2. Return from 0-New Hulk 00:57:05

Thanks to your advice, this is my code at the moment:

  <?php include 'database.php' ?>
<?php session_start(); ?>

<?php
include_once('simple_html_dom.php');
$caserma = $_SESSION["caserma"];
    
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($_SESSION["caserma"], LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query("//table[contains(@class, 'troop_details') and contains(@class, 'inReturn')]//td[@class='troopHeadline']//a[@href]/text()") as $textNode) {
    $texts[] = $textNode->nodeValue;
}
var_export($texts);
 ?>

But as output it gives me an empty array: array ( )

Code assuming $_SESSION["caserma"] contains your full HTML document:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($_SESSION["caserma"]);
$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query("//table[contains(@class, 'troop_details') and contains(@class, 'inReturn')]//td[@class='troopHeadline']//a[@href]/text()") as $textNode) {
    $texts[] = $textNode->nodeValue;
}
var_export($texts);

Output from your sample input:

array (
  0 => 'Return from 01-soldier',
  1 => 'Return from 0-New Hulk',
)

XPath Breakdown:

//                                                                         # search to any depth in the document
table[contains(@class, 'troop_details') and contains(@class, 'inReturn')]  # find all table tags with both `troop_details` and `inReturn` classes
//                                                                         # continue searching any descendants of any matches
td[@class='troopHeadline']                                                 # match all td tags with `troopHeadline` as its class
//                                                                         # continue searching any descendants of any matches
a[@href]                                                                   # match all a tags with an href attribute
/                                                                          # search the immediate descendant (any first generation child)
text()                                                                     # match the text of the parent a tag
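
One caveat on the contains(@class, ...) steps in the breakdown above: in XPath 1.0, contains() is a plain substring test, so contains(@class, 'inReturn') would also match a class such as inReturnDelayed. A common idiom pads the normalized class attribute with spaces and looks for the space-delimited token instead. The following is only a minimal sketch; the helper name hasClass, the class inReturnDelayed, and the two-table sample markup are made up for illustration and do not come from the original page.

```php
<?php
// Hypothetical helper (not part of the original answer): builds an XPath 1.0
// predicate that matches a whole class token instead of a substring.
// Only intended for simple class names without quotes.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Made-up sample: the second table's class merely starts with "inReturn".
$html = '<table class="troop_details inReturn"><tr><td>a</td></tr></table>'
      . '<table class="troop_details inReturnDelayed"><tr><td>b</td></tr></table>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: matches both tables.
$loose = $xpath->query("//table[contains(@class, 'inReturn')]");

// Token test: matches only the table that really carries both classes.
$strict = $xpath->query('//table[' . hasClass('troop_details') . ' and ' . hasClass('inReturn') . ']');

echo $loose->length, ' ', $strict->length, PHP_EOL;   // prints "2 1"
```

For the markup in the question the simpler contains() form is sufficient; the stricter predicate only matters if similarly named classes can appear.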
  • libxml_use_internal_errors(true) suppresses the warnings that libxml would otherwise emit for an "invalid" document.
  • Use both contains(...) checks in the XPath so that the query still matches if the class names appear in a different order or further classes are added to the element.
  • The foreach() loop iterates over all qualifying text nodes.
  • Each node's nodeValue is extracted and pushed into the result array.
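
The query above returns only the headlines, while the question also asks for the arrival time shown in each table. One way to pair the two is to select each table first and then run relative queries inside it. This is a sketch under assumptions, not the answer's original code: the function name extractReturns and the reduced sample markup are hypothetical, and it assumes the arrival time always sits in the first span of the div with class "at", as in the posted HTML.

```php
<?php
// Hypothetical helper: pairs each troopHeadline link text with the
// arrival time found in the same table.
function extractReturns(string $html): array
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $rows = [];
    $tables = $xpath->query("//table[contains(@class, 'troop_details') and contains(@class, 'inReturn')]");
    foreach ($tables as $table) {
        // A leading "." makes the expression relative to the current table.
        $headline = $xpath->evaluate("string(.//td[@class='troopHeadline']//a[@href])", $table);
        $arrival  = $xpath->evaluate("string(.//div[@class='at']/span[1])", $table);

        // The span text looks like "at 00:43:10" (with a non-breaking space);
        // keep only the clock time.
        if (preg_match('/\d{1,2}:\d{2}:\d{2}/', $arrival, $m)) {
            $rows[] = trim($headline) . ' ' . $m[0];
        }
    }
    return $rows;
}

// Reduced version of the first table from the question, for illustration only.
$sample = <<<'HTML'
<table class="troop_details inReturn">
  <thead><tr>
    <td class="role"><a href="/karte.php?d=91628">01] #WorkInProgress</a></td>
    <td colspan="11" class="troopHeadline"><a href="/karte.php?d=91611">Return from 01-soldier</a></td>
  </tr></thead>
  <tbody class="infos"><tr><th>Arrival</th><td colspan="11">
    <div class="at"><span>at&nbsp;00:43:10</span><span> </span></div>
  </td></tr></tbody>
</table>
HTML;

print_r(extractReturns($sample));
```

Querying per table keeps each headline and time together even if one table lacks an arrival row; such a table is simply skipped by the preg_match() check.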

从多个中刮取相关的文本对<div id="text_translate"><p>我在数据库字段中有这个 html 源代码。 我想分析这段代码,特别是一些表格的字段,并将它们打印在屏幕上。 我无法发布所有代码,因为它超过 3000 行代码,这是代码的开头:</p><pre> &lt;.DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1:0 Transitional//EN" "http.//www.w3.org/TR/xhtml1/DTD/xhtml1-transitional:dtd"&gt; &lt;html xmlns="http.//www.w3;org/1999/xhtml" id="mainLayout"&gt; &lt;head&gt; &lt;title&gt;Anglosphere x3&lt;/title&gt; &lt;meta http-equiv="cache-control" content="max-age=0" /&gt; &lt;meta http-equiv="pragma" content="no-cache" /&gt; &lt;meta http-equiv="expires" content="0" /&gt; &lt;meta http-equiv="imagetoolbar" content="no" /&gt; &lt;meta http-equiv="content-type" content="text/html. charset=UTF-8" /&gt; &lt;meta name="content-language" content="en-US" /&gt; &lt;meta name="viewport" content="width=device-width"&gt; &lt;script data-cmp-ab="1"&gt;window:cmp_block_ignoredomains = ['recaptcha.net']&lt;/script&gt; &lt;script data-cmp-ab="1" src="https.//cdn.consentmanager.mgr.consensu.org/delivery/cookieblock.min:js" &gt;&lt;/script&gt; &lt;link rel="stylesheet" href="https.//cdn.consentmanager.mgr.consensu.org/delivery/cmp.min.css" /&gt;</pre><p> 这是关于我感兴趣的表格的代码:</p><pre> &lt;table cellspacing="1" cellpadding="1" class="troop_details inReturn" &gt; &lt;thead&gt; &lt;tr&gt; &lt;td class="role"&gt; &lt;a href="/karte.php?d=91628"&gt;01] #WorkInProgress&lt;/a&gt; &lt;/td&gt; &lt;td colspan="11" class="troopHeadline"&gt; &lt;a href="/karte.php?d=91611"&gt;Return from 01-soldier&lt;/a&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/thead&gt; &lt;tbody class="units"&gt; &lt;tr&gt; &lt;th class="coords"&gt; &amp;#x202d;&lt;span class="coordinates coordinatesWrapper coordinatesAligned coordinatesltr"&gt;&lt;span class="coordinateX"&gt;(&amp;#x202d;&amp;minus;&amp;#x202d;1&amp;#x202c;&amp;#x202c;&lt;/span&gt;&lt;span class="coordinatePipe"&gt;|&lt;/span&gt;&lt;span class="coordinateY"&gt;&amp;#x202d;&amp;minus;&amp;#x202d;28&amp;#x202c;&amp;#x202c;)&lt;/span&gt;&lt;/span&gt;&amp;#x202c; &lt;/th&gt; &lt;td class="uniticon"&gt; 
&lt;img class="unit u21" title="Phalanx: 1:12:51" alt="Phalanx" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u22" title="Swordsman: 1:25:00" alt="Swordsman" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u23" title="Pathfinder: 0:30:00" alt="Pathfinder" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u24" title="Theutates Thunder: 0:26:51" alt="Theutates Thunder" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u25" title="Druidrider: 0:31:53" alt="Druidrider" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u26" title="Haeduan: 0:39:14" alt="Haeduan" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u27" title="Ram: 2:07:30" alt="Ram" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u28" title="Trebuchet: 2:50:00" alt="Trebuchet" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u29" title="Chieftain: 1:42:00" alt="Chieftain" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u30" title="Settler: 1:42:00" alt="Settler" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon last"&gt; &lt;img class="unit uhero" title="Hero" alt="Hero" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;tbody class="units last"&gt; &lt;tr&gt; &lt;th&gt;Troops&lt;/th&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit"&gt; 500 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none last"&gt; 0 &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; 
&lt;tbody class="infos"&gt; &lt;tr&gt; &lt;th&gt;Bounty&lt;/th&gt; &lt;td colspan="11"&gt; &lt;div class="res"&gt; &lt;div class="inlineIconList resourceWrapper"&gt;&lt;div class="inlineIcon resources" title="Lumber"&gt;&lt;i class="r1"&gt;&lt;/i&gt;&lt;span class="value "&gt;6758&lt;/span&gt;&lt;/div&gt;&lt;div class="inlineIcon resources" title="Clay"&gt;&lt;i class="r2"&gt;&lt;/i&gt;&lt;span class="value "&gt;8093&lt;/span&gt;&lt;/div&gt;&lt;div class="inlineIcon resources" title="Iron"&gt;&lt;i class="r3"&gt;&lt;/i&gt;&lt;span class="value "&gt;6908&lt;/span&gt;&lt;/div&gt;&lt;div class="inlineIcon resources" title="Crop"&gt;&lt;i class="r4"&gt;&lt;/i&gt;&lt;span class="value "&gt;15741&lt;/span&gt;&lt;/div&gt;&lt;/div&gt; &lt;/div&gt; &lt;div class="carry"&gt; &lt;img class="carry full" title="carry" alt="carry" src="/img/x.gif"/&gt; &amp;#x202d;&amp;#x202d;37500&amp;#x202c;&amp;nbsp;/&amp;nbsp;&amp;#x202d;37500&amp;#x202c;&amp;#x202c; &lt;/div&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;tbody class="infos"&gt; &lt;tr&gt; &lt;th&gt;Arrival&lt;/th&gt; &lt;td colspan="11"&gt; &lt;div class="in"&gt;in&amp;nbsp;&lt;span class="timer" counting="down" value="85"&gt;0:01:25&lt;/span&gt;&amp;nbsp;hrs.&lt;/div&gt; &lt;div class="at"&gt;&lt;span&gt;at&amp;nbsp;00:43:10&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/div&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;/table&gt; &lt;a name="at"&gt;&lt;/a&gt; &lt;table cellspacing="1" cellpadding="1" class="troop_details inReturn" &gt; &lt;thead&gt; &lt;tr&gt; &lt;td class="role"&gt; &lt;a href="/karte.php?d=91628"&gt;01] #WorkInProgress&lt;/a&gt; &lt;/td&gt; &lt;td colspan="11" class="troopHeadline"&gt; &lt;a href="/karte.php?d=94829"&gt;Return from 0-New Hulk&lt;/a&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/thead&gt; &lt;tbody class="units"&gt; &lt;tr&gt; &lt;th class="coords"&gt; &amp;#x202d;&lt;span class="coordinates coordinatesWrapper coordinatesAligned coordinatesltr"&gt;&lt;span 
class="coordinateX"&gt;(&amp;#x202d;&amp;minus;&amp;#x202d;1&amp;#x202c;&amp;#x202c;&lt;/span&gt;&lt;span class="coordinatePipe"&gt;|&lt;/span&gt;&lt;span class="coordinateY"&gt;&amp;#x202d;&amp;minus;&amp;#x202d;28&amp;#x202c;&amp;#x202c;)&lt;/span&gt;&lt;/span&gt;&amp;#x202c; &lt;/th&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u21" title="Phalanx: 0:45:33" alt="Phalanx" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u22" title="Swordsman: 0:53:09" alt="Swordsman" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u23" title="Pathfinder: 0:18:46" alt="Pathfinder" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u24" title="Theutates Thunder: 0:16:47" alt="Theutates Thunder" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u25" title="Druidrider: 0:19:56" alt="Druidrider" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u26" title="Haeduan: 0:24:32" alt="Haeduan" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u27" title="Ram: 1:19:44" alt="Ram" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u28" title="Trebuchet: 1:46:18" alt="Trebuchet" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u29" title="Chieftain: 1:03:47" alt="Chieftain" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon"&gt; &lt;img class="unit u30" title="Settler: 1:03:47" alt="Settler" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;td class="uniticon last"&gt; &lt;img class="unit uhero" title="Hero" alt="Hero" src="/img/x.gif" /&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;tbody class="units last"&gt; &lt;tr&gt; &lt;th&gt;Troops&lt;/th&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit"&gt; 400 &lt;/td&gt; &lt;td 
class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none"&gt; 0 &lt;/td&gt; &lt;td class="unit none last"&gt; 0 &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;tbody class="infos"&gt; &lt;tr&gt; &lt;th&gt;Bounty&lt;/th&gt; &lt;td colspan="11"&gt; &lt;div class="res"&gt; &lt;div class="inlineIconList resourceWrapper"&gt;&lt;div class="inlineIcon resources" title="Lumber"&gt;&lt;i class="r1"&gt;&lt;/i&gt;&lt;span class="value "&gt;6130&lt;/span&gt;&lt;/div&gt;&lt;div class="inlineIcon resources" title="Clay"&gt;&lt;i class="r2"&gt;&lt;/i&gt;&lt;span class="value "&gt;5835&lt;/span&gt;&lt;/div&gt;&lt;div class="inlineIcon resources" title="Iron"&gt;&lt;i class="r3"&gt;&lt;/i&gt;&lt;span class="value "&gt;5638&lt;/span&gt;&lt;/div&gt;&lt;div class="inlineIcon resources" title="Crop"&gt;&lt;i class="r4"&gt;&lt;/i&gt;&lt;span class="value "&gt;12397&lt;/span&gt;&lt;/div&gt;&lt;/div&gt; &lt;/div&gt; &lt;div class="carry"&gt; &lt;img class="carry full" title="carry" alt="carry" src="/img/x.gif"/&gt; &amp;#x202d;&amp;#x202d;30000&amp;#x202c;&amp;nbsp;/&amp;nbsp;&amp;#x202d;30000&amp;#x202c;&amp;#x202c; &lt;/div&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;tbody class="infos"&gt; &lt;tr&gt; &lt;th&gt;Arrival&lt;/th&gt; &lt;td colspan="11"&gt; &lt;div class="in"&gt;in&amp;nbsp;&lt;span class="timer" counting="down" value="920"&gt;0:15:20&lt;/span&gt;&amp;nbsp;hrs.&lt;/div&gt; &lt;div class="at"&gt;&lt;span&gt;at&amp;nbsp;00:57:05&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/div&gt; &lt;/td&gt; &lt;/tr&gt; &lt;/tbody&gt; &lt;/table&gt;</pre><p>The data I am interested in is the following:</p><ol><li>Return from 01-soldier <strong>00:43:10</strong></li><li>Return from 0-New Hulk <strong>00:57:05</strong></li></ol><p>This is my code, but its output is an empty <strong>array ( )</strong>:</p><pre>&lt;?php
include_once('simple_html_dom.php');

$caserma = $_SESSION["caserma"];

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom-&gt;loadHTML($_SESSION["caserma"]);
$xpath = new DOMXPath($dom);

$texts = [];
// Text of every link inside the headline cell of each "troop_details inReturn" table
foreach ($xpath-&gt;query("//table[contains(@class, 'troop_details') and contains(@class, 'inReturn')]//td[@class='troopHeadline']//a[@href]/text()") as $textNode) {
    $texts[] = $textNode-&gt;nodeValue;
}
var_export($texts);
?&gt;</pre><p>I suspect my input is not valid XML/HTML, so I tried to look for errors like this:</p><pre>$object = simplexml_load_string($_SESSION["caserma"]);
if ($object === false) {
    $errors = libxml_get_errors();
    print_r($errors);
}</pre><p>This is my output:</p><blockquote><p>Array ( [0] =&gt; LibXMLError Object ( [level] =&gt; 3 [code] =&gt; 4 [column] =&gt; 1 [message] =&gt; Start tag expected, '&lt;' not found [file] =&gt; [line] =&gt; 1))</p></blockquote><p>How can I solve this?</p></div>Scrape related pairs of text from multiple &lt;table&gt; tags in an HTML document
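The error quoted in the question ("Start tag expected, '<' not found" at line 1, column 1) is what libxml reports when the string it receives does not begin with `<`. That would be consistent with `$_SESSION["caserma"]` being empty, or holding entity-escaped markup rather than raw HTML, but this is only a guess from the posted output. The following is a minimal sketch, not a confirmed fix, of how each headline could be paired with its arrival time by evaluating XPath expressions relative to each table. The function name `extractReturns` and the `$sample` string are hypothetical and were made up for illustration; the class names come from the markup in the question.

```php
<?php
// Hypothetical helper (not from the original post): returns one
// [headline, arrival] pair per "troop_details inReturn" table.
function extractReturns(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse, e.g. the session value was never set
    }

    $previous = libxml_use_internal_errors(true); // keep parser warnings off the page
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match whole class tokens rather than substrings of the class attribute.
    $tables = $xpath->query(
        "//table[contains(concat(' ', normalize-space(@class), ' '), ' troop_details ')"
        . " and contains(concat(' ', normalize-space(@class), ' '), ' inReturn ')]"
    );

    $pairs = [];
    foreach ($tables as $table) {
        // Passing $table as context keeps the headline and the time of one table together.
        $headline = $xpath->evaluate("string(.//td[contains(@class, 'troopHeadline')]//a)", $table);
        $arrival  = $xpath->evaluate("string(.//div[@class='at']/span[1])", $table);

        // The cell reads "at&nbsp;00:43:10"; keep only the clock part.
        if (preg_match('/\d{1,2}:\d{2}:\d{2}/', $arrival, $m)) {
            $arrival = $m[0];
        }
        $pairs[] = [trim($headline), trim($arrival)];
    }
    return $pairs;
}

// Shortened sample built from the markup in the question (assumption: same structure).
$sample = '<table class="troop_details inReturn"><thead><tr>'
    . '<td class="troopHeadline"><a href="/karte.php?d=91611">Return from 01-soldier</a></td>'
    . '</tr></thead><tbody class="infos"><tr><th>Arrival</th><td>'
    . '<div class="at"><span>at&nbsp;00:43:10</span><span> </span></div>'
    . '</td></tr></tbody></table>';

foreach (extractReturns($sample) as [$headline, $arrival]) {
    echo $headline, ' ', $arrival, PHP_EOL;
}
```

If the loop prints nothing for the real data, it may be worth checking with `var_dump()` what the session value actually contains before parsing. If the markup was stored entity-escaped (for example `&lt;table ...`), passing it through `html_entity_decode()` first is one option to try.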


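Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.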

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM