PHP - Finding words inside HTML tags

Question

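I'm looking for the best way to extract certain text content from some arbitrary HTML snippets.

I can't seem to figure out a regular expression for it.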

<td valign="top" style="border: solid 1px black; padding: 4px;">
    <h4>Dec 05, 2015 23:16:52</h4>
    <h3>rron7pam has won</h3>
</td>

<table width="100%" style="border: 1px solid #DED3B9" id="attack_info_att">
    <tbody>
        <tr>
            <th style="width:20%">Attacker:</th>
            <th><a title="..." href="/guest.php?screen=info_player&amp;id=255995">Bliksem</a></th>
        </tr>
    </tbody>
</table>

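The above are only samples, but from them I am interested in getting:

The date (date = Dec 05, 2015 23:16:52)
Who won the battle (name = rron7pam)
The attacker's name (name = Bliksem)
The attacker's ID (id = 255995)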

I need to get more information out of separate pieces of HTML, but if I can get one or two of these, I can probably work out the rest.

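Edit based on comments and answers: there may be arbitrary text in the HTML, depending on how the report is set up (attacker's units hidden, etc.), so I need to look for patterns of specific HTML tags.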

In the example above, the text between the <h4></h4> tags that sit directly inside a <td> and are followed by a set of <h3></h3> tags is the date I need.

Some example links with different formats:

https://enp2.tribalwars.net/public_report/70d3a2a55461e9eb09f543958b608304
https://enp2.tribalwars.net/public_report/5216e0e16c9d3657f981ce7e3cb02580

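From what I can tell, some elements are always the same; for example, the date can always be found by following the rule described above.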

Answer 1

An example with DOMDocument:

$url = 'https://enp2.tribalwars.net/public_report/70d3a2a55461e9eb09f543958b608304';

// prevent warnings from being displayed
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTMLFile($url);

$xp = new DOMXPath($dom);

# let's find the interesting nodes:

// td that contains all the needed information (in other words, the nearest common ancestor)
$rootNode = $xp->query('(//table[@class="vis"]/tr/td[./h4])[1]')->item(0);

// first h4 node that contains the date
$dateNode = $xp->query('(./h4)[1]', $rootNode)->item(0);

// following h3 node that contains the player name
$winnerNode = $xp->query('(./following-sibling::h3)[1]', $dateNode)->item(0);

$attackerNode = $xp->query('(./table[@id="attack_info_att"]/tr/th/a)[1]', $rootNode)->item(0);

# extract special values

$winner = preg_replace('~ has won$~', '', $winnerNode->nodeValue);

$attackerID = html_entity_decode($attackerNode->getAttribute('href'));
$attackerID = parse_url($attackerID, PHP_URL_QUERY);
parse_str($attackerID, $queryVars);
$attackerID = $queryVars['id'];

$result = [ 'date' => $dateNode->nodeValue,
            'winner' => $winner,
            'attacker' => $attackerNode->nodeValue,
            'attackerID' => $attackerID ];

print_r($result);
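
For trying the same idea without fetching the live report, here is a minimal offline sketch that runs against the fragment from the question. It is not the code above, only a variation under stated assumptions: PHP 7.1 or later, the fragment wrapped in a `<table><tr>` so the `<td>` parses cleanly, and document-wide XPath queries instead of the `table[@class="vis"]` anchor, since the fragment has no such table. The function name `extractReport` and the `$first` helper are hypothetical names chosen for illustration.

```php
<?php
// Fragment from the question, wrapped in <table><tr> (an assumption made here
// so the <td> has a valid parent). Real report pages may be laid out differently.
$html = <<<'HTML'
<table><tr>
<td valign="top" style="border: solid 1px black; padding: 4px;">
    <h4>Dec 05, 2015 23:16:52</h4>
    <h3>rron7pam has won</h3>
</td>
</tr></table>
<table width="100%" style="border: 1px solid #DED3B9" id="attack_info_att">
    <tbody>
        <tr>
            <th style="width:20%">Attacker:</th>
            <th><a title="..." href="/guest.php?screen=info_player&amp;id=255995">Bliksem</a></th>
        </tr>
    </tbody>
</table>
HTML;

// Hypothetical helper: returns null for any value whose node is not found.
function extractReport(string $html): array
{
    // Collect parser warnings internally instead of displaying them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);

    // Return the first node matching the query, or null when nothing matches.
    $first = function (string $query) use ($xp) {
        $nodes = $xp->query($query);
        return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0) : null;
    };

    // h4 directly inside a td and followed by an h3 sibling: the date.
    $dateNode = $first('(//td/h4[following-sibling::h3])[1]');
    // the h3 that follows that h4: the "... has won" line.
    $winnerNode = $first('(//td/h4/following-sibling::h3)[1]');
    // "//th/a" matches whether or not a <tbody> sits between table and tr.
    $attackerNode = $first('(//table[@id="attack_info_att"]//th/a)[1]');

    $attackerID = null;
    if ($attackerNode !== null) {
        // The DOM already decodes &amp; in attribute values.
        $query = parse_url($attackerNode->getAttribute('href'), PHP_URL_QUERY);
        parse_str((string) $query, $vars);
        $attackerID = $vars['id'] ?? null;
    }

    return [
        'date' => $dateNode ? trim($dateNode->nodeValue) : null,
        'winner' => $winnerNode
            ? preg_replace('~ has won$~', '', trim($winnerNode->nodeValue))
            : null,
        'attacker' => $attackerNode ? trim($attackerNode->nodeValue) : null,
        'attackerID' => $attackerID,
    ];
}

print_r(extractReport($html));
```

Returning null for missing nodes is a design choice for this sketch: reports with hidden sections then yield partial results rather than a fatal error on a null node.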

Answer 2

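It isn't going to be pretty, but you could use strpos to return the start and end positions of the tags/content, then use substr to return that part of the string.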

string substr ( string $string , int $start [, int $length ] )

mixed strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

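I would say that having to do this might mean there is something wrong with the way you are receiving the data, or further up the line. I really wonder whether scanning the DOM over and over would be very efficient.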
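
A rough sketch of the strpos/substr idea, assuming PHP 7.1 or later. The helper name `textBetween` is hypothetical, not a PHP built-in, and the sample string is a shortened version of the fragment from the question. It only works when the opening tag is matched literally, so a tag with attributes (for example `<h4 class="...">`) would not be found.

```php
<?php
// Hypothetical helper: returns the text between the first occurrence of $open
// (searching from $offset) and the next occurrence of $close, or null if
// either marker is missing.
function textBetween(string $haystack, string $open, string $close, int $offset = 0): ?string
{
    $start = strpos($haystack, $open, $offset);
    if ($start === false) {
        return null;
    }
    $start += strlen($open); // skip past the opening marker itself

    $end = strpos($haystack, $close, $start);
    if ($end === false) {
        return null;
    }

    return substr($haystack, $start, $end - $start);
}

// Shortened sample based on the question's fragment.
$html = '<td valign="top"><h4>Dec 05, 2015 23:16:52</h4><h3>rron7pam has won</h3></td>';

$date = textBetween($html, '<h4>', '</h4>');
$winnerLine = textBetween($html, '<h3>', '</h3>');
// Strip the trailing " has won" to keep only the player name.
$winner = $winnerLine !== null ? preg_replace('~ has won$~', '', $winnerLine) : null;

var_dump($date, $winner);
```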

Answer 1: 3, accepted, 2015-12-06 13:31:52

Answer 2: 0, 2015-12-06 10:18:26