PHP Regular Expression
I have 3 message blocks.
Example:
<!-- message -->
<div>
Just the text.
</div>
<!-- / message -->
<!-- message -->
<div>
<div style="margin-left: 20px; margin-top:5px; ">
<div class="smallfont">Quote:</div>
</div>
<div style="margin-right: 20px; margin-left: 20px; padding: 10px;">
Message from <strong>Nickname</strong>
<div style="font-style:italic">Hello. It's a quote</div>
<else /></if>
</div>
<br /><br />
It's the simple text
</div>
<!-- / message -->
<!-- message -->
<div>
Text<br />
<div style="margin:20px; margin-top:5px; background-color: #30333D">
<div class="smallfont" style="margin-bottom:2px">PHP code:</div>
<div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;">
<code style="white-space:nowrap">
<div dir="ltr" style="text-align:left">
<!-- php buffer start -->
<code>
LALALA PHP CODE
</code>
<!-- php buffer end -->
</div>
</code>
</div>
</div><br />
<br />
More text
</div>
<!-- / message -->
I'm trying to write a regular expression for these blocks, but it doesn't work.
preg_match('#<!-- message -->(?P<text>.*?)</div>.*?<!-- / message -->#is', $str, $s);
It only works for the first block.
How can I make the regular expression check whether a message contains a quote or PHP code?
(?P<text>.*?) for text
(?P<phpcode>.*?) for php code
(?P<quotenickname>.*?) for quoted nickname
(?P<quotemessage>.*?) for quote message
and so on...
Thanks a lot!
Edit for onteria_:
<!-- message -->
<div>
Just the text. <b>bold text</b><br/>
<a href="link">link</a>, <s><i>test</i></s>
</div>
<!-- / message -->
Output:
Just the text
,
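What do I need to change so that the output also keeps the "a", "b", "s", "i", etc. tags? How can I make sure the HTML is not stripped? Thanks.
Notice all those replies about not using regular expressions? Why is that? It's because HTML represents structure. To be honest, this HTML overuses divs instead of semantic markup, but either way I'm going to parse it with the DOM functions. So here is the sample HTML I used: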
<html>
<body>
<!-- message -->
<div>
Just the text.
</div>
<!-- / message -->
<!-- message -->
<div>
<div style="margin-left: 20px; margin-top:5px; ">
<div class="smallfont">Quote:</div>
</div>
<div style="margin-right: 20px; margin-left: 20px; padding: 10px;">
Message from <strong>Nickname</strong>
<div style="font-style:italic">Hello. It's a quote</div>
</div>
<br /><br />
It's the simple text
</div>
<!-- / message -->
<!-- message -->
<div>
Text<br />
<div style="margin:20px; margin-top:5px; background-color: #30333D">
<div class="smallfont" style="margin-bottom:2px">PHP code:</div>
<div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;">
<code style="white-space:nowrap">
<div dir="ltr" style="text-align:left">
<!-- php buffer start -->
<code>
LALALA PHP CODE
</code>
<!-- php buffer end -->
</div>
</code>
</div>
</div><br />
<br />
More text
</div>
<!-- / message -->
</body>
</html>
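Before the DOM version, here is a minimal sketch of why the regex from the question falls short. The two-message `$str` below is a reduced, hypothetical sample (not the full HTML above). `preg_match()` only ever returns the first match; `preg_match_all()` finds every block, but the lazy `.*?</div>` still stops at the first closing div, so nested content is cut off.

```php
<?php
// Hypothetical sample, reduced from the messages above
$str = <<<HTML
<!-- message -->
<div>
Just the text.
</div>
<!-- / message -->
<!-- message -->
<div>
<div class="smallfont">Quote:</div>
It's the simple text
</div>
<!-- / message -->
HTML;

// Same pattern as in the question, but collecting every match
preg_match_all('#<!-- message -->(?P<text>.*?)</div>.*?<!-- / message -->#is', $str, $m);

$count  = count($m['text']);   // number of message blocks found
$second = trim($m['text'][1]); // captured text of the second block

echo $count, "\n";
// The capture ends at the first </div>, so "It's the simple text" is missing
echo $second, "\n";
```

The pattern has no notion of which `</div>` closes which `<div>`, which is the structural information a DOM parser keeps track of.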
Now for the full code:
$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');
// These just make the code nicer
// We could just inline them if we wanted to
// ----------- Helper Functions ------------
function HasQuote($part, $xpath) {
// check the div and see if it contains "Quote:" inside
return $xpath->query("div[contains(.,'Quote:')]", $part)->length;
}
function HasPHPCode($part, $xpath) {
// check the div and see if it contains "PHP code:" inside
return $xpath->query("div[contains(.,'PHP code:')]", $part)->length;
}
// ----------- End Helper Functions ------------
// ----------- Parse Functions ------------
function ParseQuote($quote, $xpath) {
// The quote content is actually in the next div over. nextSibling is
// called twice to skip the whitespace text node in between. Man this markup is weird.
$quote = $quote->nextSibling->nextSibling;
$quote_info = array('type' => 'quote');
$nickname = $xpath->query("strong", $quote);
if($nickname->length) {
$quote_info['nickname'] = $nickname->item(0)->nodeValue;
}
$quote_text = $xpath->query("div", $quote);
if($quote_text->length) {
$quote_info['quote_text'] = trim($quote_text->item(0)->nodeValue);
}
return $quote_info;
}
function ParseCode($code, $xpath) {
$code_info = array('type' => 'code');
// This matches the path down to the innermost code element.
// The leading "." keeps the search relative to $code; a bare "//" would search the whole document.
$code_text = $xpath->query(".//div/code/div/code", $code);
if($code_text->length) {
$code_info['code_text'] = trim($code_text->item(0)->nodeValue);
}
return $code_info;
}
// ----------- End Parser Functions ------------
function GetMessages($message, $xpath) {
$message_contents = array();
foreach($message->childNodes as $child) {
// So inside of a message if we hit a div
// We either have a Quote or PHP code, check which
if(strtolower($child->nodeName) == 'div') {
if(HasQuote($child, $xpath)) {
$quote = ParseQuote($child, $xpath);
if(!empty($quote['quote_text'])) {
$message_contents[] = $quote;
}
}
else if(HasPHPCode($child, $xpath)) {
$phpcode = ParseCode($child, $xpath);
if(!empty($phpcode['code_text'])) {
$message_contents[] = $phpcode;
}
}
}
// Otherwise check if we've found some pretty text
else if ($child->nodeType == XML_TEXT_NODE) {
// This might be just whitespace, so check that it's not empty
$text = trim($child->nodeValue);
if($text) {
$message_contents[] = array('type' => 'text', 'text' => $text);
}
}
}
return $message_contents;
}
$xpath = new DOMXPath($doc);
// We need to get the toplevel divs, which
// are the messages
$toplevel_divs = $xpath->query("//body/div");
$messages = array();
foreach($toplevel_divs as $toplevel_div) {
$messages[] = GetMessages($toplevel_div, $xpath);
}
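One detail worth knowing when passing a context node to `query()`, as the helper and parse functions do: an expression that starts with `//` is absolute and searches the whole document, while a leading `.` keeps it relative to the context node. A minimal sketch, using a made-up two-div document (the `id` values are purely illustrative):

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="a"><code>first</code></div><div id="b"><code>second</code></div></body></html>');
$xpath = new DOMXPath($doc);

// Use the second top-level div as the context node
$second_div = $xpath->query('//body/div')->item(1);

// Absolute path: the context node is ignored, the first <code> in the document wins
$absolute = $xpath->query('//code', $second_div)->item(0)->nodeValue;
// Relative path: only descendants of $second_div are searched
$relative = $xpath->query('.//code', $second_div)->item(0)->nodeValue;

echo $absolute, "\n"; // first
echo $relative, "\n"; // second
```

With a single code block per page the two forms give the same result, which is why the difference is easy to miss.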
Now, let's see what $messages looks like:
Array
(
[0] => Array
(
[0] => Array
(
[type] => text
[text] => Just the text.
)
)
[1] => Array
(
[0] => Array
(
[type] => quote
[nickname] => Nickname
[quote_text] => Hello. It's a quote
)
[1] => Array
(
[type] => text
[text] => It's the simple text
)
)
[2] => Array
(
[0] => Array
(
[type] => text
[text] => Text
)
[1] => Array
(
[type] => code
[code_text] => LALALA PHP CODE
)
[2] => Array
(
[type] => text
[text] => More text
)
)
)
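It's separated by message, and then further split into the different kinds of content within each message! Now we can even use a basic print function like this: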
foreach($messages as $message) {
echo "\n\n>>>>>> Message >>>>>>>\n";
foreach($message as $content) {
if($content['type'] == 'text') {
echo "{$content['text']} ";
}
else if($content['type'] == 'quote') {
echo "\n\n======== Quote =========\n";
echo "From: {$content['nickname']}\n\n";
echo "{$content['quote_text']}\n";
echo "=====================\n\n";
}
else if($content['type'] == 'code') {
echo "\n\n======== Code =========\n";
echo "{$content['code_text']}\n";
echo "=====================\n\n";
}
}
}
echo "\n";
And we get:
>>>>>> Message >>>>>>>
Just the text.
>>>>>> Message >>>>>>>
======== Quote =========
From: Nickname
Hello. It's a quote
=====================
It's the simple text
>>>>>> Message >>>>>>>
Text
======== Code =========
LALALA PHP CODE
=====================
More text
Again, all of this works because the DOM parsing functions understand structure.
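As for the follow-up about keeping inline tags such as "a", "b", "s" and "i": `nodeValue` only returns text, so the markup is lost. One possible approach is to serialize each child node with `DOMDocument::saveHTML()`, which accepts a node argument. The `InnerHTML` helper below is a hypothetical name, not part of the code above, and this is a sketch rather than a drop-in replacement for `GetMessages`:

```php
<?php
// Hypothetical helper: returns the markup inside $node, inline tags included
function InnerHTML(DOMNode $node) {
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div>Just the text. <b>bold text</b><br/><a href="link">link</a>, <s><i>test</i></s></div></body></html>');
$xpath = new DOMXPath($doc);

// Take the message div and dump its contents with tags intact
$div   = $xpath->query('//body/div')->item(0);
$inner = InnerHTML($div);
echo $inner, "\n";
```

Inside `GetMessages`, the same idea could be applied to any child element that is neither a quote nor a code block, so that its markup is appended to the text instead of being skipped.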