PHP Regular Expression

I have 3 message blocks.

Example:

<!-- message -->
    <div>
        Just the text.
    </div>
<!-- / message -->

<!-- message -->
    <div>
        <div style="margin-left: 20px; margin-top:5px; ">
            <div class="smallfont">Quote:</div>
        </div>
        <div style="margin-right: 20px; margin-left: 20px; padding: 10px;">
            Message from <strong>Nickname</strong> &nbsp;
                <div style="font-style:italic">Hello. It's a quote</div>
                <else /></if>
        </div>
        <br /><br />
        It's the simple text
    </div>
<!-- / message -->

<!-- message -->
    <div>
        Text<br />
        <div style="margin:20px; margin-top:5px; background-color: #30333D">
            <div class="smallfont" style="margin-bottom:2px">PHP code:</div>
            <div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;">
                <code style="white-space:nowrap">
                    <div dir="ltr" style="text-align:left">
                        <!-- php buffer start -->
                            <code>
                                LALALA PHP CODE
                            </code>
                        <!-- php buffer end -->
                    </div>
                </code>
            </div>
        </div><br />
        <br />
        More text
    </div>
<!-- / message -->

I'm trying to write a regular expression for these blocks, but it doesn't work.

preg_match('#<!-- message -->(?P<text>.*?)</div>.*?<!-- / message -->#is', $str, $s);

It only works for the first block.
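Part of the reason may be that `preg_match()` stops after the first match, while `preg_match_all()` returns every match. The sketch below assumes a shortened, hypothetical `$str` with two blocks. It also shows that the lazy `text` group ends at the first `</div>`, so nested markup is cut off.

```php
<?php
// Shortened, hypothetical stand-in for the HTML in the question.
$str = <<<HTML
<!-- message -->
    <div>
        Just the text.
    </div>
<!-- / message -->

<!-- message -->
    <div>
        <div class="smallfont">Quote:</div>
        It's the simple text
    </div>
<!-- / message -->
HTML;

$pattern = '#<!-- message -->(?P<text>.*?)</div>.*?<!-- / message -->#is';

// preg_match() returns 1 as soon as the first block matches.
$first = preg_match($pattern, $str, $s);

// preg_match_all() keeps going and returns the number of matched blocks.
$all = preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

echo "preg_match: $first, preg_match_all: $all\n";

// In the second block the capture stops at the first closing div,
// so the text after the quote label is not captured.
echo trim($matches[1]['text']), "\n";
```

Even with `preg_match_all()`, the pattern cannot tell which `</div>` closes which `<div>`, which is the underlying limitation for nested blocks.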

How can I make the regex check whether a message contains a quote or PHP code?

(?P<text>.*?) for text

(?P<phpcode>.*?) for php code

(?P<quotenickname>.*?) for quoted nickname

(?P<quotemessage>.*?) for quote message

and so on...

Thank you very much!

Edit for onteria_:

<!-- message -->
    <div>
        Just the text. <b>bold text</b><br/>
        <a href="link">link</a>, <s><i>test</i></s>        
    </div>
<!-- / message -->

Output:

Just the text
,

I need the output to keep the content as it is, including the "a", "b", "s", "i" tags and so on. How can I make sure the HTML is not removed? Thanks

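Notice all those replies about not using regular expressions? Why is that? It's because HTML represents structure. To be honest, this HTML overuses divs instead of using semantic markup, but I'm going to parse it with the DOM functions anyway. So, here is the sample HTML I used: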

<html>
<body>
<!-- message -->
    <div>
        Just the text.
    </div>
<!-- / message -->

<!-- message -->
    <div>
        <div style="margin-left: 20px; margin-top:5px; ">
            <div class="smallfont">Quote:</div>
        </div>
        <div style="margin-right: 20px; margin-left: 20px; padding: 10px;">
            Message from <strong>Nickname</strong> &nbsp;
                <div style="font-style:italic">Hello. It's a quote</div>
        </div>
        <br /><br />
        It's the simple text
    </div>
<!-- / message -->

<!-- message -->
    <div>
        Text<br />
        <div style="margin:20px; margin-top:5px; background-color: #30333D">
            <div class="smallfont" style="margin-bottom:2px">PHP code:</div>
            <div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;">
                <code style="white-space:nowrap">
                    <div dir="ltr" style="text-align:left">
                        <!-- php buffer start -->
                            <code>
                                LALALA PHP CODE
                            </code>
                        <!-- php buffer end -->
                    </div>
                </code>
            </div>
        </div><br />
        <br />
        More text
    </div>
<!-- / message -->
</body>
</html>

Now for the full code:

$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');


// These just make the code nicer
// We could just inline them if we wanted to
// ----------- Helper Functions ------------
function HasQuote($part, $xpath) {
  // check the div and see if it contains "Quote:" inside
  return $xpath->query("div[contains(.,'Quote:')]", $part)->length;
}

function HasPHPCode($part, $xpath) {
  // check the div and see if it contains "PHP code:" inside
  return $xpath->query("div[contains(.,'PHP code:')]", $part)->length;
}
// ----------- End Helper Functions ------------


// ----------- Parse Functions ------------
function ParseQuote($quote, $xpath) {
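  // The quote content is actually in the next div over (this markup is weird),
  // so skip the whitespace text node that sits between the two divs.
  $quote = $quote->nextSibling;
  while ($quote !== null && $quote->nodeType !== XML_ELEMENT_NODE) {
    $quote = $quote->nextSibling;
  }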

  $quote_info = array('type' => 'quote');

  $nickname = $xpath->query("strong", $quote);
  if($nickname->length) {
    $quote_info['nickname'] = $nickname->item(0)->nodeValue;
  }

  $quote_text = $xpath->query("div", $quote);
  if($quote_text->length) {
    $quote_info['quote_text'] = trim($quote_text->item(0)->nodeValue);
  }

  return $quote_info;
}

function ParseCode($code, $xpath) {
  $code_info = array('type' => 'code');

  // Relative path from this div down to the innermost code element.
  // (A path starting with "//" would search the whole document instead.)
  $code_text = $xpath->query("div/code/div/code", $code);
  if($code_text->length) {
    $code_info['code_text'] = trim($code_text->item(0)->nodeValue);
  }

  return $code_info;
}

// ----------- End Parse Functions ------------

function GetMessages($message, $xpath) {

  $message_contents = array();

  foreach($message->childNodes as $child) {

    // So inside of a message if we hit a div
    // We either have a Quote or PHP code, check which
    if(strtolower($child->nodeName) == 'div') {
      if(HasQuote($child, $xpath)) {
        $quote = ParseQuote($child, $xpath);
        // Only keep the quote if its text was actually found
        if(!empty($quote['quote_text'])) {
          $message_contents[] = $quote;
        }
      }
      else if(HasPHPCode($child, $xpath)) {
        $phpcode = ParseCode($child, $xpath);
        // Only keep the code block if its text was actually found
        if(!empty($phpcode['code_text'])) {
          $message_contents[] = $phpcode;
        }
      }
    }
    // Otherwise check if we've found some plain text
    else if ($child->nodeType == XML_TEXT_NODE) {
      // This might be just whitespace, so check that it's not empty
      $text = trim($child->nodeValue);
      if($text !== '') {
        $message_contents[] = array('type' => 'text', 'text' => $text);
      }
    }

  }

  return $message_contents;
}

$xpath = new DOMXPath($doc);
// We need to get the toplevel divs, which
// are the messages
$toplevel_divs = $xpath->query("//body/div");

$messages = array();
foreach($toplevel_divs as $toplevel_div) {
  $messages[] = GetMessages($toplevel_div, $xpath);
}
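A note on the context node passed to `$xpath->query()`: an expression that starts with `//` searches the whole document even when a context node is given, while a relative path only looks below that node. The following is a minimal sketch with a made-up two-div document (the `id` values and text are assumptions for illustration only):

```php
<?php
$doc = new DOMDocument();
// Hypothetical markup: two sibling divs, each containing one code element.
$doc->loadHTML('<html><body><div id="a"><code>first</code></div><div id="b"><code>second</code></div></body></html>');

$xpath = new DOMXPath($doc);
$second = $xpath->query('//body/div')->item(1);

// Absolute path: the context node is ignored, both code elements match.
$absolute = $xpath->query('//div/code', $second);

// Relative path: only the code element inside the context div matches.
$relative = $xpath->query('code', $second);

echo "absolute matches: {$absolute->length}\n";
echo "relative matches: {$relative->length}\n";
echo "relative text: {$relative->item(0)->nodeValue}\n";
```

This matters once a page contains more than one code block or quote, because an absolute path would keep returning the first one in the document.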

Now, let's see what $messages looks like:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [type] => text
                    [text] => Just the text.
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [type] => quote
                    [nickname] => Nickname
                    [quote_text] => Hello. It's a quote
                )

            [1] => Array
                (
                    [type] => text
                    [text] => It's the simple text
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [type] => text
                    [text] => Text
                )

            [1] => Array
                (
                    [type] => code
                    [code_text] => LALALA PHP CODE
                )

            [2] => Array
                (
                    [type] => text
                    [text] => More text
                )

        )

)

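It's separated by message, and then further separated into the different pieces of content within each message! Now we can even use a basic print function like this: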

foreach($messages as $message) {
  echo "\n\n>>>>>> Message >>>>>>>\n";
  foreach($message as $content) {
    if($content['type'] == 'text') {
      echo "{$content['text']} ";
    }
    else if($content['type'] == 'quote') {
      echo "\n\n======== Quote =========\n";
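      // The nickname is only set when a <strong> element was found in the quote
      echo "From: " . (isset($content['nickname']) ? $content['nickname'] : 'unknown') . "\n\n";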
      echo "{$content['quote_text']}\n";
      echo "=====================\n\n";
    }
    else if($content['type'] == 'code') {
      echo "\n\n======== Code =========\n";
      echo "{$content['code_text']}\n";
      echo "=====================\n\n";
    }
  }
}

echo "\n";

And we get:

>>>>>> Message >>>>>>>
Just the text. 

>>>>>> Message >>>>>>>


======== Quote =========
From: Nickname

Hello. It's a quote
=====================

It's the simple text 

>>>>>> Message >>>>>>>
Text 

======== Code =========
LALALA PHP CODE
=====================

More text 

Again, all of this works because the DOM parsing functions understand structure.
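For the follow-up about keeping tags such as "a", "b", "s" and "i": one possible approach is to serialize a node's children back to HTML with `DOMDocument::saveHTML()` instead of reading `nodeValue`, which returns text only. The `InnerHTML` helper below is a hypothetical name and not part of the code above, so treat it as a sketch rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: rebuild the HTML inside a node so that inline
// tags survive, instead of flattening everything to text.
function InnerHTML(DOMNode $node) {
  $html = '';
  foreach ($node->childNodes as $child) {
    // saveHTML() with a node argument serializes just that node
    $html .= $node->ownerDocument->saveHTML($child);
  }
  return trim($html);
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div>Just the text. <b>bold text</b><br/><a href="link">link</a>, <s><i>test</i></s></div></body></html>');

$xpath = new DOMXPath($doc);
$div = $xpath->query('//body/div')->item(0);

// nodeValue drops the markup, InnerHTML() keeps it
echo $div->nodeValue, "\n";
echo InnerHTML($div), "\n";
```

In a parser like the one above, a helper of this kind could be applied to a whole message div when the markup should be preserved, at the cost of no longer splitting the message into text, quote and code parts.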

