PHP Regular Expression

I have 3 message blocks.

Example:

<!-- message -->
    <div>
        Just the text.
    </div>
<!-- / message -->

<!-- message -->
    <div>
        <div style="margin-left: 20px; margin-top:5px; ">
            <div class="smallfont">Quote:</div>
        </div>
        <div style="margin-right: 20px; margin-left: 20px; padding: 10px;">
            Message from <strong>Nickname</strong> &nbsp;
                <div style="font-style:italic">Hello. It's a quote</div>
                <else /></if>
        </div>
        <br /><br />
        It's the simple text
    </div>
<!-- / message -->

<!-- message -->
    <div>
        Text<br />
        <div style="margin:20px; margin-top:5px; background-color: #30333D">
            <div class="smallfont" style="margin-bottom:2px">PHP code:</div>
            <div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;">
                <code style="white-space:nowrap">
                    <div dir="ltr" style="text-align:left">
                        <!-- php buffer start -->
                            <code>
                                LALALA PHP CODE
                            </code>
                        <!-- php buffer end -->
                    </div>
                </code>
            </div>
        </div><br />
        <br />
        More text
    </div>
<!-- / message -->

I'm trying to write a regular expression for these blocks, but it does not work.

preg_match('#<!-- message -->(?P<text>.*?)</div>.*?<!-- / message -->#is', $str, $s);

It only works for the first block.

How can I make the regular expression check whether a message contains a quote or PHP code?

(?P<text>.*?) for text

(?P<phpcode>.*?) for php code

(?P<quotenickname>.*?) for quoted nickname

(?P<quotemessage>.*?) for quote message

and so on.

Thank you so much!

CHANGES FOR onteria_

<!-- message -->
    <div>
        Just the text. <b>bold text</b><br/>
        <a href="link">link</a>, <s><i>test</i></s>        
    </div>
<!-- / message -->

Output:

Just the text
,

What do I need to change so that the output also includes the "a", "b", "s", "i" tags and so on? How can I make sure the HTML is not removed? Thank you

Notice those responses about not using regex? Why is that? Well, that's because HTML represents structure. Though, to be honest, that HTML code overuses divs instead of using semantic markup, but I'm going to parse it anyway with the DOM functions. So then, here's the sample HTML I used:

<html>
<body>
<!-- message -->
    <div>
        Just the text.
    </div>
<!-- / message -->

<!-- message -->
    <div>
        <div style="margin-left: 20px; margin-top:5px; ">
            <div class="smallfont">Quote:</div>
        </div>
        <div style="margin-right: 20px; margin-left: 20px; padding: 10px;">
            Message from <strong>Nickname</strong> &nbsp;
                <div style="font-style:italic">Hello. It's a quote</div>
        </div>
        <br /><br />
        It's the simple text
    </div>
<!-- / message -->

<!-- message -->
    <div>
        Text<br />
        <div style="margin:20px; margin-top:5px; background-color: #30333D">
            <div class="smallfont" style="margin-bottom:2px">PHP code:</div>
            <div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;">
                <code style="white-space:nowrap">
                    <div dir="ltr" style="text-align:left">
                        <!-- php buffer start -->
                            <code>
                                LALALA PHP CODE
                            </code>
                        <!-- php buffer end -->
                    </div>
                </code>
            </div>
        </div><br />
        <br />
        More text
    </div>
<!-- / message -->
</body>
</html>

Now for the full code:

$doc = new DOMDocument();
$doc->loadHTMLFile('test.html');


// These just make the code nicer
// We could just inline them if we wanted to
// ----------- Helper Functions ------------
function HasQuote($part, $xpath) {
  // check the div and see if it contains "Quote:" inside
  return $xpath->query("div[contains(.,'Quote:')]", $part)->length;
}

function HasPHPCode($part, $xpath) {
  // check the div and see if it contains "PHP code:" inside
  return $xpath->query("div[contains(.,'PHP code:')]", $part)->length;
}
// ----------- End Helper Functions ------------


// ----------- Parse Functions ------------
function ParseQuote($quote, $xpath) {
  // The quote content is actually in the next div over (the first
  // nextSibling is a whitespace text node). Man, this markup is weird.
  $quote = $quote->nextSibling->nextSibling;

  $quote_info = array('type' => 'quote');

  $nickname = $xpath->query("strong", $quote);
  if($nickname->length) {
    $quote_info['nickname'] = $nickname->item(0)->nodeValue;
  }

  $quote_text = $xpath->query("div", $quote);
  if($quote_text->length) {
    $quote_info['quote_text'] = trim($quote_text->item(0)->nodeValue);
  }

  return $quote_info;
}

function ParseCode($code, $xpath) {
  $code_info = array('type' => 'code');

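  // This matches the path down to the innermost code element.
  // The leading "." keeps the search relative to $code; a bare "//"
  // would search the whole document and ignore the context node.
  $code_text = $xpath->query(".//div/code/div/code", $code);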
  if($code_text->length) {
    $code_info['code_text'] = trim($code_text->item(0)->nodeValue);
  }

  return $code_info;
}

// ----------- End Parser Functions ------------

function GetMessages($message, $xpath) {

  $message_contents = array();

  foreach($message->childNodes as $child) {

    // So inside of a message if we hit a div
    // We either have a Quote or PHP code, check which
    if(strtolower($child->nodeName) == 'div') {
      if(HasQuote($child, $xpath)) {
        $quote = ParseQuote($child, $xpath);
        // The key is only set when the quote div was found
        if(!empty($quote['quote_text'])) {
          $message_contents[] = $quote;
        }
      }
      else if(HasPHPCode($child, $xpath)) {
        $phpcode = ParseCode($child, $xpath);
        // The key is only set when the code element was found
        if(!empty($phpcode['code_text'])) {
          $message_contents[] = $phpcode;
        }
      }
    }
    // Otherwise check if we've found some pretty text
    else if ($child->nodeType == XML_TEXT_NODE) {
      // This might be just whitespace, so check that it's not empty
      $text = trim($child->nodeValue);
      if($text !== '') {
        $message_contents[] = array('type' => 'text', 'text' => $text);
      }
    }

  }

  return $message_contents;
}

$xpath = new DOMXPath($doc);
// We need to get the toplevel divs, which
// are the messages
$toplevel_divs = $xpath->query("//body/div");

$messages = array();
foreach($toplevel_divs as $toplevel_div) {
  $messages[] = GetMessages($toplevel_div, $xpath);
}
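
One detail about the context node passed to $xpath->query() is easy to miss: an expression that starts with "//" still searches from the document root, while ".//" stays below the context node. Here is a minimal sketch on a made-up two-message document (the HTML string and the id attributes are assumptions for illustration, not part of the sample above):

```php
<?php
// Hypothetical two-message document, used only for this illustration.
$html = '<html><body>'
      . '<div id="m1"><code>first</code></div>'
      . '<div id="m2"><code>second</code></div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Take the second top-level div as the context node.
$second = $xpath->query('//body/div')->item(1);

// Absolute: "//" starts at the document root, so the context node is ignored.
$absolute = $xpath->query('//code', $second)->item(0)->nodeValue;

// Relative: ".//" searches only below the context node.
$relative = $xpath->query('.//code', $second)->item(0)->nodeValue;

echo $absolute, "\n"; // first
echo $relative, "\n"; // second
```

With a single code block per page the two forms return the same node, which is why the difference only shows up once several messages contain code.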

Now let's see what $messages looks like:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [type] => text
                    [text] => Just the text.
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [type] => quote
                    [nickname] => Nickname
                    [quote_text] => Hello. It's a quote
                )

            [1] => Array
                (
                    [type] => text
                    [text] => It's the simple text
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [type] => text
                    [text] => Text
                )

            [1] => Array
                (
                    [type] => code
                    [code_text] => LALALA PHP CODE
                )

            [2] => Array
                (
                    [type] => text
                    [text] => More text
                )

        )

)

It's separated by message and then further separated into the different kinds of content in each message! Now we can even use a basic print loop like this:

foreach($messages as $message) {
  echo "\n\n>>>>>> Message >>>>>>>\n";
  foreach($message as $content) {
    if($content['type'] == 'text') {
      echo "{$content['text']} ";
    }
    else if($content['type'] == 'quote') {
      echo "\n\n======== Quote =========\n";
      echo "From: {$content['nickname']}\n\n";
      echo "{$content['quote_text']}\n";
      echo "=====================\n\n";
    }
    else if($content['type'] == 'code') {
      echo "\n\n======== Code =========\n";
      echo "{$content['code_text']}\n";
      echo "=====================\n\n";
    }
  }
}

echo "\n";

And we get this!

>>>>>> Message >>>>>>>
Just the text. 

>>>>>> Message >>>>>>>


======== Quote =========
From: Nickname

Hello. It's a quote
=====================

It's the simple text 

>>>>>> Message >>>>>>>
Text 

======== Code =========
LALALA PHP CODE
=====================

More text 

This all works, once again, because the DOM parsing functions are able to understand structure.
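
As for the follow-up in the question (keeping inline tags such as "a", "b", "s" and "i"): nodeValue only returns text, so the markup is lost. One possible approach, sketched here under the assumption that the whole message body should be kept as HTML, is to serialize the child nodes with DOMDocument::saveHTML(). InnerHTML is a hypothetical helper name, not part of the code above.

```php
<?php
// Message body taken from the follow-up in the question.
$html = '<html><body><div>'
      . 'Just the text. <b>bold text</b><br/>'
      . '<a href="link">link</a>, <s><i>test</i></s>'
      . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: serialize every child of $node, markup included.
function InnerHTML(DOMNode $node) {
  $out = '';
  foreach ($node->childNodes as $child) {
    // saveHTML() with a node argument dumps just that node.
    $out .= $node->ownerDocument->saveHTML($child);
  }
  return trim($out);
}

$message = $xpath->query('//body/div')->item(0);
$result = InnerHTML($message);
echo $result, "\n";
```

To combine this with the quote and code handling, the same call could probably be applied per child inside GetMessages for any node that is not a quote or code div, but that is a design choice I have not tested against the full sample.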
