
Regex pattern needed to enclose all plain text parts with <p> and </p> tags

My requirement far exceeds my very basic regex knowledge; I couldn't even attempt anything. Could you please help me with the $pattern?

What I need: plain text parts (part 1 below) must be enclosed in <p> and </p> tags.

The current state & properties of my string are:

  • My string is a user input.
  • Each newline (\n) is converted to a <br> tag. Runs of 2 or more <br> tags are replaced with <br><br>.

The string can have 3 kinds of parts. The parts below can appear in any order and any number of times.

    part 1         part 2        part 3
|------------| |-------------| |--------|
| plain text | | <ul>..</ul> | | $$..$$ |
|------------| |-------------| |--------|
  • No <br> tags at the start and at the end of the string. (trimmed with <br> mask.)
  • Part 2 (the <ul> .. </ul> part) NEVER has a <br> tag directly before or after it.
  • Part 3 (the $$ .. $$ part) NEVER has a <br> tag directly before or after it.
  • Exactly 2 <br> tags separate adjacent plain text parts. (So <br><br> can ONLY exist between plain text parts.)
  • A plain text part may or may not have these inline stylings: <b></b> , <i></i> and $..$ .

So I think what I logically need is a match in which no $matches[0] overlaps part 2 or part 3. In other words, no $matches[0] may contain part 2 or part 3.

Thanks in advance, best regards.

Irrelevant note: $$ and $ are used because of MathJax.

The simple answer is:

// preg_match() returns 1 on a match; only then is $m filled in.
if (preg_match('/<p>(.*?)<\/p>/', $oldString, $m)) {
    $newString = $m[1];
}

To understand the meaning of $m[], refer to the manual: http://php.net/manual/en/function.preg-match.php
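
As a small illustration of what the $m array holds, here is a sketch with a made-up input string (the sample text and the names $whole and $inner are assumptions for the example, not part of the question):

```php
<?php
// Made-up sample input, for illustration only.
$oldString = 'before <p>hello</p> after';

$whole = null;
$inner = null;
if (preg_match('/<p>(.*?)<\/p>/', $oldString, $m)) {
    $whole = $m[0]; // the full match, <p> and </p> included
    $inner = $m[1]; // the first capture group, tags excluded
}
```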

However, I do not think this will solve your real problem, which (I guess) is to "purify" text that comes from user input and/or from a formatted source. First, because the expression "plain text" is not well defined (what exactly does it mean?). More generally, it is virtually impossible to foresee every possible case of unwanted code inside the input, and many of them are potentially very dangerous.

When I face the problem of "purifying" formatted text (whether or not it comes from user input), a good starting point is this very well made and highly customizable library: http://htmlpurifier.org/

I did not understand whether you want the <p></p> tags to be captured as well; in that case:

$newString=$m[0];
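
For the wrapping that the question actually describes, one possible approach is to split the string instead of matching it. What follows is only a sketch under the question's stated rules, not a vetted solution: the function name wrapPlainText is hypothetical, it assumes <ul> blocks are not nested, and it assumes a plain text part never contains two adjacent dollar signs (for example two inline formulas written back to back). It does no sanitizing at all.

```php
<?php
// Hypothetical helper (name and approach are assumptions, not from the post).
// Wraps every plain text paragraph in <p>...</p> and leaves <ul>..</ul> and
// $$..$$ blocks untouched.
function wrapPlainText(string $input): string
{
    // Split on the block parts and keep them in the result.
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, the array
    // alternates: text, block, text, block, ..., text.
    $pieces = preg_split(
        '~(<ul>.*?</ul>|\$\$.*?\$\$)~s',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 1) {
            // Odd positions are part 2 or part 3: copy as-is.
            $out .= $piece;
            continue;
        }
        // Even positions are plain text; <br><br> separates paragraphs.
        foreach (explode('<br><br>', $piece) as $paragraph) {
            if ($paragraph !== '') {
                $out .= '<p>' . $paragraph . '</p>';
            }
        }
    }
    return $out;
}

// Made-up sample input that follows the rules listed in the question.
$input = 'first<br>line<br><br>second <b>bold</b> $x$'
       . '<ul><li>a</li></ul>third$$y = 1$$';
$result = wrapPlainText($input);
echo $result, "\n";
```

Splitting avoids having to write one pattern that says "anything except part 2 or part 3": the blocks are recognized positively, and whatever is left between them is treated as plain text.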
