
Join Broken Paragraphs HTML Regex

I'm trying to edit some XHTML in Sigil.

With the search

<p>([a-z])

I'm able to find all paragraphs that begin with a lowercase letter. That tells me they shouldn't be separate from the previous paragraph; it's just a conversion issue.

What should I do to delete both the <p> at the start of that paragraph and the </p> at the end of the previous one, so that the two blocks of text are joined into a single paragraph?

It looks something like this:

<p>... that is why relationships</p>

<p> are not what they should be.</p>

And it should be:

<p> that is why relationships are not what they should be.</p>

I'm not too sure about Sigil, but the following regex should be able to do that:

First find:

</p>\s*<p>(\s*[a-z])

Then replace it with:

$1

What this means:

\s*: any amount of whitespace (zero or more characters, newlines included)

$1: the text captured by the group ( ), which is put back in place of the whole match, so you keep it after replacing
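If you want to try the find/replace outside Sigil first, here is a minimal sketch using Python's `re` module (my choice for illustration; it is not what Sigil runs). Python writes the `$1` backreference as `\1`. `re` is case-sensitive by default, and the `[a-z]` test depends on that: in any editor, make sure the search is case-sensitive, or `[a-z]` will match capitals too.

```python
import re

# Example from the question: one paragraph split in two by a bad conversion.
html = "<p>... that is why relationships</p>\n\n<p> are not what they should be.</p>"

# </p>, any whitespace, <p>, then group 1 = optional whitespace + a lowercase letter.
pattern = re.compile(r"</p>\s*<p>(\s*[a-z])")

# r"\1" is Python's spelling of the $1 backreference.
joined = pattern.sub(r"\1", html)
print(joined)  # <p>... that is why relationships are not what they should be.</p>

# A paragraph that starts with a capital letter is left alone.
untouched = pattern.sub(r"\1", "<p>First.</p>\n<p>Second.</p>")
print(untouched == "<p>First.</p>\n<p>Second.</p>")  # True
```

Only the `</p>`, the whitespace and the `<p>` are consumed without being put back, which is exactly the part you want deleted.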

Or, an easier way: check the Dot Matches All option and search for:

<p>(.+?)</p>

Then replace with just $1 (or \1, depending on the editor), i.e. the captured group.

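Only the text inside each paragraph will remain. Note that this strips the <p> and </p> tags from every paragraph it matches, not only from the broken ones.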

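(.+?): one or more characters, matched lazily (as few as possible), so the match stops at the first </p> that follows.

(.*?): the same, but zero or more characters, so it also matches empty paragraphs like <p></p>. (Careful! Without the ?, a greedy .+ or .* with Dot Matches All runs to the last </p> in the file and swallows everything in between.)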
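A quick Python sketch comparing the three variants on a made-up sample string; `re.DOTALL` plays the role of the Dot Matches All option here.

```python
import re

# Made-up sample: two normal paragraphs and one empty one.
doc = "<p>one</p>\n<p>two</p>\n<p></p>"

# re.DOTALL makes . match newlines too (the "Dot Matches All" option).
lazy_plus = re.sub(r"<p>(.+?)</p>", r"\1", doc, flags=re.DOTALL)
lazy_star = re.sub(r"<p>(.*?)</p>", r"\1", doc, flags=re.DOTALL)
greedy = re.sub(r"<p>(.+)</p>", r"\1", doc, flags=re.DOTALL)

print(repr(lazy_plus))  # 'one\ntwo\n<p></p>'  (.+? needs at least one character)
print(repr(lazy_star))  # 'one\ntwo\n'  (.*? also empties <p></p>)
print(repr(greedy))     # 'one</p>\n<p>two</p>\n<p>'  (one match, first <p> to last </p>)
```

The greedy result shows why the ? matters: the tags in the middle survive and only the outermost pair is removed.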

Building your regex:

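  • for a newline, use \n
  • for any whitespace character (space, tab, newline), use \s
  • to exclude something, put ^ first inside square brackets: [^<] matches any character except <
  • to match either a newline or a space, use a character class such as [\n ]; (\n\s) means a newline followed by one whitespace character, and \s already includes \n anyway
  • to match any number (zero or more) of something, put * after it. Ex: \s* (any run of whitespace, possibly empty)
  • to match one lowercase letter, use ([a-z]); for a run of lowercase letters, ([a-z]+)
  • for one digit, use ([0-9]); for one or more digits, ([0-9]+)
  • for exactly two lowercase letters, use ([a-z]{2}), etc.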
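The building blocks above, checked with Python's `re.findall` and `re.fullmatch` on made-up strings. This is only a sketch: I'm assuming Sigil's engine treats these basic patterns the same way, which is generally the case for simple classes and quantifiers.

```python
import re

# [a-z]: one lowercase letter; [a-z]+: one or more in a row
one_letter = re.findall(r"[a-z]", "Ab c")
letter_runs = re.findall(r"[a-z]+", "Hello world")

# [0-9]+: one or more digits
numbers = re.findall(r"[0-9]+", "ch 12, p 3")

# {2}: exactly two of the preceding item
pairs = re.findall(r"[a-z]{2}", "abcde")

# [^,]: any character except a comma (^ negates a character class)
fields = re.findall(r"[^,]+", "a,b,c")

# \s*: zero or more whitespace characters, newlines included
spaced = re.fullmatch(r"x\s*y", "x \n\t y") is not None
tight = re.fullmatch(r"x\s*y", "xy") is not None

# (\n\s) is a sequence (newline THEN one whitespace char), not "either one"
seq_on_lone_newline = re.fullmatch(r"(\n\s)", "\n") is not None

print(one_letter, letter_runs, numbers, pairs, fields)
# ['b', 'c'] ['ello', 'world'] ['12', '3'] ['ab', 'cd'] ['a', 'b', 'c']
print(spaced, tight, seq_on_lone_newline)  # True True False
```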

Advice:

  • Always use the preview, or replace only the first match, to check the result before replacing everything.
  • Wrap the parts you want to keep in groups with parentheses ( ), so you can put them back with $1, $2, etc.
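The "replace only the first match" and preview advice have direct equivalents in Python, sketched here on a made-up two-paragraph sample: `count=1` in `re.sub` makes a single replacement, `re.finditer` lists what would be replaced without changing anything, and `re.subn` also returns how many replacements were made.

```python
import re

# Made-up sample with two broken paragraphs.
html = "<p>It was late</p>\n<p> and dark.</p>\n<p>Then</p>\n<p> it rained.</p>"
pattern = r"</p>\s*<p>(\s*[a-z])"

# Preview: list the matched text without changing anything.
preview = [m.group(0) for m in re.finditer(pattern, html)]

# Like "replace only the first match": stop after one substitution.
first_only = re.sub(pattern, r"\1", html, count=1)

# Replace all, and get the number of replacements back.
joined_all, n = re.subn(pattern, r"\1", html)

print(preview)     # ['</p>\n<p> a', '</p>\n<p> i']
print(first_only)  # first pair joined, second still split
print(n)           # 2
```

Comparing the preview count with the number you expect is a cheap way to catch a pattern that matches too much.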

Hope this helps you better understand your issue.
