
remove empty lines from HTML, except in <code></code> blocks

I'm using Editorial to write posts for my WordPress blog in Markdown.

The Markdown parser outputs the HTML correctly, and Editorial's embedded viewer shows the result with the expected format and style.
But when I paste that HTML into the WordPress mobile editor, the text is rendered with too many empty lines.

For example:

# Header
Hello world, **this is Markdown!**

Other markdown paragraph!. 

Is parsed to:

<h1>Header</h1>

<p>Hello world, <strong>this is Markdown!</strong></p>

<p>Other markdown paragraph!. </p>

Which is shown in the viewer as:

(screenshot: output in the Editorial viewer)

Which is what I expected.

The WordPress mobile app, on the other hand, shows that HTML code as:

(screenshot: output in the WordPress mobile app)

As you can see, there are too many empty lines.

I think the WordPress stylesheet sets the margins of paragraphs and headers to leave one empty line above and one below. I cannot modify that CSS, so my brute-force solution was to remove the blank lines between paragraphs in the HTML code. This works fine, but doing it by hand is tedious.

So I want to use Editorial's tools to build a workflow that automates the process.
The goal is a Python script that takes the generated HTML and removes the empty lines, while keeping the empty lines inside code blocks, since those belong to source code examples.

I'm thinking about a solution using regular expressions to find the empty lines and discard the code blocks, but I'm pretty new to Python and its libraries, so the code snippets I have tried didn't work.

Could anybody provide an example of how to achieve this, or a general guideline so I can write it myself?

PS: I know that posting this kind of question without any example of what I have tried is a bad idea, but my Python code is a messy beginner's attempt that makes little sense, so I decided not to post it.

Let's assume you have loaded the html as text (HTML):

HTML = """
html
html

html

code-start
code
code

code
code-end

"""

new_html = ""
is_code = False
for line in HTML.split('\n'):
    # disable empty line remover when code starts
    if line == 'code-start':
        is_code = True
    # check for empty line/is_code
    if is_code or line != '':
        new_html += line+'\n'
    # enable empty line remover when code ends
    if line == 'code-end':
        is_code = False

# print() with a single argument works in both Python 2 and Python 3
print(new_html)

Of course you have to replace code-start and code-end with valid html tags.
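As a minimal sketch of that replacement, the version below assumes the Markdown parser wraps code blocks in <pre>...</pre> (usually as <pre><code>...</code></pre>), which is an assumption about the generated HTML and not something confirmed above. The function name strip_blank_lines is made up for illustration.

```python
import re

# Assumption: code blocks are emitted inside <pre> ... </pre>.
OPEN_RE = re.compile(r"<pre\b", re.IGNORECASE)
CLOSE_RE = re.compile(r"</pre\s*>", re.IGNORECASE)

def strip_blank_lines(html):
    """Drop blank lines, except those that fall inside a <pre> block."""
    kept = []
    depth = 0  # greater than 0 while we are inside a <pre> block
    for line in html.split("\n"):
        inside_before = depth > 0
        # Update the nesting depth from the tags found on this line.
        depth += len(OPEN_RE.findall(line))
        depth -= len(CLOSE_RE.findall(line))
        depth = max(depth, 0)
        # Keep every line inside a code block, and every non-blank line outside.
        if inside_before or line.strip():
            kept.append(line)
    return "\n".join(kept)

sample = (
    "<h1>Header</h1>\n\n<p>Hello</p>\n\n"
    "<pre><code>a = 1\n\nb = 2\n</code></pre>\n\n<p>End</p>\n"
)
print(strip_blank_lines(sample))
```

Unlike the exact comparison with 'code-start', this looks for the tags anywhere in the line, so it also copes with code that starts on the same line as the opening tag.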

This is just a quick and dirty approach but should help you.
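Since the question mentions regular expressions, here is an alternative sketch using the standard re module. It again assumes that code blocks live inside <pre>...</pre>, and the name collapse_blank_lines is hypothetical. The idea is to split the text on the <pre> blocks and collapse runs of blank lines only in the pieces between them.

```python
import re

# Assumption: code blocks are wrapped in <pre> ... </pre>.
# The capturing group makes re.split() keep the matched blocks in the result.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre\s*>)", re.IGNORECASE | re.DOTALL)
# A newline followed by one or more whitespace-only lines.
BLANK_RUN = re.compile(r"\n(?:[ \t]*\n)+")

def collapse_blank_lines(html):
    """Collapse blank lines everywhere except inside <pre> blocks."""
    parts = PRE_BLOCK.split(html)
    # Even indexes hold text outside <pre>; odd indexes hold the <pre> blocks.
    for i in range(0, len(parts), 2):
        parts[i] = BLANK_RUN.sub("\n", parts[i])
    return "".join(parts)

sample = (
    "<h1>Header</h1>\n\n<p>Hello</p>\n\n"
    "<pre><code>a = 1\n\nb = 2\n</code></pre>\n\n<p>End</p>\n"
)
print(collapse_blank_lines(sample))
```

Regular expressions are not a real HTML parser, so this is only suitable for simple, machine-generated markup like the example in the question; nested or malformed <pre> tags would need a proper parser.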
