简体   繁体   中英

bleach clean adds "<pre><code>“ tag at the beginning rather than cleaning

I scraped some html contents from internet, below is only a beginning part of it,

<p style="max-width: 100%;min-height: 1em;letter-spacing: 0.544px;text-align: center;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;font-size: 24px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;color: rgb(255, 41, 65);box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;color: rgb(0, 0, 0);font-size: 18px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;font-size: 24px;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.544px;color: rgb(61, 167, 66);box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;word-wrap: break-word !important;">...

I am using

body_html=bleach.clean(markdown(value, output_format='html'),tags=['SOME_ALLOWED_TAGS'] ,attributes=['SOME_ALLOWED_ATTRIBUTES'],styles=['SOME_ALLOWED_STYLES'],strip=True,strip_comments=True)

but the return is not what I expected,

<pre><code> &lt;p style="max-width: 100%;min-height: 1em;letter-spacing: 0.544px;text-align: center;box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;strong style="max-width: 100%;letter-spacing: 0.544px;font-size: 24px;box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;span style="max-width: 100%;color: rgb(255, 41, 65);box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;strong style="max-width: 100%;letter-spacing: 0.544px;color: rgb(0, 0, 0);font-size: 18px;box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;span style="max-width: 100%;font-size: 24px;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"&gt;&lt;span style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box  

what is wrong with bleach clean? is it because I have too many tags and styles to be cleaned so it just added " <pre><code> " at the beginning and closed it at the end?

Figured Out. It is because the content to be cleaned contains \\n \\n\\n \\n\\n \\n \\n at the beginning. Should remove those first.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM