
Sanitizing HTML input

I'm thinking of adding a rich text editor so that a non-programmer can change the appearance of text. However, one issue is that incorrect markup can distort the layout of the rendered page. What's a good lightweight way to sanitize HTML?

You will have to decide between good and lightweight. The recommended choice is HTML Purifier, because it provides no-fuss secure defaults. As a faster alternative, htmLawed is often advised.

See also this quite objective overview from the HTML Purifier author: http://htmlpurifier.org/comparison

I really like HTML Purifier, which lets you specify which tags and attributes are allowed in the HTML code, and generates valid HTML.
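To make the allowlist idea concrete, here is a minimal JavaScript sketch. It is not how HTML Purifier works internally and is far cruder: it escapes the whole input and then restores only exact, attribute-free tags from a list. The names `ALLOWED_TAGS`, `escapeHtml`, and `allowTags` are made up for this illustration.

```javascript
// Hypothetical allowlist: attribute-free formatting tags only.
const ALLOWED_TAGS = ['b', 'i', 'em', 'strong', 'p', 'br'];

// Escape every character that has a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escape everything, then turn back only exact, attribute-free allowed tags.
function allowTags(input, allowed = ALLOWED_TAGS) {
  const names = allowed.join('|');
  const tag = new RegExp(`&lt;(/?)(${names})&gt;`, 'gi');
  return escapeHtml(input).replace(
    tag,
    (match, slash, name) => `<${slash}${name.toLowerCase()}>`
  );
}

console.log(allowTags('<b>bold</b> <script>alert(1)</script> <i onclick="x()">hi</i>'));
// Logs: <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt; &lt;i onclick=&quot;x()&quot;&gt;hi</i>
```

Note that the stray `</i>` survives because its opening tag carried an attribute and stayed escaped. Unlike HTML Purifier, this sketch does not guarantee valid, balanced HTML.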

Use BB codes (or a markup like the one used here on SO); otherwise the chances of getting this right are very slim. Example function...

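function parse($string){

    // Escape raw HTML first, so user-supplied tags are shown as text instead of rendered.
    $string = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

    // BB code patterns; URLs are limited to http(s) to keep out javascript: links.
    $pattern = array(
    "/\[url\](https?:\/\/[^\s\[\]]+)\[\/url\]/",
    "/\[img\](https?:\/\/[^\s\[\]]+)\[\/img\]/",
    "/\[img\=([\w -]*)\](https?:\/\/[^\s\[\]]+)\[\/img\]/",
    "/\[url\=(https?:\/\/[^\s\[\]]+)\](.*?)\[\/url\]/",
    "/\[red\](.*?)\[\/red\]/",
    "/\[b\](.*?)\[\/b\]/",
    "/\[h([1-6])\](.*?)\[\/h\\1\]/",
    "/\[p\](.*?)\[\/p\]/",
    "/\[php\](.*?)\[\/php\]/is"
    );

    // Replacements, in the same order as the patterns above.
    $replacement = array(
    '<a href="\\1">\\1</a>',
    '<img alt="" src="\\1"/>',
    '<img alt="" class="\\1" src="\\2"/>',
    '<a rel="nofollow" target="_blank" href="\\1">\\2</a>',
    '<span style="color:#ff0000;">\\1</span>',
    '<span style="font-weight:bold;">\\1</span>',
    '<h\\1>\\2</h\\1>',
    '<p>\\1</p>',
    '<pre><code class="php">\\1</code></pre>'
    );

    $string = preg_replace($pattern, $replacement, $string);

    // Turn remaining newlines into <br /> tags.
    $string = nl2br($string);

    return $string;

}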

... ...

echo parse("[h2]Lorem Ipsum[/h2][p]Dolor sit amet[/p]");

Result...

<h2>Lorem Ipsum</h2><p>Dolor sit amet</p>


Or just use HTML Purifier :)
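For a client-side preview, the BB code idea can be sketched in JavaScript as well. This is only a rough sketch covering a subset of the rules in the PHP function: it escapes the input before translating and only converts http(s) links. The names `RULES`, `escapeHtml`, and `parseBbCode` are assumptions made for this example.

```javascript
// Escape raw HTML so user-supplied markup is displayed as text, not rendered.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// [pattern, replacement] pairs; a hypothetical subset of the PHP rules.
const RULES = [
  [/\[b\](.*?)\[\/b\]/g, '<span style="font-weight:bold;">$1</span>'],
  [/\[red\](.*?)\[\/red\]/g, '<span style="color:#ff0000;">$1</span>'],
  // The closing tag must repeat the heading level of the opening tag.
  [/\[h([1-6])\](.*?)\[\/h\1\]/g, '<h$1>$2</h$1>'],
  [/\[p\](.*?)\[\/p\]/g, '<p>$1</p>'],
  // Only http(s) links are converted, which keeps out javascript: URLs.
  [/\[url=(https?:\/\/[^\s\[\]]+)\](.*?)\[\/url\]/g, '<a rel="nofollow" href="$1">$2</a>'],
];

// Escape first, then apply each rule in order.
function parseBbCode(input) {
  return RULES.reduce(
    (text, [pattern, html]) => text.replace(pattern, html),
    escapeHtml(input)
  );
}

console.log(parseBbCode('[h2]Lorem Ipsum[/h2][p]Dolor <script>sit</script> amet[/p]'));
// Logs: <h2>Lorem Ipsum</h2><p>Dolor &lt;script&gt;sit&lt;/script&gt; amet</p>
```

Because escaping happens before any rule runs, the only tags in the output are the ones the rules themselves generate.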

Both HTML Purifier and htmLawed are good. htmLawed has the advantage of a much smaller footprint and high configurability. Besides doing the standard work of balancing tags, filtering specific HTML tags, their attributes, or attribute content (through whitelists or blacklists), etc., it also allows the use of custom functions.
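To illustrate what "balancing tags" means, here is a small stack-based JavaScript sketch. It does not reproduce htmLawed's actual algorithm, and it only understands attribute-free tags from a made-up list (`BALANCED_TAGS`); the function name `balanceTags` is hypothetical too.

```javascript
// Hypothetical set of tags this sketch knows how to balance.
const BALANCED_TAGS = new Set(['b', 'i', 'em', 'strong', 'p']);

// Close unclosed tags and drop stray closing tags, using a stack of open tags.
function balanceTags(html) {
  const open = []; // names of currently open tags, innermost last
  const tagPattern = /<(\/?)([a-z]+)>/gi;

  let out = html.replace(tagPattern, (match, slash, rawName) => {
    const name = rawName.toLowerCase();
    if (!BALANCED_TAGS.has(name)) return match; // leave unknown tags untouched
    if (!slash) {
      open.push(name);
      return `<${name}>`;
    }
    const index = open.lastIndexOf(name);
    if (index === -1) return ''; // closing tag without an opener: drop it
    // Close everything opened after the matching opener, innermost first.
    const closing = open.splice(index).reverse();
    return closing.map((n) => `</${n}>`).join('');
  });

  // Close whatever is still open at the end of the input.
  out += open.reverse().map((n) => `</${n}>`).join('');
  return out;
}

console.log(balanceTags('<b>bold <i>both</b> text</i> <p>para'));
// Logs: <b>bold <i>both</i></b> text <p>para</p>
```

This sketch only balances; it does not filter anything, so it would have to run after a filtering step rather than replace one.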

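Using the HTML Sanitizer API it's easy to do (the API is still experimental, and sanitizeToString() comes from an early draft of it, so check browser support before relying on this):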

// our input string to clean
const stringToClean = 'Some text <b><i>with</i></b> <blink>tags</blink>, including a rogue script <script>alert(1)</script> def.';

const result = new Sanitizer().sanitizeToString(stringToClean);
console.log(result);
// Logs: "Some text <b><i>with</i></b> <blink>tags</blink>, including a rogue script def."


 