Converting Markdown to HTML with JavaScript - restricting supported syntax

I am currently using marked.js to convert Markdown to HTML so that the users of my web app can create structured content. I am wondering if there is a way to restrict the supported syntax to just a subset, like:

• headers
• italic text
• bold text
• lists with only 1 level of indentation
• quotes

I would like to prohibit the conversion of lists with multiple levels of indentation, code blocks, headers inside lists, and so on.

The reason is that my web app should guide users to create content in a specific way, and if it is possible to create some crazy structured content (lists of headers, code in headers, lists of images ...), someone will surely do it.

You have a few different options:

Marked.js uses a multi-step process to parse Markdown: a lexer breaks the document up into tokens, a parser converts those tokens into an abstract syntax tree (AST), and a renderer converts the AST to HTML. You can override any of those pieces to alter how various parts of the syntax are handled.

For example, if you simply want to ignore lists and leave them out of the rendered HTML, replace the renderer's list method with one that returns an empty string.
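A minimal sketch of that renderer override, written as an extension object to pass to marked.use(). It assumes a marked version that supports marked.use() with a renderer key. The arguments marked passes to list differ between versions, but since this override ignores them and always returns an empty string, that should not matter here. The name noListsExtension is hypothetical.

```javascript
// Extension object that overrides only the renderer's `list` method.
// Assumption: it is passed to marked.use(); all other renderer methods
// keep their default behaviour.
const noListsExtension = {
  renderer: {
    // Ignore whatever marked passes in and render nothing for any list,
    // including all of its items.
    list() {
      return "";
    },
  },
};

// Usage (requires the marked package, which is not loaded here):
// marked.use(noListsExtension);
// marked.parse("- a\n- b\n\nSome text"); // list dropped, paragraph kept
```

Returning an empty string (rather than false) matters: in marked.use(), a renderer override that returns false falls back to the default renderer.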

Or, if you want the parser to act as if lists are not even a supported feature of Markdown, you could remove the list and listitem methods from the parser. In that case, the list would remain in the output, but would be treated as a paragraph instead.

Or, if you want to support one level of lists, but not nested lists, then you could replace the list and/or listitem methods in the parser with your own implementation that parses lists as you desire.
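If customizing the parser is more than you need, a different technique (not the one described above, and operating on the Markdown source rather than inside marked) is to flatten nested list items before conversion, so every item ends up at the top level. This is a minimal sketch; flattenNestedLists is a hypothetical helper, and it assumes list items start with -, *, + or an ordered marker such as 1. or 1). It does not understand fenced code blocks, so indented list-like lines inside code would also be de-indented.

```javascript
// Matches the leading indentation of an indented list-item line:
// spaces/tabs followed (lookahead) by a bullet (-, *, +) or an ordered
// marker (up to 9 digits plus "." or ")") and then a space or tab.
const NESTED_ITEM = /^[ \t]+(?=(?:[-*+]|\d{1,9}[.)])[ \t])/;

/**
 * Removes the indentation in front of nested list markers so that every
 * list item becomes a top-level item. All other lines are left untouched.
 * Hypothetical helper, not part of marked.js.
 */
function flattenNestedLists(markdown) {
  return markdown
    .split("\n")
    .map((line) => line.replace(NESTED_ITEM, ""))
    .join("\n");
}
```

For example, "- a\n  - b\n    - c" becomes "- a\n- b\n- c", which marked then renders as a single one-level list.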

Note that there are also a number of advanced options, which use the above methods to alter the parser and/or renderer in various ways. For the most part, those options would not provide the features you are asking for, but browsing through the source code might give you some ideas on how to implement your own modifications.

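There is also the sanitize option, which can be combined with a custom sanitizer function. You could provide your own sanitizer that removes any unwanted elements from the HTML output. The end result would be similar to overriding the renderer, but implemented differently; depending on what you want to accomplish, one or the other may be more effective. Note that the sanitize and sanitizer options were deprecated in later versions of marked, so on a current version the equivalent is to pass the generated HTML through a separate sanitizer after conversion.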
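As a rough illustration of removing unwanted elements from the HTML output, the sketch below drops every tag whose name is not on an allowlist and keeps the text between tags. It is regex-based and assumes well-formed HTML as produced by the Markdown converter; it is not a security sanitizer (it does not handle comments, keeps the text content of removed elements such as script, and keeps attributes of allowed tags), so for untrusted input a dedicated HTML sanitizer is the safer choice. The names stripDisallowedTags and ALLOWED_TAGS are hypothetical.

```javascript
// Tags produced by the Markdown subset from the question (headers, italic,
// bold, lists, quotes), plus paragraphs and line breaks.
const ALLOWED_TAGS = new Set([
  "h1", "h2", "h3", "h4", "h5", "h6",
  "p", "br", "em", "strong",
  "ul", "ol", "li", "blockquote",
]);

// Matches an opening, closing or self-closing tag and captures its name.
const TAG = /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;

/**
 * Removes every tag not in `allowed` but keeps the surrounding text,
 * so "<pre><code>x</code></pre>" becomes "x".
 * Not a security sanitizer: see the caveats in the text above.
 */
function stripDisallowedTags(html, allowed = ALLOWED_TAGS) {
  return html.replace(TAG, (tag, name) =>
    allowed.has(name.toLowerCase()) ? tag : ""
  );
}
```

Keeping the inner text means a forbidden code block degrades to plain text instead of disappearing, which is usually less surprising for the user who typed it.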

Another possibility would be to use Commonmark.js: parse the input and then walk the parsed tree, removing all nodes of (or not of) a specific type. See this example; it worked fine for images, but failed for code blocks.

The downside of this approach is that the parsed Markdown is traversed twice: once for editing and a second time for rendering.
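The tree-walking idea can be sketched independently of Commonmark.js. The code below assumes a hypothetical plain-object tree in which each node has a type string and an optional children array; real Commonmark.js nodes are traversed with walker() and removed with unlink(), so this shape is a simplification, not that library's API. The type names used below (code_block, image, list, item) follow Commonmark.js naming but are assumptions here. The function returns a new tree without the disallowed node types and also drops lists nested inside list items, matching the one-level restriction from the question. pruneTree is a hypothetical name.

```javascript
/**
 * Returns a copy of `node` with every descendant whose type is in
 * `disallowed` removed. Lists found anywhere inside a list item ("item")
 * are removed too, so only one level of list nesting survives.
 * The input tree is not modified.
 */
function pruneTree(node, disallowed, insideItem = false) {
  const children = (node.children || [])
    // Drop node types that are forbidden everywhere.
    .filter((child) => !disallowed.has(child.type))
    // Drop nested lists once we are inside a list item.
    .filter((child) => !(insideItem && child.type === "list"))
    // Recurse, remembering whether we have entered a list item.
    .map((child) =>
      pruneTree(child, disallowed, insideItem || child.type === "item")
    );
  return { ...node, children };
}
```

Building a new tree instead of unlinking nodes in place avoids modifying the tree while it is being traversed; the cost is one extra copy of the tree, on top of the two traversals mentioned above.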
