
Parsing a subset of Markdown with regex

To display comments on my blog, I want to parse just a subset of Markdown: links, inline code, code blocks and paragraphs.

I'm having a hard time with fenced code blocks because their regex collides with both the one for inline code and the one for paragraphs.

Here is my function:

function parseMd(text) {
    const codeblock = /```([^]+?.*?[^]+?[^]+?)```/g
    const code = /`(.*?)`/g
    const link = /\[(.*?)\]\((.*?)\)/g
    const paragraph = /(.+((\r?\n.+)*))/g

    return text.replace(codeblock, '<pre><code>$1</code></pre>')
    .replace(code, '<code>$1</code>')
    .replace(link, '<a href="$2">$1</a>')
    .replace(paragraph, '<p>$1</p>');
}

Ideally, I'd need the code and paragraph regexes to ignore everything that matches codeblock, but as it's a multi-line one, it's getting tricky!
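To make the collision concrete, here is a small reproduction that runs the same four replacements on a tiny input containing one fenced block. The regexes are copied from the function above; the sample input and the two check variables are made up for illustration.

```javascript
// Same regexes as parseMd above, copied so this block runs on its own.
function parseMd(text) {
    const codeblock = /```([^]+?.*?[^]+?[^]+?)```/g;
    const code = /`(.*?)`/g;
    const link = /\[(.*?)\]\((.*?)\)/g;
    const paragraph = /(.+((\r?\n.+)*))/g;

    return text.replace(codeblock, '<pre><code>$1</code></pre>')
        .replace(code, '<code>$1</code>')
        .replace(link, '<a href="$2">$1</a>')
        .replace(paragraph, '<p>$1</p>');
}

// Hypothetical sample: a fenced block that contains inline-code backticks.
const sample = '```\nuse `x` here\n```';
const html = parseMd(sample);

// The inline-code regex has rewritten text inside the block...
const nestedCode = html.includes('use <code>x</code> here');
// ...and the paragraph regex has wrapped the whole <pre> in a <p>.
const wrappedPre = html.startsWith('<p><pre>');

console.log(html);
console.log({ nestedCode, wrappedPre });
```

Both checks come out true: each later replacement sees the HTML produced by the earlier ones as ordinary text, so the order of the calls alone cannot protect the code block.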

You may be able to make use of the regular expressions from the markedjs project.

The implementation provided below relies on inspecting and editing the innerHTML property.

It follows these major steps:

  1. Parse code blocks first to protect them from further substitutions;
  2. Inject the result into an HTML element through innerHTML to let the browser parse it and turn it into a set of nodes (now the code blocks are protected);
  3. Replace the remaining text (now text nodes) with <p> elements to get the paragraphs parsed, and parse inline elements inside them to finally get links and inline code.
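For contexts without a DOM (server-side rendering, for instance), the same ordering can be sketched with plain strings. This is only a rough sketch under stated assumptions, not the approach of this answer: it swaps the innerHTML step for `String.prototype.split` with a capturing group, and the names `escapeHtml`, `parseInline` and `parseMdLite` are hypothetical. It also assumes that paragraphs are separated by blank lines and that fences are balanced.

```javascript
// Escape HTML special characters so comment text cannot inject markup.
function escapeHtml(s) {
    return s
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Inline parsing, only ever applied to text outside code blocks.
function parseInline(s) {
    return s
        .replace(/`(.*?)`/g, '<code>$1</code>')
        .replace(/\[(.*?)\]\((.*?)\)/g, '<a href="$2">$1</a>');
}

// Hypothetical name; not the parseMd function from the question.
function parseMdLite(text) {
    // A capturing group makes split() keep the fenced bodies:
    // even indexes hold plain text, odd indexes hold code block contents.
    const parts = text.split(/```\s*([^]*?)```/);

    return parts
        .map((part, i) => {
            if (i % 2 === 1) {
                // Code block: escaped, but never parsed for inline elements.
                return '<pre><code>' + escapeHtml(part) + '</code></pre>';
            }
            // Plain text: blank lines separate paragraphs.
            return part
                .split(/(?:\r?\n){2,}/)
                .map(p => p.trim())
                .filter(p => p.length > 0)
                .map(p => '<p>' + parseInline(escapeHtml(p)) + '</p>')
                .join('\n');
        })
        .filter(chunk => chunk.length > 0)
        .join('\n');
}

console.log(parseMdLite('First.\n\n```\ncode with `ticks`\n```\n\nSee [a](http://x.y) and `b`.'));
```

Because the code blocks are separated out before any inline or paragraph regex runs, those regexes never see them. The sketch does not validate link URLs, so a real comment system would still need to restrict the allowed schemes.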

Below is a tested snippet of this implementation.

// Some HTML elements used throughout the parsing
var resultDiv = document.getElementById("rendered-result");
var resultSrcTA = document.getElementById("resultsrc");

// convert() - Our lite MD parser
function convert() {
    var mdt = document.getElementById("md").value;

    // First we parse the code blocks so that they are not parsed again later on
    parseCodeBlocks(mdt);

    // Then we deal with the remaining text, which are paragraphs
    parseParagraphs();

    resultSrcTA.value = resultDiv.innerHTML;
}

// This function simply performs a regexp substitution on a given text
// and injects it into the result HTML element (resultDiv)
// as an inner HTML string to let the browser parse it
function parseCodeBlocks(text) {
    const codeblock = /```\s*([^]+?.*?[^]+?[^]+?)```/g;
    resultDiv.innerHTML = text.replace(codeblock, '<pre><code>$1</code></pre>');
}

// This function replaces the remaining text nodes with paragraphs
// (the tricky part)
function parseParagraphs() {
    var nodes = resultDiv.childNodes;

    // Looping through the nodes
    for (var i = 0; i < nodes.length; i++) {
        // If the current node isn't a text node, next!
        if (nodes[i].nodeType != 3) continue;

        // Converting the current text node into an array of <p> elements
        var ps = createPElementFromMDParagraphs(nodes[i].nodeValue);

        // Reverse looping through the <p> elements,
        // since we insert them right after the parsed text node
        for (var j = ps.length - 1; j > -1; j--) {
            resultDiv.insertBefore(ps[j], nodes[i].nextSibling);
        }

        // We're done with paragraph insertion, time to remove
        // the parsed text node
        resultDiv.removeChild(nodes[i]);

        // Updating i: we added n paragraphs and removed one text node
        i += ps.length - 1;
    }
}

// This function returns, for a given text, an array of <p> elements
// representing its content
function createPElementFromMDParagraphs(text) {
    const paragraph = /(.+)((\r?\n.+)*)/g;
    const code = /`(.*?)`/g;
    const link = /\[(.*?)\]\((.*?)\)/g;

    var ps = [];
    var matches;

    // We loop through the paragraph regex matches.
    // For each match, we create a <p> element and push it
    // into the result array
    while ((matches = paragraph.exec(text)) !== null) {
        var p = document.createElement("p");
        // matches[0] is the whole paragraph, continuation lines included
        // (matches[1] alone would keep only its first line)
        p.appendChild(document.createTextNode(matches[0]));

        // And we have here an opportunity to format the inline elements.
        // Note that links will be parsed inside a code element and will work
        p.innerHTML = p.innerHTML.replace(code, '<code>$1</code>');
        p.innerHTML = p.innerHTML.replace(link, '<a href="$2">$1</a>');

        ps.push(p);
    }

    return ps;
}

/* Just to get it fancy */
textarea {
    width: 100%;
    height: 15ex;
}

div#rendered-result {
    min-height: 10ex;
    height: 10ex;
    border: 1px solid black;
    padding: 1em;
    font-family: sans-serif;
    overflow-y: auto;
}

div#rendered-result > pre {
    background: #f0f0f0;
    margin: 1em;
    padding: 0.5em;
    border: 1px solid #808080;
}

div#rendered-result > pre > code {
    margin: 0;
}

/* To check if this is a well parsed paragraph. */
div#rendered-result > p::first-letter {
    font-weight: bold;
    color: darkred;
}

<p>Type your markdown text below:</p>
<textarea id="md">
The first paragraph.

```
A code block
very well catched
Haha this `inline code` wont be parsed.
```

And a sentence with [two](http://stackoverflow.com) [links](http://askbuntu.com).

And another with an `inline code` and a [link](http://superuser.com).

```
And another code block
```

And another with an link as `[inline code](http://superuser.com)`.

Last sentence.
</textarea>
<p>
  <!-- yes it's bad -->
  <button onclick="convert()">Convert it</button>
</p>
<p>Result</p>
<div id="rendered-result">
</div>
<p>Source:</p>
<textarea id="resultsrc">
</textarea>
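The snippet above deliberately lets links be parsed inside inline code. If the opposite is wanted, one option is to handle both inline elements in a single pass, so that text already consumed as inline code is never rescanned for links. The following is only a sketch of that idea; `parseInlineOnce` and the combined regex are hypothetical and slightly stricter than the two regexes used above (no backticks inside inline code, no `]` inside link labels).

```javascript
// One regex with two alternatives: inline code first, then links.
// Matching goes left to right, so a span consumed as inline code
// is never examined again for links.
const inline = /`([^`]*)`|\[([^\]]*)\]\(([^)]*)\)/g;

// Expects text that has already been HTML-escaped.
function parseInlineOnce(escapedText) {
    return escapedText.replace(inline, (match, code, label, url) => {
        if (code !== undefined) {
            // First alternative matched: inline code, left untouched inside.
            return '<code>' + code + '</code>';
        }
        // Second alternative matched: a link.
        return '<a href="' + url + '">' + label + '</a>';
    });
}

console.log(parseInlineOnce('a `[x](y)` b [l](u)'));
```

With a function as the replacement, the capture groups of the alternative that did not match arrive as `undefined`, which is what the `code !== undefined` test relies on.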
