Converting markdown to html with javascript in rich text editor


I am developing a rich text editor for my website. If the user writes something that contains HTML syntax, I would like the editor to convert it to HTML, just like the text editor on Stack Overflow.

I would like it to:

  1. split the text on each tag, and the array elements should include the tag that was written
  2. transform &lt; and &gt; into the corresponding < and > signs, unless they are inside PRE and CODE tags

For now, I tried using a Regexp I found here for splitting the HTML, but if I test the code below, I would get:

['Lorem ipsum dolor', 'sit amet', 'consectetur', 'adipiscing', 'elit.', 'Sed erat odio, fringilla in lorem eu.'], which is definitely not what I want. I would want something like:

['Lorem ipsum dolor', '<h1>', 'sit amet', '</h1>', '<h6>', 'consectetur', '<b>', 'adipiscing', '</b>', '</h6>', 'elit.', '<br>', 'Sed erat odio, fringilla in lorem eu.']

Then I would just:

function splitHTML(str) {
  return str.split(/<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/g);
}

function isHTML(str) {
  // RegExp objects have no match() method; test() returns a boolean
  return /<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>/.test(str);
}

const arr = splitHTML("Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, fringilla in lorem eu.");

// Write back through the index; reassigning a for...of loop variable would not change the array
for (let i = 0; i < arr.length; i++) {
  if (isHTML(arr[i])) {
    arr[i] = arr[i].replaceAll('&lt;', '<').replaceAll('&gt;', '>');
  }
}

arr.join('');

My question is:

How can I split a text so that the separators are included in the result?

I would also like to know how to check whether the code is between pre and code tags.

You do not have to iterate through an object to display the HTML. You can do something as simple as:

// Create a new iframe HTML element
const preview = document.createElement("iframe");

// Set a unique id so it is easier to reference in code later on (you can also use the id in CSS)
preview.id = "preview";

// Set the iframe's content according to your HTML string
preview.srcdoc = yourHtmlString;

// Add the iframe to the page's body (or whatever element you want)
document.body.append(preview);

If, for whatever reason, you have to iterate through the HTML elements, you can add the following code:

function forEachChild(element) {
  for (let i = 0; i < element.children.length; i++) {
    forEachChild(element.children[i]);

    // Whatever you want to do for each element, write it here

    // Please note that replacing "&lt;" and "&gt;" is not necessary using the above code
    // snippet. However, if there is some other tag-specific code, here is how to add it:
    switch (element.children[i].tagName.toLowerCase()) {
      case "pre":
      case "code":
        // If there is something specific you want to do with a pre/code tag, add it here
        break;
    }
  }
}

// The srcdoc content is loaded asynchronously, so wait for the iframe's load event
preview.addEventListener("load", () => {
  forEachChild(preview.contentWindow.document.body);
});

Best to use an HTML parser, such as https://www.npmjs.com/package/node-html-parser . It is possible to use regex, but it is not that robust.

I do not understand why you want to unescape the &lt; and &gt; just outside <code> and <pre> tags, but you can use this code if you want to go the regex route:

const input = "Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, &lt;fringilla&gt; in lorem eu. <pre>pre text with &lt;tag&gt</pre>. Back to &lt;normal&gt; text";
const tagRegex = /(<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>)/;
let inPreOrCode = false;
let result = input.split(tagRegex).map(str => {
  if (tagRegex.test(str)) {
    // is tag
    if (str.match(/^<(code|pre)\b/i)) {
      inPreOrCode = true;
    } else if (str.match(/^<\/(code|pre)\b/i)) {
      inPreOrCode = false;
    }
  } else if (!inPreOrCode) {
    str = str.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
  }
  return str;
}).join('');
console.log('Input:  ' + input);
console.log('Result: ' + result);

Output:

Input:  Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, &lt;fringilla&gt; in lorem eu. <pre>pre text with &lt;tag&gt</pre>. Back to &lt;normal&gt; text
Result: Lorem ipsum dolor <h1>sit amet</h1>, <h6>consectetur <b>adipiscing</b> </h6>elit. <br>Sed erat odio, <fringilla> in lorem eu. <pre>pre text with &lt;tag&gt</pre>. Back to <normal> text

Explanation:

  • enclose the whole tagRegex in parentheses (a capturing group); this includes the tags in the array returned by split
  • map through the array and set/clear the inPreOrCode flag on entry/exit of those tags
  • if flag is not set, unescape the &lt; and &gt;
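
The first bullet relies on how String.prototype.split treats capturing groups: when the separator regex contains one, the captured text is placed in the result array between the surrounding pieces. A minimal sketch of the difference, using a deliberately simplified tag pattern (`<[^>]+>`, which is an assumption for illustration and not the regex from the answer):

```javascript
// Simplified tag pattern, for illustration only.
// It does not handle ">" inside attribute values.
const simpleTag = /<[^>]+>/;      // no capturing group: the tags are dropped
const capturingTag = /(<[^>]+>)/; // capturing group: the tags are kept

const sample = "Hello <b>world</b>!";

const withoutTags = sample.split(simpleTag);
const withTags = sample.split(capturingTag);

console.log(withoutTags); // [ 'Hello ', 'world', '!' ]
console.log(withTags);    // [ 'Hello ', '<b>', 'world', '</b>', '!' ]
```

With exactly one capturing group, the text pieces sit at even indexes and the tags at odd indexes of the resulting array.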

This post can help you with capturing delimiters: https://stackoverflow.com/a/1732454/485337

For checking tag enclosure, you are in the territory of https://stackoverflow.com/a/1732454/485337 , as noted in comments.
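
If you accept the limits of regex-based handling and your input is simple and well formed, one way to check enclosure is to count how many pre/code elements are currently open, which also copes with `<code>` nested inside `<pre>`. The sketch below is only an illustration under those assumptions: `unescapeOutsidePreCode` is a hypothetical helper name, and the tag pattern is simplified. A real HTML parser remains the more robust option.

```javascript
// Hypothetical helper (not from the posts above): unescape &lt; and &gt;
// only in text that lies outside every <pre>/<code> element.
function unescapeOutsidePreCode(html) {
  // Simplified tag pattern, assumed for illustration;
  // it does not handle ">" inside attribute values.
  const tagRegex = /(<[^>]+>)/;
  let depth = 0; // number of currently open <pre>/<code> elements

  return html
    .split(tagRegex)
    .map((part, index) => {
      // With one capturing group, split() puts the tags at odd indexes.
      const isTag = index % 2 === 1;
      if (isTag) {
        if (/^<(pre|code)\b/i.test(part)) {
          depth++;
        } else if (/^<\/(pre|code)\b/i.test(part)) {
          depth = Math.max(0, depth - 1); // tolerate stray closing tags
        }
        return part;
      }
      // Text outside pre/code is unescaped; text inside is left untouched.
      return depth === 0
        ? part.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
        : part;
    })
    .join("");
}

const nested = "a &lt;x&gt; <pre><code>&lt;y&gt;</code> &lt;z&gt;</pre> b &lt;w&gt;";
const unescaped = unescapeOutsidePreCode(nested);
console.log(unescaped);
// a <x> <pre><code>&lt;y&gt;</code> &lt;z&gt;</pre> b <w>
```

A counter is used instead of a boolean flag so that the text after `</code>` but before `</pre>` is still treated as enclosed.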


 