
Extract text from HTML without losing new lines

Original 2020-09-11 09:35:42 | tags: javascript, html, regex, string

I use the following code to extract text from HTML.

var html = "first line.&nbsp;<div>second line.&nbsp;</div><div>third line.</div><div><br></div>"
var text = extractContent(html);
console.log("TEXT: " + text);

function extractContent(s) {
   var span = document.createElement('span');
   span.innerHTML = s;
   return span.textContent || span.innerText;
}

The result of this code is the text without new lines, but I want each div to be replaced with "\n", like this:

first line."\n"second line. "\n" third line."\n"

2 answers

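Use s.replaceAll('&nbsp;', '\\n'); to put the text \n wherever an &nbsp; entity appears.
Note: The backslash \ needs to be escaped with another \ so that the two characters \n show up literally in the output. Include the trailing semicolon of &nbsp; in the search string, otherwise a stray ; is left behind in the text.

var html = "first line.&nbsp;<div>second line.&nbsp;</div><div>third line.</div><div><br></div>";
var text = extractContent(html);
console.log("TEXT: " + text);

function extractContent(s) {
   var span = document.createElement('span');
   // Replace each &nbsp; entity (semicolon included) with the two characters \n
   s = s.replaceAll('&nbsp;', '\\n');
   span.innerHTML = s;
   return span.textContent || span.innerText;
}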
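The note about escaping is easier to see side by side. The following is a small sketch, not part of the original answer; the variable names and the shortened sample string are made up for illustration. It shows that '\n' is one line-feed character while '\\n' is a backslash followed by the letter n.

```javascript
// '\n' is a single line-feed character.
const lineFeed = '\n';
// '\\n' is two characters: a backslash and the letter n.
const literalBackslashN = '\\n';

console.log(lineFeed.length);          // 1
console.log(literalBackslashN.length); // 2

// Hypothetical, shortened sample input (not the full HTML from the question).
const sample = 'first line.&nbsp;second line.';
const withRealNewline = sample.replaceAll('&nbsp;', lineFeed);
const withLiteralText = sample.replaceAll('&nbsp;', literalBackslashN);

// JSON.stringify makes the difference visible by escaping special characters.
console.log(JSON.stringify(withRealNewline)); // "first line.\nsecond line."
console.log(JSON.stringify(withLiteralText)); // "first line.\\nsecond line."
```

Which variant is appropriate depends on whether the output should contain real line breaks or the visible text \n.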

document.createElement is not necessary; you can use a regexp alone to achieve this:

var html = "first line.&nbsp;<div>second line.&nbsp;</div><div>third line.</div><div><br></div>";
var text = extractContent(html);
console.log("TEXT: " + text);

function extractContent(s) {
   /*
    * /(<[^>]+>)+/gim matches every run of adjacent HTML tags and replaces it with \n,
    * so tags with no content between them are merged into a single \n.
    */
   return s.replace(/(<[^>]+>)+/gim, "\n");
}
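The regexp only touches tags, so the &nbsp; entities stay in the result as plain text. A minimal sketch of one way to handle both is shown here. It assumes that a non-breaking space can be treated as an ordinary space and that trailing spaces on each line are unwanted. The name htmlToLines is hypothetical and does not come from either answer. Like any regexp-based tag stripping, it is only suitable for simple, trusted markup such as the sample in the question.

```javascript
// Hypothetical helper (not from the answers above): strip tags and decode &nbsp;.
function htmlToLines(html) {
  return html
    // Same idea as the regexp answer: each run of adjacent tags becomes one newline.
    .replace(/(<[^>]+>)+/g, "\n")
    // Assumption: a non-breaking space can be treated as an ordinary space.
    .replace(/&nbsp;/g, " ")
    // Drop spaces and tabs left at the end of each line.
    .replace(/[ \t]+$/gm, "");
}

const sampleHtml = "first line.&nbsp;<div>second line.&nbsp;</div><div>third line.</div><div><br></div>";
console.log(JSON.stringify(htmlToLines(sampleHtml)));
// prints "first line.\nsecond line.\nthird line.\n"
```

Other entities such as &amp; or &lt; are not decoded by this sketch; the DOM-based approach from the question handles those automatically.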
