
Javascript innerText - carriage return - regex not working

I am trying to parse some text, and innerText is not outputting the newline characters. I have set white-space on the element, but I'm not sure why it's not working. The parts variable should have 3 strings in this case, but I'm only getting one string.

I am sure it must be something trivial I am missing.

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Document</title>
    </head>
    <style>
      #test1 { white-space: pre-wrap; }
    </style>
    <body>
    <div id='test1'>1
    00:00:13,513 --> 00:00:16,607
    a

    2
    00:00:18,218 --> 00:00:20,516
    b

    3
    00:00:22,355 --> 00:00:24,880
    c
    </div>
    </body>
    <script>
      var test1 = document.getElementById('test1').innerText;
      <!-- This is not working. parts should have 3 elements, but it cannot find newline character so only has one element -->
      var parts = test1.split(/\r?\n\s+\r?\n/g);
      console.log(parts)
    </script>
    </html>

Update

Thanks for the answers, but my string is a little more complicated than abc. I updated the code with a more realistic example. The regex is taken from some SRT file parsing code, and it works if I upload the file, but not when I paste in the text. What's wrong with the HTML? I am trying the regex101 site to see if I can figure this out.

Your regular expression isn't properly formatted. \r?\n\s+\r?\n means:

  • \r? - Optionally match a carriage return
  • \n - Match a newline (line feed)
  • \s+ - Match one or more whitespace characters
  • \r? - Optionally match a carriage return
  • \n - Match a newline (line feed)

It requires at least a newline, followed by one or more whitespace characters, followed by another newline. But since there aren't two consecutive newlines in the input text, nothing gets split.
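A quick way to check that reading is to run the pattern from the question against a few small strings. This is only a sketch: the three sample strings are made up for illustration, not taken from the question.

```javascript
// The pattern from the question (the g flag makes no difference to split).
const pattern = /\r?\n\s+\r?\n/;

// Hypothetical sample inputs, not from the original post.
const single = 'a\nb\nc';             // one LF between lines
const lf = 'a\n\nb\n\nc';             // blank lines, LF only
const crlf = 'a\r\n\r\nb\r\n\r\nc';   // blank lines, CRLF

// No whitespace follows the first newline, so nothing matches.
console.log(single.split(pattern).length); // 1

// The second LF is consumed by \s+, leaving nothing for the final \n.
console.log(lf.split(pattern).length); // 1

// Here \s+ can match the second CR, and the final \n matches the last LF.
console.log(crlf.split(pattern)); // [ 'a', 'b', 'c' ]
```

So the pattern only splits on a blank line when there is some extra whitespace character (a CR, a space, a tab) available for \s+ to consume.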

To match full lines, I'd just split by \n instead, trim each string, and filter out the empty ones:

    const text = `
    a
    b
    c
    `;
    const result = text.split('\n').map(str => str.trim()).filter(Boolean);
    console.log(result);

If you wanted to do this with a single regular expression, match \S (non-space), followed by as many characters as you can until getting to the end of the line:

    const text = `
    a
    b
    c
    `;
    const result = text.match(/\S(?:.*\S)?/g);
    console.log(result);

Given the updated text, if you want to split it into blocks instead, remove the \s+ from your regex, since there are no whitespace characters between the two consecutive newlines:

    const text = `
    1
    00:00:13,513 --> 00:00:16,607
    a

    2
    00:00:18,218 --> 00:00:20,516
    b

    3
    00:00:22,355 --> 00:00:24,880
    c
    `;
    console.log(
      text.split(/(?:\r?\n){2}/)
    );
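If the goal is structured cues rather than raw blocks, the same split can feed a small parser. This is a minimal sketch, not the parsing code the question mentions: parseSrt and the shape of the returned objects are hypothetical, and it assumes every block has an index line, a timing line, and at least one text line.

```javascript
// Hypothetical helper: turn SRT text into { index, start, end, text } objects.
function parseSrt(srt) {
  return srt
    .trim()                      // drop leading/trailing blank lines
    .split(/(?:\r?\n){2}/)       // same split as above: two consecutive newlines
    .map((block) => {
      const [index, timing, ...textLines] = block.split(/\r?\n/);
      const [start, end] = timing.split(' --> ');
      return { index: Number(index), start, end, text: textLines.join('\n') };
    });
}

const sample = [
  '1', '00:00:13,513 --> 00:00:16,607', 'a', '',
  '2', '00:00:18,218 --> 00:00:20,516', 'b', '',
  '3', '00:00:22,355 --> 00:00:24,880', 'c', '',
].join('\n');

const cues = parseSrt(sample);
console.log(cues.length);   // 3
console.log(cues[1]);       // { index: 2, start: '00:00:18,218', end: '00:00:20,516', text: 'b' }
```

Because both regexes use \r?\n, the sketch accepts LF and CRLF input alike.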

Just use

var parts = test1.split(/\s+/g).filter(n => n);

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Document</title>
    </head>
    <style>
      #test1 { white-space: pre-wrap; }
    </style>
    <body>
    <div id='test1'>
    a
    b
    c
    </div>
    </body>
    <script>
      var test1 = document.getElementById('test1').innerText;
      <!-- This is not working. parts should have 3 elements, but it cannot find newline character so only has one element -->
      var parts = test1.split(/\s+/g).filter(n => n);
      console.log(parts)
    </script>
    </html>
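One caveat worth checking before adopting this for the updated question: splitting on \s+ breaks on every run of whitespace, including the spaces inside a timing line. A small sketch, with made-up sample strings:

```javascript
// Hypothetical inputs for illustration.
const simple = '\na\nb\nc\n';
const cue = '1\n00:00:13,513 --> 00:00:16,607\na';

// Works for the simple a/b/c case.
console.log(simple.split(/\s+/g).filter(n => n));
// [ 'a', 'b', 'c' ]

// An SRT cue is broken into individual tokens rather than kept as one block.
console.log(cue.split(/\s+/g).filter(n => n));
// [ '1', '00:00:13,513', '-->', '00:00:16,607', 'a' ]
```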

I found out that, with the SRT subtitle file format, the text needs CR (carriage return) characters for this regex to work.

When you put text in a div, the CR characters do not survive: the HTML parser normalizes each CRLF pair to a single LF, so innerText never contains them. That's why this regex doesn't work.

When you do:

var parts = test1.split('\r')

Nothing gets split (the result is a single-element array), because the carriage return characters are no longer in the text that the HTML hands back.
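An alternative to preserving the CRs is to stop depending on them. The sketch below assumes the DOM text has had CRLF normalized to LF, which is what the HTML standard specifies for parsed markup; normalizeNewlines is a hypothetical helper and the sample string is made up.

```javascript
// Hypothetical helper: collapse CRLF pairs and lone CRs into LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, '\n');
}

// What an uploaded SRT file typically contains (CRLF line endings).
const fromFile = '1\r\n00:00:13,513 --> 00:00:16,607\r\na\r\n\r\n2\r\n00:00:18,218 --> 00:00:20,516\r\nb';

// Roughly what the same text looks like after a trip through the HTML parser.
const fromDiv = normalizeNewlines(fromFile);

console.log(fromDiv.split('\r').length);                // 1: no CR left to split on
console.log(fromDiv.split(/\r?\n\s+\r?\n/).length);     // 1: the original regex finds nothing
console.log(fromDiv.split(/\n{2,}/).length);            // 2: splitting on blank lines works

// Normalizing first makes both sources behave the same way.
console.log(normalizeNewlines(fromFile).split(/\n{2,}/).length); // 2
```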

I decided to encode my string in Base64 and store it in an input, instead of storing it in a div as is.
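Roughly, the round trip looks like this. It is a sketch under assumptions: it uses btoa and atob, which only handle Latin-1 text (subtitles with other characters would need a UTF-8 step first), and the sample string is made up. In the page, encoded would be the value placed in the input element.

```javascript
// Hypothetical SRT text with CRLF line endings.
const srt = '1\r\n00:00:13,513 --> 00:00:16,607\r\na\r\n\r\n2\r\n00:00:18,218 --> 00:00:20,516\r\nb';

// Base64 output contains no CR or LF, so there is nothing for HTML to normalize.
const encoded = btoa(srt);

// Decoding gives back the exact original string, CRs included.
const decoded = atob(encoded);
const parts = decoded.split(/\r?\n\s+\r?\n/g);

console.log(decoded === srt); // true
console.log(parts.length);    // 2
```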


 