
JavaScript regex multiline text between two tags

I wrote a regex to fetch a string from HTML, but it seems the multiline flag doesn't work.

This is my pattern; I want to get the text in the h1 tag.

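var pattern = /<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/mi;
// match() returns the array of capture groups (or null); search() only returns an index
var m = html.match(pattern);
return m ? m[1] : null;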

I created a string to test it. When the string contains "\n", the result is always null. If I remove all the "\n"s, it gives me the right result, with or without the /m flag.

What's wrong with my regex?

You are looking for the /.../s modifier, also known as the dotall modifier. It forces the dot . to also match newlines, which it does not do by default.

The bad news is that it did not exist in JavaScript before ES2018 (see below). The good news is that you can work around it by using a character class (e.g. \s) and its negation (\S) together, like this:

[\s\S]

So in your case the regex would become:

/<div class="box-content-5">[\s\S]*<h1>([^<]+?)<\/h1>/i
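A quick way to see the difference is to run both patterns against the same multi-line input. This is a minimal sketch: the sample markup and the variable names are made up for illustration, and only the class name and the two patterns come from the question and answer above.

```javascript
// Hypothetical sample markup with newlines between the div and the h1.
const html = [
  '<div class="box-content-5">',
  '  <p>intro</p>',
  '  <h1>Title text</h1>',
  '</div>'
].join('\n');

// Original pattern: "." cannot cross a newline, and /m does not change that.
const dotPattern = /<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/mi;

// Workaround: [\s\S] matches any character, newlines included.
const anyCharPattern = /<div class="box-content-5">[\s\S]*<h1>([^<]+?)<\/h1>/i;

const dotResult = html.match(dotPattern);
const anyCharResult = html.match(anyCharPattern);
const title = anyCharResult ? anyCharResult[1] : null;

console.log(dotResult); // null
console.log(title);     // Title text
```

Note that [\s\S]* is greedy, so if the input contained several h1 elements it would capture the last one; [\s\S]*? would stop at the first.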

As of ES2018, JavaScript supports the s (dotAll) flag, so in a modern environment your regular expression could be as you wrote it, but with an s flag at the end (rather than m; m changes how ^ and $ work, not .):

/<div class="box-content-5">.*<h1>([^<]+?)<\/h1>/is

You want the s (dotall) modifier, which apparently doesn't exist in JavaScript - you can replace . with [\s\S] as suggested by @molf. The m (multiline) modifier makes ^ and $ match lines rather than the whole string.
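To make the role of the m flag concrete, here is a small sketch with a made-up two-line string. It shows that m moves the ^ anchor to line starts but leaves the dot unable to cross a newline.

```javascript
// Hypothetical two-line input.
const text = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the string.
const withoutM = /^second/.test(text);

// With m, ^ also matches right after each newline.
const withM = /^second/m.test(text);

// The m flag has no effect on ".": it still stops at the newline.
const dotWithM = /first.*second/m.test(text);

console.log(withoutM, withM, dotWithM); // false true false
```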

[\s\S] did not work for me in Node.js 6.11.3. The RegExp documentation says to use [^], which does work for me.

(The dot, the decimal point) matches any single character except line terminators: \n, \r, \u2028 or \u2029.

Inside a character set, the dot loses its special meaning and matches a literal dot.

Note that the m multiline flag doesn't change the dot behavior. So to match a pattern across multiple lines, the character set [^] can be used (unless you need to support an old version of IE, of course); it matches any character, including newlines.

For example:

/This is on line 1[^]*?This is on line 3/m

where the *? is the non-greedy grab of 0 or more occurrences of [^].
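The effect of the non-greedy quantifier is easiest to see when the closing text appears more than once. The following sketch uses a made-up four-line string in which the "line 3" text is repeated; the variable names are illustrative only.

```javascript
// Hypothetical input: the closing marker text appears twice.
const lines = [
  'This is on line 1',
  'This is on line 2',
  'This is on line 3',
  'This is on line 3'
].join('\n');

// Lazy: [^]*? expands only as far as the first closing marker.
const lazy = lines.match(/This is on line 1[^]*?This is on line 3/);

// Greedy: [^]* runs to the end and backtracks to the last closing marker.
const greedy = lines.match(/This is on line 1[^]*This is on line 3/);

console.log(lazy[0].split('\n').length);   // 3
console.log(greedy[0].split('\n').length); // 4
```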

The dotall modifier actually made it into JavaScript in June 2018, with ECMAScript 2018.
https://github.com/tc39/proposal-regexp-dotall-flag

const re = /foo.bar/s; // Or, `const re = new RegExp('foo.bar', 's');`.
re.test('foo\nbar');
// → true
re.dotAll
// → true
re.flags
// → 's'
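If the code has to run on engines that may predate ES2018, one option is to detect support at runtime and fall back to [\s\S]. This is only a sketch: supportsDotAll and between are hypothetical names, and it relies on the RegExp constructor throwing a SyntaxError when given a flag it does not recognize.

```javascript
// Hypothetical helper: returns true when the engine accepts the s flag.
function supportsDotAll() {
  try {
    new RegExp('.', 's'); // throws SyntaxError on engines without dotAll
    return true;
  } catch (e) {
    return false;
  }
}

// Use the s flag when available, otherwise the [\s\S] workaround.
const between = supportsDotAll()
  ? new RegExp('foo.bar', 's')
  : new RegExp('foo[\\s\\S]bar');

console.log(between.test('foo\nbar')); // true
```

The constructor form is used on purpose: a regex literal such as /foo.bar/s would be a parse-time syntax error on an engine without the flag, before any detection code could run.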

My suggestion is to split the multi-line string on "\n" and concatenate the pieces into a single line, which is easier to manipulate.

<textarea class="form-control" name="Body" rows="12" data-rule="required" 
                  title='@("Your feedback ".Label())'
                  placeholder='@("Your Feedback here!".Label())' data-val-required='@("Feedback is required".Label())'
                  pattern="^[0-9a-zA-Z ,;/?.\s_-]{3,600}$" data-val="true" required></textarea>


$( document ).ready( function() {
  var errorMessage = "Please match the requested format.";
  var firstVisit = false;

  $( this ).find( "textarea" ).on( "input change propertychange", function() {

    var pattern = $(this).attr( "pattern" );
    var element = $( this );

    if(typeof pattern !== typeof undefined && pattern !== false)
    {
      // Strip any anchors already in the attribute value, then re-anchor the whole pattern.
      var patternRegex = new RegExp('^' + pattern.replace(/^\^|\$$/g, '') + '$', 'gm');     

      var ks = "";
      $.each($( this ).val().split("\n"), function( index, value ){
        console.log(index + "-" + value);
        ks += " " + value;
      });      
      //console.log(ks);

      var hasError = !ks.match( patternRegex ); // declared locally to avoid an implicit global
      //debugger;

      if ( typeof this.setCustomValidity === "function") 
      {
        this.setCustomValidity( hasError ? errorMessage : "" );
      } 
      else 
      {
        $( this ).toggleClass( "invalid", !!hasError );
        $( this ).toggleClass( "valid", !hasError );

        if ( hasError ) 
        {
          $( this ).attr( "title", errorMessage );
        } 
        else
        {
          $( this ).removeAttr( "title" );
        }
      }
    }

  });
});
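Stripped of the jQuery validation wiring, the idea behind the code above can be sketched in a few lines. The sample string and the pattern here are made up for illustration; the point is only that once the newlines are replaced with spaces, a plain dot can span the whole text.

```javascript
// Hypothetical multi-line input, e.g. the value of a textarea.
const multiLine = 'Your feedback\nspans several\nlines';

// Split on newlines and join the pieces with spaces to get a single line.
const singleLine = multiLine.split('\n').join(' ');

// A dot-based pattern that needs to span the whole text.
const wholeText = /^Your.*lines$/;

console.log(wholeText.test(multiLine));  // false
console.log(wholeText.test(singleLine)); // true
```

This changes the text being matched (newlines become spaces), so it suits validation-style checks better than extraction, where the original line breaks may matter.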
