
How can I split this regex up to make it more readable?

and still keep it in the object literal:

url:       /:\/{0,3}(www\.)?([0-9.\-A-Za-z]{1,253})([\x00-\x7F]{1,2000})$/,

In addition, how can I simplify it?

It is just a mess in the current state. I'm not worried about accuracy right now.

Here is my try from Crockford's book:

makeRegex: function () {
    var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})
                    ([0-9.\-A-Za-z]+)
                    (?::(\d+))
                    ?(?:\/([^?#]*))
                    ?(?:\?([^#]*))
                    ?(?:#(.*))?$/; 
},

Regular expressions are notoriously unreadable. In JavaScript, a regex literal cannot contain insignificant whitespace, line breaks, or comments, so the practical way to split one up is to construct the pattern as a string and then turn that string into a regular expression.

Here are the steps I went through:

Target Regular Expression

var regex=/:\/{0,3}(www\.)?([0-9.\-A-Za-z]{1,253})([\x00-\x7F]{1,2000})$/;

Use RegExp to construct the expression from a string.

var parse_url = RegExp(':/{0,3}(www\\.)?([0-9.\\-A-Za-z]{1,253})([\\x00-\\x7F]{1,2000})$');

Remember:

  • the / delimiters at the beginning and the end of the expression are not there; they only belong to a regex literal
  • each \ in the pattern is written as \\ in the string, because the string literal interprets backslashes itself before RegExp ever sees them
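The second point is easy to get wrong, so here is a minimal sketch of what happens when the backslash is not doubled. The variable names and the short `^www\.$` pattern are only illustrative and are not part of the original expression.

```javascript
// The same pattern written as a literal and as a string.
var fromLiteral = /^www\.$/;
var fromString = new RegExp('^www\\.$');

// Both compile to the same pattern text.
var sameSource = fromLiteral.source === fromString.source;

// With a single backslash the string literal silently drops it,
// so RegExp receives "^www.$" and the dot matches any character.
var wrong = new RegExp('^www\.$');

console.log(sameSource);
console.log(fromString.test('wwwX'), wrong.test('wwwX'));
```

The single-backslash version raises no error, which is why the mistake tends to go unnoticed until the pattern matches something it should not.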

Break the string up by adding '+' at strategic points, then put each piece on its own line:

var parse_url = RegExp(':/{0,3}(www\\.)?'+'([0-9.\\-A-Za-z]{1,253})'+'([\\x00-\\x7F]{1,2000})$');

var parse_url = RegExp(':/{0,3}(www\\.)?'+
    '([0-9.\\-A-Za-z]{1,253})'+
    '([\\x00-\\x7F]{1,2000})$');
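As a sanity check, the split-up version can be compared against the original literal. This is only a sketch: the sample URL and the variable names `literal`, `built`, and `sample` are assumptions made for the demonstration.

```javascript
// The original literal from the question.
var literal = /:\/{0,3}(www\.)?([0-9.\-A-Za-z]{1,253})([\x00-\x7F]{1,2000})$/;

// The same pattern assembled from string pieces.
var built = new RegExp(':/{0,3}(www\\.)?' +
    '([0-9.\\-A-Za-z]{1,253})' +
    '([\\x00-\\x7F]{1,2000})$');

// A hypothetical input; both expressions should capture the same groups.
var sample = 'http://www.example.com/path';
var a = literal.exec(sample);
var b = built.exec(sample);

console.log(JSON.stringify(a) === JSON.stringify(b));
console.log(b[1], b[2], b[3]);
```

Comparing the results of exec on a few representative inputs is a cheap way to confirm that no backslash was lost while converting the literal to a string.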

It's not a very good solution, but that's all you can do with a regular expression.

Modern JavaScript does support multi-line strings in the form of template literals, but that probably won't help much here.

I suggest breaking a regular expression into parts and assigning each part to a well-named variable, with a comment if necessary. Here is an example, which is meant to demonstrate the principle rather than correctly validate URLs, since a URL-matching regex is hard to write ( https://mathiasbynens.be/demo/url-regex ):

var protocol = '(?:https?|ftp)'; // Protocol can be "http", "https" or "ftp"
var domain = '([A-Za-z0-9.]+)'; // Alphanumeric characters separated by periods
var path = '(?:[A-Za-z0-9./]+)'; // Alphanumeric characters, . or /
// Inside [...] neither . nor / needs escaping, so no backslashes are required here
var regexp = new RegExp(protocol + '://' + domain + '/' + path);

Now you have the regular expression broken into smaller, more easily understood mini-expressions, and the overall expression is a lot easier to read.
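A short usage sketch, under the same caveat that this does not validate real URLs. The `^` and `$` anchors, the `urlPattern` variable, and the `extractDomain` helper are additions made for illustration only.

```javascript
var protocol = '(?:https?|ftp)';    // "http", "https" or "ftp"
var domain = '([A-Za-z0-9.]+)';     // captured: alphanumerics and periods
var path = '(?:[A-Za-z0-9./]+)';    // alphanumerics, . or /

// Anchored so that the whole input has to match (an assumption of this sketch).
var urlPattern = new RegExp('^' + protocol + '://' + domain + '/' + path + '$');

// Hypothetical helper: returns the captured domain, or null if there is no match.
function extractDomain(url) {
    var match = urlPattern.exec(url);
    return match ? match[1] : null;
}

console.log(extractDomain('https://example.com/docs/index.html'));
console.log(extractDomain('mailto:someone'));
```

Because each piece has a name, changing what counts as a domain or a path means editing one short string rather than hunting through a single long expression.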
