
RegEx - character not before match

I understand the concepts of RegEx, but this is more or less the first time I've actually been trying to write some myself.

As part of a project, I'm attempting to pick out strings that match a certain domain (actually an array of domains, but let's keep it simple).

At first I started out with this:

url.match('www.example.com')

But I noticed I was also getting input like this:

http://www.someothersite.com/page?ref=http://www.example.com

These rows of course match www.example.com, but I want to exclude them. So I was thinking along these lines: only match rows that contain www.example.com, but not after a ? character. This is what I came up with:

var reg = new RegExp("[^\\?]*" + url + "(\\.*)", "gi"); 

This does not seem to work, however. Any suggestions would be greatly appreciated, as I fear I've used up what little knowledge I have on the matter.
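A small sketch of one likely reason the attempt matches anyway: the pattern is not anchored, and the negated class is allowed to match zero characters, so the engine can simply begin the match right after the ? character. The needle and the sample line come from the question; the names attempt, found and anchored are only illustrative, and the "g" flag is left out here so that repeated calls do not depend on lastIndex.

```javascript
// Needle and sample log line taken from the question.
var url = "www.example.com";
var line = "http://www.someothersite.com/page?ref=http://www.example.com";

// The attempted pattern. "[^\\?]*" may match zero characters and nothing ties
// it to the start of the line, so the search can restart after the "?".
// Note also that "(\\.*)" only matches a run of literal dots, and the
// unescaped dots inside url match any character.
var attempt = new RegExp("[^\\?]*" + url + "(\\.*)", "i");
var found = attempt.exec(line);

// For contrast (an assumption, not a complete fix): anchoring with "^" forces
// the needle to appear before any "?". It still accepts the needle anywhere
// before the query, including the path.
var anchored = new RegExp("^[^?]*" + url, "i");
var rejected = !anchored.test(line);
```

Here found is not null: the match begins directly after the ? character, which is why the row is not excluded.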

Edit: Some clarifications.

  • The input is logged GET requests. From these I want to keep only requests to a few domains, which will be stored in a variable. Each domain should match with zero or one arbitrary subdomain (example.com, www.example.org, www.somethirdsite.com and web.example.net should all be valid).
  • I specifically found a request like the one mentioned above, but I would also like to handle http://www.someothersite.com/page?ref=https://www.example.com and http://www.someothersite.com/page?ref=www.example.com, i.e. if my needle is not part of the request domain but part of the request data, I do not want the match.

Edit: here is the modified regex for an arbitrary domain:

var reg = new RegExp("(^|\\s)(https?://)?(\\w+\\.)?" + url, "gi");

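The idea here is that url is matched only at the start of the line or right after a whitespace character, optionally preceded by a scheme and a single subdomain. Since a query string normally contains no unencoded whitespace, the match cannot come from inside the query.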
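The regex above could be wrapped roughly as follows for a list of domains. This is a sketch under a few assumptions that are not in the original: escapeRegExp and buildDomainRegExp are hypothetical helper names, the domains array is made up from the examples in the question, each requested URL is assumed to start a line or follow whitespace in the log entry, and the trailing lookahead is an extra guard so that a host such as example.com.evil.org is not accepted. The "g" flag is omitted because test() on a global regex keeps state in lastIndex between calls.

```javascript
// Escape regex metacharacters so that "." in a domain matches a literal dot.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: start of line or whitespace, optional scheme,
// zero or one subdomain, then one of the given domains.
function buildDomainRegExp(domains) {
  var alternatives = domains.map(escapeRegExp).join("|");
  return new RegExp(
    "(^|\\s)(https?://)?(\\w+\\.)?(" + alternatives + ")(?![\\w.-])",
    "i"
  );
}

// Illustrative domain list based on the examples in the question.
var domains = ["example.com", "example.org", "somethirdsite.com", "example.net"];
var reg = buildDomainRegExp(domains);

var lines = [
  "http://www.example.com/index.html",
  "http://www.someothersite.com/page?ref=http://www.example.com",
  "http://www.someothersite.com/page?ref=www.example.com",
  "web.example.net/news"
];

// Keeps the first and the last line; the two "ref=" lines are dropped.
var kept = lines.filter(function (line) {
  return reg.test(line);
});
```

Because only one optional subdomain group is present, a host like a.b.example.com is not matched either, which fits the 0-1 subdomain requirement from the question.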


 