Java Regex - Match Any Part of Regular Expression

Consider a vague regex such as [a-z]{0,9}f[a-z]{0,2} . It will match Strings such as abcdefgh . I am adding extra elements onto this regex and I want it to be able to test several different parts of the regex.

For [a-z]{0,3}f[a-z]{2}e[a-z]{0,5} it would match face . But I would also like to test Strings against [a-z]{0,3}f or f[a-z]{2}e or [a-z]{0,3}f[a-z]{2}e or f[a-z]{2}e[a-z]{0,5} , without changing the order of the parts the way e[a-z]{0,5}f would. To put it more simply: I have a master regex that I want to test different Strings against. I want to test not only the whole regex but also each individual part of it, to see whether the String can fit somewhere inside.

I can't simply manually input all of the different regex possibilities because they are generated from other methods that are executed and will be different each time, but will always follow the same general format above: a range of letters from 0 to some finite number, one or more letters, a specific number of letters that could fill the "gap," another range of letters except this time it is specific, etc.

I have spent literally hours pondering and trying different bits of code to try to split the regex and test each split, then merge some splits together to try to create each possibility. Eventually I succumbed to the challenge and decided to seek help. It is very difficult to describe what I am trying to accomplish, so I hope I did an okay enough job. Please bear with me.

I don't think there is a built-in way to do this. You will have to match the full regex part by part. Let's assume your regex contains only the following:

  1. Single letter (eg, a , e ). Call this S.
  2. Letter range (eg, [a-z] , [p-r] ). Call this A.
  3. Fixed number of occurrences (eg, {2} , {3} ). Call this F.
  4. Range of number of occurrences (eg, {2,4} , {0,3} ). Call this R.

You can split the regex into the above tokens and scan each sub-regex for a match. For example, [a-z]{0,9}f[a-z]{0,2} is of the form ARSAR . Try matching with the whole regex first. If a match is found, dig deeper by dropping the leading chunk and trying a match with SAR . If it matches SAR , go on to AR in the next step. If AR does not match, that indicates S is a required chunk, so instead remove the last chunk ( R ) and try matching with SA . And so on.
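As a rough illustration of the token idea, here is a minimal Java sketch. It is not the exact peel-one-chunk-at-a-time procedure described above: it takes the simpler route of grouping each letter or letter range with its quantifier into one "unit" and then trying every contiguous run of units, which keeps the original order intact. The class and method names ( PartialRegexMatcher , splitIntoUnits , matchingSubRegexes ) are made up for this example, and it assumes the master regex contains only lowercase letters, bracketed classes and {n} / {m,n} quantifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialRegexMatcher {

    // One unit = a single lowercase letter or a bracketed class,
    // followed by an optional {n} or {m,n} quantifier.
    // Assumption: the master regex uses no other syntax.
    private static final Pattern UNIT =
            Pattern.compile("(?:\\[[^\\]]+\\]|[a-z])(?:\\{\\d+(?:,\\d+)?\\})?");

    // Splits the master regex into its units, in order.
    static List<String> splitIntoUnits(String masterRegex) {
        List<String> units = new ArrayList<>();
        Matcher m = UNIT.matcher(masterRegex);
        int expectedStart = 0;
        while (m.find()) {
            if (m.start() != expectedStart) {
                throw new IllegalArgumentException(
                        "Unsupported syntax at index " + expectedStart);
            }
            units.add(m.group());
            expectedStart = m.end();
        }
        if (expectedStart != masterRegex.length()) {
            throw new IllegalArgumentException(
                    "Unsupported syntax at index " + expectedStart);
        }
        return units;
    }

    // Returns every contiguous run of units (order preserved)
    // that matches the whole input string.
    static List<String> matchingSubRegexes(String masterRegex, String input) {
        List<String> units = splitIntoUnits(masterRegex);
        List<String> hits = new ArrayList<>();
        for (int start = 0; start < units.size(); start++) {
            StringBuilder sub = new StringBuilder();
            for (int end = start; end < units.size(); end++) {
                sub.append(units.get(end));
                if (Pattern.matches(sub.toString(), input)) {
                    hits.add(sub.toString());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String master = "[a-z]{0,3}f[a-z]{2}e[a-z]{0,5}";
        System.out.println(splitIntoUnits(master));
        System.out.println(matchingSubRegexes(master, "face"));
    }
}
```

For face this reports f[a-z]{2}e and the full regex among the hits, and it never produces a reordered pattern such as e[a-z]{0,5}f . One thing to watch: a lone optional range like [a-z]{0,5} matches any short lowercase String by itself, so you may want to discard runs that contain no literal letter.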

It sounds like you are trying to develop some complex regex yeah?

My advice there is to get a text editor that highlights matches in real time and has a Perl-compatible regex engine.

I myself use Sublime Text 3 , with regex find turned on ( ctrl + f , then alt + r ).

I'll enter all the cases I want to match into the text area, for example Wikipedia's "Valid email addresses" examples.

My regex handbook is the Oniguruma RE doc.

Edit : The linked RE.txt doc appears to be dead, so I mirrored it in a GitHub gist here: RE.txt (https://gist.github.com/thorsummoner/63811b64a4a9b7860187)

Alternatively

It might be that what you're trying to do isn't suited to regex. It sounds like you are trying to do partial word matching, or best-match selection.

Perhaps you should consider finer-grained logic, using substring and character-index checks.

Disclaimer: I don't feel I have a grasp on your question or use cases
