Before asking this question I have Googled for this problem and I have looked through all StackOverflow related questions.
The problem is pretty simple
I have a string "North Atlantic Treaty Organization"
I have a pattern "a.*z"; at the moment it would match
north ATLANTIC TREATY ORGANIZation
But I need it to match complete words only (orgANIZation for example)
I have tried "\\ba.*z\\b" and "\\Ba.*z\\B" as patterns, but I think I don't quite get it.
How should I change my pattern so that it matches only complete words that the string contains (without matching across multiple words)?
The patterns are generated on the fly: the user enters a*z and my application translates it into a pattern that matches parts of complete words in the string.
My problem is that I don't know what the user is going to search for. Ideally I would prepend some regexp to the user's expression.
Thank You!
ANIZ in orgANIZation is not a complete word -- it's a part of a word. Your pattern, btw, is not what you wrote -- a*z would not match as you describe; you're probably using a.*z instead, which would. So, try a[^ ]*z so it won't match spaces. If there are other characters besides spaces that you don't want to match, e.g. some kinds of punctuation, stick them in the [^...] construct as well, of course.
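A minimal C# sketch of the difference. It assumes case-insensitive matching (`RegexOptions.IgnoreCase`), which is a guess based on the question's example matching the capital A in "Atlantic"; the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "North Atlantic Treaty Organization";

// ".*" may cross spaces, so the match runs from the first 'a' to the last 'z'.
string greedy = Regex.Match(input, "a.*z", RegexOptions.IgnoreCase).Value;

// "[^ ]*" cannot cross a space, so the match stays inside a single word.
string bounded = Regex.Match(input, "a[^ ]*z", RegexOptions.IgnoreCase).Value;

Console.WriteLine(greedy);   // Atlantic Treaty Organiz
Console.WriteLine(bounded);  // aniz
```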
"a[^\s]*z"
This means an 'a' followed by any number of non-whitespace characters, followed by a 'z'.
EDIT: You seem to want '*' to be interpreted as a wildcard character. The user is thus not to enter a regex, but a string with certain wildcards. You can translate these wildcard characters to regex by reasoning over the intended meaning. Let's say that '*' should mean "zero or more characters that are not whitespace". You replace this character, then, with the corresponding regex:
[^\s]*
  [...]  a character class
  ^      at the start of a character class: "not"
  \s     "whitespace"
  *      zero or more of these
You might also want to define '?' as matching exactly a single non-whitespace character. This is the same character class, but you omit the '*' at the end.
So, what you do is regex-replace "*" with "[^\\s]*" and "?" with "[^\\s]".
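The translation described above could be sketched roughly as follows. `WildcardToRegex` is a hypothetical helper name, not something from the answer. The sketch also walks the input character by character instead of doing a regex-replace, and escapes every other character with `Regex.Escape`; both are my own additions, so that input such as "a.z" is treated literally.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: turns a wildcard string such as "a*z" into a regex.
// '*' becomes [^\s]* and '?' becomes [^\s]; everything else is escaped (assumption).
static string WildcardToRegex(string wildcard)
{
    var sb = new StringBuilder();
    foreach (char c in wildcard)
    {
        if (c == '*') sb.Append(@"[^\s]*");
        else if (c == '?') sb.Append(@"[^\s]");
        else sb.Append(Regex.Escape(c.ToString()));
    }
    return sb.ToString();
}

string pattern = WildcardToRegex("a*z");
Console.WriteLine(pattern); // a[^\s]*z
Console.WriteLine(Regex.Match("North Atlantic Treaty Organization", pattern).Value); // aniz
```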
This is what you are looking for:
new Regex( @"\b[^ ]*a[^ ]*z[^ ]*\b" );
It matches only a single word (no spaces are allowed), but the whole one. You can translate your user's input into such a regex: just replace * with [^ ]*. It works even with more than one wildcard.
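As a rough sketch of how this might be wired up: `BuildWholeWordRegex` below is a hypothetical helper, and escaping the user's input with `Regex.Escape` before substituting the wildcard is my assumption, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the translated user input so the whole word is matched.
// Regex.Escape turns "*" into "\*", which is then swapped for "[^ ]*".
static Regex BuildWholeWordRegex(string userInput) =>
    new Regex(@"\b[^ ]*" + Regex.Escape(userInput).Replace(@"\*", "[^ ]*") + @"[^ ]*\b");

string input = "North Atlantic Treaty Organization";

// The fixed pattern from the answer.
var fixedPattern = new Regex(@"\b[^ ]*a[^ ]*z[^ ]*\b");
Console.WriteLine(fixedPattern.Match(input).Value);             // Organization

// The same idea built from user input.
Console.WriteLine(BuildWholeWordRegex("a*z").Match(input).Value); // Organization
Console.WriteLine(BuildWholeWordRegex("t*y").Match(input).Value); // Treaty
```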
Not related to your question directly, but you may want to check out a RegEx visualization tool which shows you the captured results based on text input and a given regular expression.
Such a tool is very helpful for finding the right pattern, which can be quite tricky. A nice tool specialized for .NET RegEx is RegExLab, a bit older but it does a good job of showing exactly what your regular expression matches. Since the page is in German, just click on the regexlab.006.zip link. Source code is also included.
Regex reWord = new Regex("\\b[A-Za-z]*?(a.*z)[A-Za-z]*\\b");
... this will return "Atlantic Treaty Organization", with the capture from a.*z being "antic Treaty Organiz".
The problem is inherent in your method - unless you parse the user-supplied "regex" of a*z (or a.*z, that's not quite clear from your post) by modifying * to [^\\s]*? as Svante suggests (or perhaps \\w*?), you're going to gobble up far more characters than you'd like.
".*" is, generally speaking, a bad idea when you're trying to be specific. It'll match everything but a newline, and there's nothing you can append to it that will stop that.
Regex reWord = new Regex("\\b\\w*?(a\\w*?z)\\w*\\b");
...will return just "Organization".
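Putting the two patterns side by side, a small sketch (variable names are illustrative only) that prints both the full match and the first capture group:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "North Atlantic Treaty Organization";

// ".*" inside the group can cross spaces, so the match spans three words.
Match loose = Regex.Match(input, "\\b[A-Za-z]*?(a.*z)[A-Za-z]*\\b");
Console.WriteLine(loose.Value);           // Atlantic Treaty Organization
Console.WriteLine(loose.Groups[1].Value); // antic Treaty Organiz

// "\w*?" cannot cross spaces, so the match is confined to one word.
Match tight = Regex.Match(input, "\\b\\w*?(a\\w*?z)\\w*\\b");
Console.WriteLine(tight.Value);           // Organization
Console.WriteLine(tight.Groups[1].Value); // aniz
```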
Alternatively, if you absolutely must, for whatever reason, avoid modifying the user-supplied regex, perhaps try splitting your string into an array of words and testing each word individually against the regex.
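One way the split-and-test approach might look, as a sketch that assumes words are separated by single spaces and that the user-supplied pattern is a.*z:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "North Atlantic Treaty Organization";
var userRegex = new Regex("a.*z"); // the unmodified user-supplied pattern (assumed)

// Split on spaces, then keep only the words the pattern matches somewhere inside.
string[] hits = input
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Where(word => userRegex.IsMatch(word))
    .ToArray();

Console.WriteLine(string.Join(", ", hits)); // Organization
```

Because each word is tested on its own, the greedy .* has nothing beyond the word to consume.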
Ultimately, it's GIGO - garbage in, garbage out. Feed your system a bad regex and if you don't fix it appropriately, you won't get what you're looking for.