
Regex to match multiple strings

I need to create a regex that can match multiple strings. For example, I want to find all the instances of "good" or "great". I found some examples, but what I came up with doesn't seem to work:

\b(good|great)\w*\b

Can anyone point me in the right direction?

Edit: I should note that I don't want to just match whole words. For example, I may want to match "ood" or "reat" as well (parts of the words).

Edit 2: Here is some sample text: "This is a really great story." I might want to match "this" or "really", or I might want to match "eall" or "reat".

If you can guarantee that there are no reserved regex characters in your word list (or if you escape them), you can use the code below to turn a big word list into @"(a|big|word|list)". There's nothing wrong with the | operator as you're using it, as long as the () surround it. It sounds like the \w* and \b patterns are what interfere with your matches: \b only lets a match start at a word boundary, so a partial word such as "reat" inside "great" can never match.

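// Requires: using System.Linq; using System.Text.RegularExpressions;
string[] pattern_list = { "good", "great" };  // example word list
// Escape each entry so reserved regex characters are matched literally.
string regex = String.Format("({0})", String.Join("|", pattern_list.Select(p => Regex.Escape(p))));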
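
For comparison, here is a minimal Python sketch of the same idea, run against the sample sentence from the question. The helper name build_alternation and the word list are assumptions made for illustration; they are not part of the original answer.

```python
import re

def build_alternation(words):
    # Escape each word so characters like "." or "+" match literally,
    # then join the words with "|" inside one group, e.g. (good|great).
    return "(" + "|".join(re.escape(w) for w in words) + ")"

# No \b and no \w* here, so parts of words can match as well.
pattern = re.compile(build_alternation(["good", "great", "eall", "reat"]))

text = "This is a really great story."
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['eall', 'great']
```

Note that "reat" is not reported separately here, because the match for "great" already consumed those characters; matches found by one scan never overlap.
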

(good)*(great)*

After the edit:

\b(g*o*o*d*)*(g*r*e*a*t*)*\b

Just check for the boolean that Regex.IsMatch() returns.

if (Regex.IsMatch(line, "condition") && Regex.IsMatch(line, "condition2"))

If both calls return true, the line matches both patterns.
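
A rough Python equivalent of that check, as a sketch only: the function name matches_all and the example patterns are assumptions, and re.search stands in for Regex.IsMatch.

```python
import re

def matches_all(line, patterns):
    # True only if every pattern is found somewhere in the line.
    return all(re.search(p, line) for p in patterns)

line = "This is a really great story."
print(matches_all(line, ["reat", "eall"]))  # True
print(matches_all(line, ["reat", "ood"]))   # False
```

This requires all patterns to be present. To accept a line that contains any one of them, replace all with any.
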

I'm not entirely sure that regex alone offers a solution for what you're trying to do. You could, however, use the following code to build a regex for a given word from all of its substrings. Be aware that the resulting pattern can become very long and slow:

function wordPermutations( $word, $minLength = 2 )
{
    $perms = array( );

    for ($start = 0; $start < strlen( $word ); $start++)
    {
        for ($end = strlen( $word ); $end > $start; $end--)
        {
            $perm = substr( $word, $start, ($end - $start));

            if (strlen( $perm ) >= $minLength)
            {
                $perms[] = $perm;
            }
        }
    }

    return $perms;
}

Test Code:

$perms = wordPermutations( 'great', 3 );  // get all substrings of "great" that are 3 or more chars long
var_dump( $perms );

echo ( '/\b('.implode( '|', $perms ).')\b/' );

Example Output:

array
  0 => string 'great' (length=5)
  1 => string 'grea' (length=4)
  2 => string 'gre' (length=3)
  3 => string 'reat' (length=4)
  4 => string 'rea' (length=3)
  5 => string 'eat' (length=3)

/\b(great|grea|gre|reat|rea|eat)\b/
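
The same substring generation can be sketched in Python. The function name substrings is an assumption, and this version deliberately leaves out the \b anchors, on the assumption that partial-word matches are wanted, as the question describes.

```python
import re

def substrings(word, min_length=2):
    # Same order as the PHP loops: for each start, longest slice first.
    result = []
    for start in range(len(word)):
        for end in range(len(word), start, -1):
            part = word[start:end]
            if len(part) >= min_length:
                result.append(part)
    return result

parts = substrings("great", 3)
print(parts)  # ['great', 'grea', 'gre', 'reat', 'rea', 'eat']

# Build one alternation from the substrings, escaping each piece.
pattern = re.compile("(" + "|".join(re.escape(p) for p in parts) + ")")
print(pattern.search("a real treat").group(0))  # rea
```

Because longer slices come first in the list, the alternation prefers the longest substring that starts at a given position.
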

I think you are asking for something you don't really mean. If you want to search for any part of a word, you are literally searching for single letters.

E.g. searching for {Jack, Jim} in "John and Shelly are cool"

means searching for all the letters in the names, {J,a,c,k,i,m}:

*J*ohn *a*nd Shelly *a*re *c*ool

and for that you don't need regex :)

In my opinion, a suffix tree can help you with that:

http://en.wikipedia.org/wiki/Suffix_tree#Functionality

enjoy.

I'm not sure I understand the problem correctly:

If you want to match "great" or "reat" you can express this by a pattern like:

"g?reat"

This simply says that the "reat"-part must exist and the "g" is optional.

This would match "reat" and "great" but not "eat", because the first "r" in "reat" is required.

If you have the two words "great" and "good" and you want to match both of them with an optional "g", you can write it like this:

(g?reat|g?ood)

And if you want to include a word boundary:

\b(g?reat|g?ood)

Be aware that this would not match anything in "breat": the "reat" is there, but because of the leading "b" the "r" is not at a word boundary.
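
The behaviour of these patterns can be checked with a short Python sketch. The variable names and the test strings are assumptions chosen for illustration.

```python
import re

optional_g = re.compile(r"(g?reat|g?ood)")   # "g" is optional
bounded = re.compile(r"\b(g?reat|g?ood)")    # must start at a word boundary

# "eat" is skipped because the "r" is required.
print(optional_g.findall("great reat good ood eat"))
# ['great', 'reat', 'good', 'ood']

# Without \b the "reat" inside "breat" is found; with \b it is not.
print(optional_g.search("breat").group(0))  # reat
print(bounded.search("breat"))              # None
```
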

So if you want to match whole words that contain a substring like "reat" or "ood", then you should try:

"\b\w*?(reat|ood)\w*\b"

This reads:
1. Starting at a word boundary, match any number of word characters, but don't be greedy.
2. Match "reat" or "ood". This ensures that only words containing one of them are matched.
3. Match any number of word characters following "reat" or "ood" until the next word boundary is reached.

This will match:

"goodness", "good", "ood" (if a complete word)

It can be read as: Give me all complete words that contain "ood" or "reat".
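
A small Python sketch of this whole-word pattern, assuming the tail is written as \w* so that words ending in the substring also match. The sample sentence and variable names are made up for illustration, and the group is written as non-capturing so that findall returns the full words.

```python
import re

# Tail \w* allows zero characters after "reat"/"ood".
whole_words = re.compile(r"\b\w*?(?:reat|ood)\w*\b")
# Tail \w+ would demand at least one more character after the substring.
strict_tail = re.compile(r"\b\w*?(?:reat|ood)\w+\b")

text = "Goodness, that was a good and great treat; I understood."

print(whole_words.findall(text))
# ['Goodness', 'good', 'great', 'treat', 'understood']
print(strict_tail.findall(text))
# ['Goodness']
```
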

Is that what you are looking for?
