
How to match these strings with Regex?

Basically I have music filenames such as:

<source> <target>

"Travis - Sing"   "Travis - Sing 2001.mp3"
"Travis - Sing"   "Travis - Sing Edit.mp3"
"Travis - Sing"   "Travis - Sing New Edit.mp3"
"Mission Impossible I"   "Mission Impossible I - Main Theme.mp3"
"Mission Impossible I"   "Mission Impossible II - Main Theme.mp3"
"Mesrine - Death Instinct"   "Mesrine - Death Instinct - Le Million.mp3"
"Mesrine - Public Enemy #1"   "Mesrine - Public Enemy #1 - Theme"
"Se7en"   "Se7en Motion Picture Soundtrack - Theme.mp3"

Parentheses aren't included in the strings (they were just for demonstration).

I am trying to match the "source" values to the "target" values.

I already have the source names, but right now I am using a lot of string parsing to match the two. How can I achieve the same using Regex?

EDIT: It seems there is some confusion.

"Travis - Sing" is my source string, and I am trying to match it to:

"Travis - Sing (2001).mp3"
"Travis - Sing (Edit).mp3"
"Travis - Sing (New Edit).mp3"

EDIT2: Removed the parentheses.

It seems you're looking for all files that begin with a certain string, which covers all of your examples. This is easy to do without regular expressions, using two loops or LINQ:

var matches = from source in sources
              select new
                      {
                          Source = source,
                          Targets = from file in targets
                                    where file.StartsWith(source)
                                    select file
                      };

You can also use a regex instead of the StartsWith condition, for example:

// Regex.Escape keeps characters in the source such as '(' or '.' from being read as regex syntax
where Regex.IsMatch(file, "^" + Regex.Escape(source), RegexOptions.IgnoreCase)

This can probably be optimized in many ways. Andrew suggests writing one long pattern, but that isn't any quicker when the pattern is built dynamically.
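For readers working in Java, the same prefix-matching idea might look like the sketch below. The class and method names (`PrefixMatcher`, `groupByPrefix`, `startsWithIgnoreCase`) are made up for illustration, and the sample data is taken from the question. The regex variant quotes the source so that it is matched literally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class PrefixMatcher {

    // Groups every target under each source it starts with (case-sensitive).
    static Map<String, List<String>> groupByPrefix(List<String> sources, List<String> targets) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String source : sources) {
            List<String> hits = new ArrayList<>();
            for (String target : targets) {
                if (target.startsWith(source)) {
                    hits.add(target);
                }
            }
            result.put(source, hits);
        }
        return result;
    }

    // Regex variant: Pattern.quote makes the source a literal, so characters
    // such as '(' or '.' in a title are not interpreted as regex syntax.
    static boolean startsWithIgnoreCase(String target, String source) {
        Pattern p = Pattern.compile("^" + Pattern.quote(source), Pattern.CASE_INSENSITIVE);
        return p.matcher(target).find();
    }

    public static void main(String[] args) {
        List<String> sources = List.of("Travis - Sing", "Se7en");
        List<String> targets = List.of(
                "Travis - Sing 2001.mp3",
                "Travis - Sing Edit.mp3",
                "Se7en Motion Picture Soundtrack - Theme.mp3");
        System.out.println(groupByPrefix(sources, targets));
        System.out.println(startsWithIgnoreCase("travis - sing 2001.mp3", "Travis - Sing"));
    }
}
```

The plain `startsWith` loop is usually enough. The regex variant is only worth it if you need case-insensitive matching or plan to extend the pattern later.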

From your answer to my comment, I'm pretty sure you are looking for something simple like this.

You can have multiple search terms separated with "|", which is an alternation construct.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // File names to search through.
    private static List<string> searchList = new List<string>
    {
        "Travis - Sing (2001).mp3",
        "Travis - Sing (Edit).mp3",
        "Mission Impossible I - Main Theme.mp3",
        "Mission Impossible II - Main Theme.mp3",
        "doesn't match"
    };

    static void Main(string[] args)
    {
        // "|" separates the alternative search terms.
        var matchRegex = new Regex("Travis - Sing|Mission Impossible I");
        var matchingStrings = searchList.Where(str => matchRegex.IsMatch(str));

        foreach (var str in matchingStrings)
        {
            Console.WriteLine(str);
        }
    }
}

If you want to know which term you matched against, you can add named groups:

    static void Main(string[] args)
    {
        var matchRegex = new Regex("(?<travis>Travis - Sing)|(?<mi>Mission Impossible I)");

        foreach (var str in searchList)
        {
            var match = matchRegex.Match(str);
            if (match.Success)
            {
                if (match.Groups["travis"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against travis", str));
                }
                else if (match.Groups["mi"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against mi", str));
                }
            }
        }
    }
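
If the source names are only known at run time, the alternation can be built from a list instead of written by hand. The Java sketch below is one way to do that, under two assumptions that are not in the answer above: the pattern is anchored with `^` so that only prefixes match, and the group names `s0`, `s1`, ... are generated from the list index. `AlternationMatcher`, `build` and `matchedSource` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class AlternationMatcher {

    // Builds "^(?:(?<s0>...)|(?<s1>...))" with one named group per source.
    // Each source is quoted so that it is matched literally.
    static Pattern build(List<String> sources) {
        StringBuilder sb = new StringBuilder("^(?:");
        for (int i = 0; i < sources.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("(?<s").append(i).append('>')
              .append(Pattern.quote(sources.get(i)))
              .append(')');
        }
        return Pattern.compile(sb.append(')').toString());
    }

    // Returns the source that the target starts with, or null if none does.
    static String matchedSource(Pattern pattern, List<String> sources, String target) {
        Matcher m = pattern.matcher(target);
        if (!m.find()) {
            return null;
        }
        for (int i = 0; i < sources.size(); i++) {
            if (m.group("s" + i) != null) {
                return sources.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> sources = List.of("Travis - Sing", "Mission Impossible I");
        Pattern pattern = build(sources);
        System.out.println(matchedSource(pattern, sources, "Travis - Sing (Edit).mp3"));
        System.out.println(matchedSource(pattern, sources, "doesn't match"));
    }
}
```

Alternatives are tried left to right, so when one source is a prefix of another (for example "Mission Impossible I" and "Mission Impossible II"), the one listed first is reported. Listing longer sources first avoids that if it matters.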

Are there always multiple spaces between the source and the target? If so, then the following will match:

/^(.*?)\s{2,}(.*?)$/

It basically matches two items: one before the first gap of two or more whitespace characters, and one after that gap. (The capture patterns use a non-greedy .*? so that if the gap is longer than two whitespace characters, the extra whitespace isn't captured in either group.)
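
As a rough illustration, the pattern above could be applied in Java like this. `ColumnSplitter` and `split` are hypothetical names, and the sketch assumes each line holds exactly one source and one target separated by at least two whitespace characters.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class ColumnSplitter {

    // Same pattern as above, without the surrounding slashes.
    private static final Pattern TWO_COLUMNS = Pattern.compile("^(.*?)\\s{2,}(.*?)$");

    // Returns {source, target}, or null when the line contains no gap
    // of two or more whitespace characters.
    static String[] split(String line) {
        Matcher m = TWO_COLUMNS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Travis - Sing   Travis - Sing 2001.mp3");
        System.out.println(Arrays.toString(parts));
    }
}
```

Note that this only separates the two columns of a listing. Deciding whether the target actually belongs to the source is a separate step.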

The following method is a bit more robust: it allows for a different number of spaces or hyphens between words in the source and the target. E.g. the target may have extra spaces between words, but it will still match.

First identify the characters that are allowed as word delimiters in your strings. Then split your source and target strings into tokens using those delimiters. Finally, check that the words of the source appear, in order, as the first words of the target.

E.g. (Java), using whitespace and hyphens as delimiters:

public boolean isValidMatch(String source, String target) {
    // Split on runs of whitespace or dashes; two dashes between words
    // split the same way as one dash.
    String[] sourceTokens = source.split("[\\s\\-]+");
    String[] targetTokens = target.split("[\\s\\-]+"); // split the same way

    // A source with more words than the target can never be a prefix of it.
    if (sourceTokens.length > targetTokens.length) {
        return false;
    }

    // Every source word must equal the target word at the same position.
    for (int i = 0; i < sourceTokens.length; i++) {
        if (!sourceTokens[i].equals(targetTokens[i])) {
            return false;
        }
    }
    return true;
}

PS: You might want to add the dot '.' as a delimiter, in case you have the source "Hello World" and the target "Hello World.mp3". As written, these won't match, because the regex doesn't split on the dot and "World" is compared against "World.mp3". If you expand your delimiter set to include the dot, they will.
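
A self-contained sketch of that variant, with the dot added to the delimiter set, might look like the following. The class name `TokenPrefixMatcher` is made up for illustration, and the `trim()` calls are an extra assumption so that leading whitespace does not produce an empty first token.

```java
// Hypothetical helper, not part of the original answer.
final class TokenPrefixMatcher {

    // Runs of whitespace, hyphens or dots all act as a single delimiter.
    private static final String DELIMITERS = "[\\s\\-.]+";

    // True when the words of source are the first words of target, in order.
    static boolean isValidMatch(String source, String target) {
        String[] sourceTokens = source.trim().split(DELIMITERS);
        String[] targetTokens = target.trim().split(DELIMITERS);
        if (sourceTokens.length > targetTokens.length) {
            return false;
        }
        for (int i = 0; i < sourceTokens.length; i++) {
            if (!sourceTokens[i].equals(targetTokens[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidMatch("Hello World", "Hello World.mp3"));
        System.out.println(isValidMatch("Travis - Sing", "Travis  -  Sing 2001.mp3"));
    }
}
```

Because whole words are compared, this behaves differently from a plain prefix test: the source "Mission Impossible I" does not match the target "Mission Impossible II - Main Theme.mp3" here, whereas StartsWith would accept it. Which behavior you want depends on your data.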
