

How to match these strings with Regex?

Basically, I have music filenames such as:

<source>                      <target>

"Travis - Sing"               "Travis - Sing 2001.mp3"
"Travis - Sing"               "Travis - Sing Edit.mp3"
"Travis - Sing"               "Travis - Sing New Edit.mp3"
"Mission Impossible I"        "Mission Impossible I - Main Theme.mp3"
"Mission Impossible I"        "Mission Impossible II - Main Theme.mp3"
"Mesrine - Death Instinct"    "Mesrine - Death Instinct - Le Million.mp3"
"Mesrine - Public Enemy #1"   "Mesrine - Public Enemy #1 - Theme"
"Se7en"                       "Se7en Motion Picture Soundtrack - Theme.mp3"

The parentheses aren't part of the strings (they were only there for demonstration).

I am trying to match the "source" values to the "target" values.

I already have the source names, but right now I am using a lot of string parsing to match the two. How can I achieve the same thing using Regex?

EDIT: There seems to be some confusion.

"Travis - Sing" is my source string, and I am trying to match it to:

"Travis - Sing (2001).mp3"
"Travis - Sing (Edit).mp3"
"Travis - Sing (New Edit).mp3"

EDIT2: Removed the parentheses.

It seems you're looking for all files that begin with a certain string, which covers all of your examples. This can be done easily without regular expressions, using either two nested loops or LINQ:

// For each source, collect every target file name that begins with it.
var matches = from source in sources
              select new
                      {
                          Source = source,
                          Targets = from file in targets
                                    where file.StartsWith(source)
                                    select file
                      };
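The same prefix filter can be sketched in Java with a loop over the sources and a stream filter over the targets. This is a minimal sketch assuming Java 9+ (for List.of); PrefixMatcher and matchByPrefix are hypothetical names, and a LinkedHashMap is used only so the results come back in the order the sources were given.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PrefixMatcher {
    // For every source string, collect the target file names that start with it.
    // Plain String.startsWith is used, so no regex escaping is needed.
    static Map<String, List<String>> matchByPrefix(List<String> sources, List<String> targets) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String source : sources) {
            List<String> hits = targets.stream()
                    .filter(file -> file.startsWith(source))
                    .collect(Collectors.toList());
            result.put(source, hits);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sources = List.of("Travis - Sing", "Se7en");
        List<String> targets = List.of(
                "Travis - Sing 2001.mp3",
                "Travis - Sing Edit.mp3",
                "Se7en Motion Picture Soundtrack - Theme.mp3");
        // Print each source followed by the list of targets it matched.
        matchByPrefix(sources, targets)
                .forEach((source, hits) -> System.out.println(source + " -> " + hits));
    }
}
```

Like the LINQ version, this is a simple nested loop (sources times targets), which is usually fine for a music library of ordinary size.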

You can also use a regex instead of the StartsWith condition, for example:

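// Regex.Escape makes characters such as '.', '(' or '+' in the source match literally.
where Regex.IsMatch(file, String.Format("^{0}", Regex.Escape(source)), RegexOptions.IgnoreCase)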
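If a source name contains regex metacharacters (a '.' or '(' for example), it has to be escaped before it is used as a pattern; otherwise a source like "Se.en" would also match "Se7en". A minimal Java sketch of the same case-insensitive prefix test, using Pattern.quote (Java's counterpart to .NET's Regex.Escape); RegexPrefix and startsWithIgnoreCase are hypothetical names:

```java
import java.util.regex.Pattern;

class RegexPrefix {
    // Returns true if file starts with source, ignoring (ASCII) case.
    // Pattern.quote wraps source in \Q...\E so characters like '.', '(' or '#'
    // are matched literally instead of being interpreted as regex syntax.
    static boolean startsWithIgnoreCase(String file, String source) {
        Pattern p = Pattern.compile("^" + Pattern.quote(source), Pattern.CASE_INSENSITIVE);
        // find() searches the input; the ^ anchor restricts matches to the very start.
        return p.matcher(file).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("travis - sing edit.mp3", "Travis - Sing"));
        System.out.println(startsWithIgnoreCase("Mesrine - Public Enemy #1 - Theme", "Mesrine - Public Enemy #1"));
        // The quoted '.' no longer matches '7', so this one is false.
        System.out.println(startsWithIgnoreCase("Se7en.mp3", "Se.en"));
    }
}
```

Compiling a new Pattern on every call is wasteful for long lists; caching one compiled Pattern per source would avoid that.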

This could probably be optimized in many ways, but Andrew suggests writing one long pattern, which isn't any quicker when the pattern has to be built dynamically.

From your answer to my comment, I'm fairly sure you are looking for something simple like this.

You can put multiple search terms in one pattern, separated by "|". This is an alternation construct: the regex matches if any one of the alternatives matches.

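using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program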
{
    private static List<string> searchList = new List<string>
                                     {
                                         "Travis - Sing (2001).mp3",
                                         "Travis - Sing (Edit).mp3",
                                         "Mission Impossible I - Main Theme.mp3",
                                         "Mission Impossible II - Main Theme.mp3",
                                         "doesn't match"
                                     };

    static void Main(string[] args)
    {
        var matchRegex = new Regex("Travis - Sing|Mission Impossible I");
        var matchingStrings = searchList.Where(str => matchRegex.IsMatch(str));

        foreach (var str in matchingStrings)
        {
            Console.WriteLine(str);
        }
    }
}
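Since the source names are already in a list, the alternation pattern doesn't have to be written by hand. A minimal Java sketch, assuming the sources are plain text that should match literally: each name is quoted with Pattern.quote and the pieces are joined with "|". Unlike the C# example, this version anchors the alternation with ^, so a term only counts when it appears at the start of the file name. AlternationBuilder, build and filter are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AlternationBuilder {
    // Builds "^(?:src1|src2|...)" from the source names, quoting each one
    // so it is matched literally. The ^ anchor only accepts matches at the start.
    static Pattern build(List<String> sources) {
        String alternatives = sources.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("^(?:" + alternatives + ")");
    }

    // Keeps only the file names that the combined pattern matches.
    static List<String> filter(List<String> files, Pattern pattern) {
        return files.stream()
                .filter(f -> pattern.matcher(f).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern p = build(List.of("Travis - Sing", "Mission Impossible I"));
        List<String> files = List.of(
                "Travis - Sing (2001).mp3",
                "Mission Impossible II - Main Theme.mp3",
                "doesn't match");
        filter(files, p).forEach(System.out::println);
    }
}
```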

EDIT: If you want to know which term you matched against, you can add named groups:

    static void Main(string[] args)
    {
        var matchRegex = new Regex("(?<travis>Travis - Sing)|(?<mi>Mission Impossible I)");

        foreach (var str in searchList)
        {
            var match = matchRegex.Match(str);
            if (match.Success)
            {
                if (match.Groups["travis"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against travis", str));
                }
                else if (match.Groups["mi"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against mi", str));
                }
            }
        }
    }
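Named groups need hand-written names, which doesn't scale to a generated pattern (and Java group names may only contain letters and digits, so they can't be taken from arbitrary titles). A sketch of one alternative, assuming the pattern is generated from a list: give each source its own numbered group and check which group took part in the match. WhichSource, build and matchedSource are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WhichSource {
    // One capturing group per source: group i + 1 belongs to sources.get(i).
    // Pattern.quote adds no capturing groups, so the numbering stays aligned.
    static Pattern build(List<String> sources) {
        String body = sources.stream()
                .map(s -> "(" + Pattern.quote(s) + ")")
                .collect(Collectors.joining("|"));
        return Pattern.compile("^(?:" + body + ")");
    }

    // Returns the source the file name starts with, if any.
    static Optional<String> matchedSource(String file, List<String> sources, Pattern pattern) {
        Matcher m = pattern.matcher(file);
        if (!m.find()) {
            return Optional.empty();
        }
        // Only the alternative that matched has a non-null group.
        for (int i = 0; i < sources.size(); i++) {
            if (m.group(i + 1) != null) {
                return Optional.of(sources.get(i));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> sources = List.of("Travis - Sing", "Mission Impossible I");
        Pattern p = build(sources);
        for (String file : List.of("Travis - Sing (Edit).mp3",
                "Mission Impossible II - Main Theme.mp3", "doesn't match")) {
            System.out.println(file + " -> " + matchedSource(file, sources, p).orElse("(none)"));
        }
    }
}
```

Because alternatives are tried left to right, "Mission Impossible II - Main Theme.mp3" is attributed to "Mission Impossible I" here; listing longer names first would make the most specific source win.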

Are there always multiple spaces between the source and the target? If so, the following will match:

/^(.*?)\s{2,}(.*?)$/

It matches two items: the text before the first gap of two or more whitespace characters, and the text after that gap. (The first capture uses a non-greedy .*? so it stops at the first such gap, and the greedy \s{2,} then consumes the whole gap, so if there are more than two whitespace characters, the extra ones aren't captured in either group.)
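The /.../ delimiters above are Perl/JavaScript literal syntax; in Java the same pattern is written as a string with doubled backslashes. A minimal sketch, assuming each line holds one source and one target separated by at least two whitespace characters (the answer's own assumption); TwoColumnSplit and split are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoColumnSplit {
    // Lazy first group up to the first run of two or more whitespace
    // characters, then everything after that run.
    private static final Pattern LINE = Pattern.compile("^(.*?)\\s{2,}(.*?)$");

    // Returns {source, target}, or null if the line has no 2+ whitespace gap.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("\"Travis - Sing\"   \"Travis - Sing 2001.mp3\"");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

The single spaces inside "Travis - Sing" don't trigger a split, because \s{2,} needs at least two whitespace characters in a row.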

The following method is a bit more robust: it allows for a different number of spaces or hyphens between the words of the source and the target. For example, the target may have extra spaces between words, but it will still match.

First, identify the characters that are allowed as word delimiters in your strings. Then split both the source and the target into tokens using those delimiters. Finally, check whether the source's tokens appear, in order, as the first tokens of the target.

For example (Java), using whitespace and hyphens as delimiters:

public boolean isValidMatch(String source, String target){
    // Split on runs of whitespace or hyphens, so two hyphens (or several
    // spaces) between words split the same way as one.
    String[] sourceTokens = source.split("[\\s\\-]+");
    String[] targetTokens = target.split("[\\s\\-]+"); // split the target the same way

    // The source can't be a prefix of the target if it has more tokens.
    if(sourceTokens.length>targetTokens.length){
        return false;
    }

    // Every source token must equal the target token at the same position.
    for(int i=0;i<sourceTokens.length;i++){
        if(!sourceTokens[i].equals(targetTokens[i])){
            return false;
        }
    }
    return true;
}

PS: You might want to add the dot ('.') character as a delimiter, in case you have a source such as "Hello World" and a target such as "Hello World.mp3". Currently they won't match, because the regex doesn't split on dots ("World" is compared with "World.mp3"); if you expand the delimiter set to include the dot, they will.
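With the PS applied, the method only needs '.' added to the character class. A self-contained sketch of that variant, made static so it runs without an instance (TokenPrefixMatch is a hypothetical class name), checked against a few titles from the question:

```java
class TokenPrefixMatch {
    // Delimiters: runs of whitespace, hyphens, and dots.
    // Inside a character class '.' is literal, so it needs no escaping.
    private static final String DELIMITERS = "[\\s\\-.]+";

    // True if every token of source equals the token at the same position in target.
    static boolean isValidMatch(String source, String target) {
        String[] sourceTokens = source.split(DELIMITERS);
        String[] targetTokens = target.split(DELIMITERS);
        if (sourceTokens.length > targetTokens.length) {
            return false;
        }
        for (int i = 0; i < sourceTokens.length; i++) {
            if (!sourceTokens[i].equals(targetTokens[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidMatch("Hello World", "Hello World.mp3"));
        System.out.println(isValidMatch("Travis - Sing", "Travis  --  Sing Edit.mp3"));
        // "I" and "II" are different tokens, so this one does not match.
        System.out.println(isValidMatch("Mission Impossible I", "Mission Impossible II - Main Theme.mp3"));
    }
}
```

One side effect: the extension is no longer treated specially, so "Hello World" would also match "Hello World.wav" or "Hello.World.mp3".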

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original URL.

© 2020-2024 STACKOOM.COM