Regex to match movie file

I am trying to write a regex that extracts the movie title from a file name. It should extract the title from all of the example files below, but my current regex, ^(.+).(\d{4}p), only works for some of them.
I am using it in Java, with the java.util.regex package.

I would like it to work when the file name has any of these formats:

  • {title} {year} {resolution} etc.
  • {title} {resolution} {year} etc.
  • {title} {resolution} etc.
  • {title} {year} etc.
  • a title that contains a year or is itself a year, such as the movie 2012 (2009)

Example files:

Film.2017.720p.BluRay.H264.AAC.mp4
Film.And.The.Film.2017.1080p.BluRay.x264.mp4
152.Seconds.2010.1080p.BluRay.x264.mp4
2015.2005.1080p.BluRay.x264.mp4

Java code:

public static void main(String[] args)
{
    ArrayList<String> movies = new ArrayList<>();
    movies.add("Film.2017.720p.BluRay.H264.AAC.mp4");
    movies.add("Film.And.The.Film.2017.1080p.BluRay.x264.mp4");
    movies.add("152.Seconds.2010.1080p.BluRay.x264.mp4");
    movies.add("2015.2005.1080p.BluRay.x264.mp4");

    for (String s : movies)
    {
        System.out.println("original file: \t" + s);
        System.out.println("new file: \t\t" + getTitleFromFile(s) + "\n");
    }
}
    

private static String getTitleFromFile(String fileName)
{
    Pattern pattern = Pattern.compile("^(.+).(\\d{4}p)");
    Matcher m = pattern.matcher(fileName);

    if (m.find())
    {
        return m.group();
    }
    else
    {
        return null;
    }
}

Actual Output:

original file:  Film.2017.720p.BluRay.H264.AAC.mp4
new file:       null

original file:  Film.And.The.Film.2017.1080p.BluRay.x264.mp4
new file:       null

original file:  Film 2015 1080p BluRay x264 DTS.mp4
new file:       Film 2015 1080p

original file:  Film.1080p.BrRip.x264.mp4
new file:       Film.1080p

Expected Output:

original file:  Film.2017.720p.BluRay.H264.AAC.mp4
new file:       Film

original file:  Film.And.The.Film.2017.1080p.BluRay.x264.mp4
new file:       Film And The Film

original file:  Film 2015 1080p BluRay x264 DTS.mp4
new file:       Film

original file:  Film.1080p.BrRip.x264.mp4
new file:       Film
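
For reference, the behaviour of the original pattern can be reproduced in isolation. The sketch below is only an illustration: the class name OriginalPatternCheck and the helpers fullMatch and firstGroup are hypothetical names, not part of the code above. It suggests two separate causes: \d{4}p needs exactly four digits directly before the p, so 720p never matches, and m.group() returns the whole match rather than the first capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternCheck {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^(.+).(\\d{4}p)");

    // Whole match (what m.group() returns), or null when nothing matches.
    static String fullMatch(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    // First capture group only, or null when nothing matches.
    static String firstGroup(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "720p" has only three digits before the p, so there is no match.
        System.out.println(fullMatch("Film.2017.720p.BluRay.H264.AAC.mp4"));
        // The whole match includes the resolution.
        System.out.println(fullMatch("Film.1080p.BrRip.x264.mp4"));
        // Even group 1 still carries the year, because (.+) is greedy.
        System.out.println(firstGroup("Film 2015 1080p BluRay x264 DTS.mp4"));
    }
}
```

Switching to m.group(1) alone is therefore not enough: the year has to be matched outside the title group as well.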

You may use

^(.*?)\W(?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?)\b

See the regex demo.

Details

  • ^ - start of string
  • (.*?) - Group 1: name, any 0 or more chars other than line break chars, as few as possible
  • \W - a non-word char
  • (?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?) - either of
    • (\d{4})(?:\W(\d+p)?) - Group 2 (four digits), then a non-word char, then an optional Group 3 (one or more digits followed by p)
    • | - or
    • (\d+p)(?:\W(\d{4}))? - Group 4 (one or more digits followed by p), then an optional non-capturing group that matches a non-word char and four digits captured in Group 5
  • \b - word boundary

Java demo:

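import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sample names: year + resolution, year only, resolution + year, resolution only
List<String> strs = Arrays.asList("Film.The.Film.2004.720p.BrRip.x264.BOKUTOX.mp4",
         "Film.The.Film.2020.BrRip.x264.mp4",
         "Film.The.Film.720p.2020.BrRip.x264.mp4",
         "Film.The.Film.720p.BrRip.x264.mp4");
Pattern p = Pattern.compile("^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");
for (String str : strs) {
    Matcher m = p.matcher(str);
    if (m.find()) {
        // Group 1 is the title; dots are turned into spaces for display
        System.out.println("\n--------\nName: " + m.group(1).replace(".", " "));
        if (m.group(2) != null) {
            // First alternative: year, optionally followed by resolution
            System.out.println("Year: " + m.group(2));
            if (m.group(3) != null) {
                System.out.println("Resolution: " + m.group(3));
            }
        }
        else {
            // Second alternative: resolution, optionally followed by year
            System.out.println("Resolution: " + m.group(4));
            if (m.group(5) != null) {
                System.out.println("Year: " + m.group(5));
            }
        }
    }
}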

Output:

--------
Name: Film The Film
Year: 2004
Resolution: 720p

--------
Name: Film The Film
Year: 2020

--------
Name: Film The Film
Resolution: 720p
Year: 2020

--------
Name: Film The Film
Resolution: 720p
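
As a rough sketch, the same pattern could be dropped back into the getTitleFromFile method from the question. The class name TitleExtractor and the constant TITLE_PATTERN are hypothetical names chosen for illustration, and replacing dots with spaces follows the demo above rather than anything the question requires.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {
    // Same pattern as in the demo above; compiled once and reused.
    static final Pattern TITLE_PATTERN = Pattern.compile(
            "^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");

    // Returns the title with dots replaced by spaces, or null if nothing matches.
    static String getTitleFromFile(String fileName) {
        Matcher m = TITLE_PATTERN.matcher(fileName);
        if (m.find()) {
            // Group 1 holds only the title; year and resolution are in groups 2 to 5.
            return m.group(1).replace(".", " ");
        }
        return null;
    }

    public static void main(String[] args) {
        // The four example files from the question.
        List<String> movies = Arrays.asList(
                "Film.2017.720p.BluRay.H264.AAC.mp4",
                "Film.And.The.Film.2017.1080p.BluRay.x264.mp4",
                "152.Seconds.2010.1080p.BluRay.x264.mp4",
                "2015.2005.1080p.BluRay.x264.mp4");
        for (String s : movies) {
            System.out.println("original file: \t" + s);
            System.out.println("new file: \t\t" + getTitleFromFile(s) + "\n");
        }
    }
}
```

For the four example files this should yield Film, Film And The Film, 152 Seconds and 2015. One limitation of this sketch: because the title group is lazy, a title with a four-digit number in the middle, directly followed by the real year, would be cut off at that number.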
