I am trying to write a regex that extracts the title of a movie from its file name. The regex should match the title in all of the example files below, but my current pattern, ^(.+).(\d{4}p), only works for some of them.
I am using this in Java with the java.util.regex package.
Example files:
Film.2017.720p.BluRay.H264.AAC.mp4
Film.And.The.Film.2017.1080p.BluRay.x264.mp4
152.Seconds.2010.1080p.BluRay.x264.mp4
2015.2005.1080p.BluRay.x264.mp4
Java code:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(String[] args)
    {
        ArrayList<String> movies = new ArrayList<>();
        movies.add("Film.2017.720p.BluRay.H264.AAC.mp4");
        movies.add("Film.And.The.Film.2017.1080p.BluRay.x264.mp4");
        movies.add("152.Seconds.2010.1080p.BluRay.x264.mp4");
        movies.add("2015.2005.1080p.BluRay.x264.mp4");
        for (String s : movies)
        {
            System.out.println("original file: \t" + s);
            System.out.println("new file: \t\t" + getTitleFromFile(s) + "\n");
        }
    }

    private static String getTitleFromFile(String fileName)
    {
        // Current attempt: m.group() returns the whole match, not just the title
        Pattern pattern = Pattern.compile("^(.+).(\\d{4}p)");
        Matcher m = pattern.matcher(fileName);
        if (m.find())
        {
            return m.group();
        }
        else
        {
            return null;
        }
    }
}
Actual Output:
original file: Film.2017.720p.BluRay.H264.AAC.mp4
new file: null
original file: Film.And.The.Film.2017.1080p.BluRay.x264.mp4
new file: null
original file: Film 2015 1080p BluRay x264 DTS.mp4
new file: Film 2015 1080p
original file: Film.1080p.BrRip.x264.mp4
new file: Film.1080p
Expected Output:
original file: Film.2017.720p.BluRay.H264.AAC.mp4
new file: Film
original file: Film.And.The.Film.2017.1080p.BluRay.x264.mp4
new file: Film And The Film
original file: Film 2015 1080p BluRay x264 DTS.mp4
new file: Film
original file: Film.1080p.BrRip.x264.mp4
new file: Film
You may use
^(.*?)\W(?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?)\b
See the regex demo.
Details
^ - start of string
(.*?) - Group 1: the name, any 0 or more chars other than line break chars, as few as possible
\W - a non-word char
(?:(\d{4})(?:\W(\d+p)?)|(\d+p)(?:\W(\d{4}))?) - either of
  (\d{4})(?:\W(\d+p)?) - Group 2: four digits, followed by a non-word char and then, optionally, one or more digits and p captured in Group 3
  | - or
  (\d+p)(?:\W(\d{4}))? - Group 4: one or more digits and p, followed by an optional group matching a non-word char and then four digits captured in Group 5
\b - word boundary
Java demo:
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        List<String> strs = Arrays.asList("Film.The.Film.2004.720p.BrRip.x264.BOKUTOX.mp4",
                "Film.The.Film.2020.BrRip.x264.mp4",
                "Film.The.Film.720p.2020.BrRip.x264.mp4",
                "Film.The.Film.720p.BrRip.x264.mp4");
        Pattern p = Pattern.compile("^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");
        for (String str : strs) {
            Matcher m = p.matcher(str);
            if (m.find()) {
                // Group 1 is the title; dots become spaces
                System.out.println("\n--------\nName: " + m.group(1).replace(".", " "));
                if (m.group(2) != null) {
                    // First alternative: year, then an optional resolution
                    System.out.println("Year: " + m.group(2));
                    if (m.group(3) != null) {
                        System.out.println("Resolution: " + m.group(3));
                    }
                }
                else {
                    // Second alternative: resolution, then an optional year
                    System.out.println("Resolution: " + m.group(4));
                    if (m.group(5) != null) {
                        System.out.println("Year: " + m.group(5));
                    }
                }
            }
        }
    }
}
Output:
--------
Name: Film The Film
Year: 2004
Resolution: 720p
--------
Name: Film The Film
Year: 2020
--------
Name: Film The Film
Resolution: 720p
Year: 2020
--------
Name: Film The Film
Resolution: 720p
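To tie this back to the question, here is a minimal sketch of getTitleFromFile rewritten around the same pattern, returning only Group 1 with the dots replaced by spaces. The class name TitleExtractor and the constant TITLE are made up for this illustration; the file names are the ones listed in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // Same pattern as above; only Group 1 (the name) is used here.
    // TitleExtractor and TITLE are hypothetical names for this sketch.
    private static final Pattern TITLE = Pattern.compile(
            "^(.*?)\\W(?:(\\d{4})(?:\\W(\\d+p)?)|(\\d+p)(?:\\W(\\d{4}))?)\\b");

    // Returns the title with dots replaced by spaces, or null when the
    // pattern finds neither a year nor a resolution token in the file name.
    public static String getTitleFromFile(String fileName) {
        Matcher m = TITLE.matcher(fileName);
        return m.find() ? m.group(1).replace(".", " ") : null;
    }

    public static void main(String[] args) {
        List<String> movies = Arrays.asList(
                "Film.2017.720p.BluRay.H264.AAC.mp4",
                "Film.And.The.Film.2017.1080p.BluRay.x264.mp4",
                "152.Seconds.2010.1080p.BluRay.x264.mp4",
                "2015.2005.1080p.BluRay.x264.mp4",
                "Film 2015 1080p BluRay x264 DTS.mp4",
                "Film.1080p.BrRip.x264.mp4");
        for (String s : movies) {
            System.out.println("original file: \t" + s);
            System.out.println("new file: \t\t" + getTitleFromFile(s) + "\n");
        }
    }
}
```

Because Group 1 is lazy, the title is cut at the first year or resolution token that follows a non-word char. 2015.2005.1080p still yields 2015 because the title is the very first token, but a title that contains a four-digit number after its first word would probably be cut short at that number.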