Pattern and Matcher regex in Java

I'm having a problem using Pattern and Matcher in Java.

I'm trying to use it to extract two numbers from a String containing at least .SXXEXX., where XX are the ints I want to extract. Any regex/Java pro wanna help me out?

This is my best try, but it causes a runtime exception. :(

String s = "Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew";
Pattern p = Pattern.compile("[^.S].S(\\d2)E(\\d+2)\\p{Alpha}");
Matcher m = p.matcher(s);
String season = m.group(0);
String episode = m.group(1);

Your regex is wrong, and you need to call the find or matches method before accessing groups:

String s = "Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew";
Pattern p = Pattern.compile("\\.S(\\d{2})E(\\d{2})\\.");
Matcher m = p.matcher(s);
if (m.find()) {
   String season = m.group(1);
   String episode = m.group(2);
}
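
To make the point about calling find first concrete, here is a minimal sketch. The class and method names (GroupBeforeFindDemo, groupWithoutFindThrows, extract) are made up for illustration; only the pattern comes from the answer above. It shows that reading a group before any match attempt throws an IllegalStateException, and that matches() is stricter than find() because it requires the whole input to fit the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBeforeFindDemo {

    // Same pattern as in the answer above.
    static final Pattern SXXEXX = Pattern.compile("\\.S(\\d{2})E(\\d{2})\\.");

    /** Returns true if reading a group without a prior match attempt throws. */
    public static boolean groupWithoutFindThrows(String input) {
        Matcher m = SXXEXX.matcher(input);
        try {
            m.group(1); // no find()/matches() call yet
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Returns {season, episode}, or null when the input has no .SXXEXX. part. */
    public static int[] extract(String input) {
        Matcher m = SXXEXX.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        String s = "Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew";
        System.out.println(groupWithoutFindThrows(s));      // true
        int[] result = extract(s);
        System.out.println(result[0] + " " + result[1]);    // 6 1
        // matches() needs the entire input to fit the pattern, so it fails here.
        System.out.println(SXXEXX.matcher(s).matches());    // false
    }
}
```

For this kind of extraction find() is usually the right call, since the SXXEXX part sits in the middle of a longer string.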

First of all, despite the mistakes in your regular expression and the missing find() call (which is what actually triggers the runtime exception, since the pattern itself does compile), I must congratulate you on a valiant effort. It is difficult to get a regex 100% right from the beginning, even for cases that look innocuous and straightforward. With minor corrections you can modify it to extract the desired information from your strings, assuming that the delimiters are dots '.' as in your example and that the season and episode are given in the exact SXXEXX format. Here is the corrected version of the pattern: "\\.S(\\d{2})E(\\d{2})\\."

You can access the captured groups by calling m.group(1) and m.group(2) for the season and episode respectively. Quoting from the java.util.regex.Matcher javadoc:

Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
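
As a small illustration of that numbering, the sketch below (class and method names are hypothetical) returns groups 0, 1 and 2 for the corrected pattern. It also shows why reading group(0) and group(1), as in the question, would give the whole match and the season rather than the season and the episode.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {

    /** Returns groups 0, 1 and 2 of the first .SXXEXX. match, or null if there is none. */
    public static String[] groups(String input) {
        Matcher m = Pattern.compile("\\.S(\\d{2})E(\\d{2})\\.").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew");
        System.out.println(g[0]); // .S06E01.  (the entire match, same as m.group())
        System.out.println(g[1]); // 06        (first capturing group: season)
        System.out.println(g[2]); // 01        (second capturing group: episode)
    }
}
```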

To make this more instructive, I have written a singleton (only one instance is possible) designed according to the Effective Java advice on p. 17 (Bloch J., 2nd ed., 2008). The instance of the class, which is accessed with the getInstance() method, exposes the parse() method, which takes a string containing the series information you want to extract and parses it, saving the season and episode numbers to the respective private integer fields. Finally, as a test, we try to parse an array of challenging episode names from various (fictional) series, including your own example, and see if we can get the season and episode numbers. IMHO this example illustrates, in a succinct way, not only a broader version of what you are trying to achieve, but also:

  1. an effective approach to repeatedly using a compiled pattern
  2. a less restrictive pattern than the one you were trying to match (e.g. "S", "s", "Season", "SEASON", "season" are all acceptable variants for matching the season keyword)
  3. how to use lookarounds and word boundaries: (?<= and (?= and \\b
  4. how to use named capturing groups with the (?<name>X) syntax (caveat: must use Java 7 or later, see this older question for more information)
  5. interesting cases of how to use the Pattern and Matcher classes respectively. You can also take a look at this very educational tutorial from Oracle, The Java Tutorials: Regular Expressions
  6. how to create and use singletons

// Class begin

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeriesInfoMatcher {

    private int season, episode;
    static final String SEASON_EPISODE_PATTERN = "(?<=\\b|_) s(?:eason)? (?<season>\\d+) e(?:pisode)? (?<episode>\\d+) (?=\\b|_)";
    private final Pattern pattern = Pattern.compile(SEASON_EPISODE_PATTERN, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    private Matcher matcher;
    private String seriesInfoString;
    private static SeriesInfoMatcher instance;

    private SeriesInfoMatcher() {
        resetFields();
    }

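    public static SeriesInfoMatcher getInstance() {
        // Lazily create the single instance and keep it for later calls (not thread-safe)
        if (instance == null) {
            instance = new SeriesInfoMatcher();
        }
        return instance;
    }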

    /**
     * Analyzes a string containing series information and updates the internal fields accordingly
     * @param unparsedSeriesInfo The string containing episode and season numbers to be extracted. Must not be null or empty.
     */
    public void parse (String unparsedSeriesInfo) {
        try {
            if (unparsedSeriesInfo == null || unparsedSeriesInfo.isEmpty()) {
                throw new IllegalArgumentException("String argument must be non-null and non-empty!");
            }
            seriesInfoString = unparsedSeriesInfo;
            initMatcher();
            while (matcher.find()) {
                season = Integer.parseInt ( matcher.group("season") );
                episode = Integer.parseInt( matcher.group("episode"));
            }
        }
        catch (Exception ex) {
            resetFields();
            System.err.printf("Invalid movie info string format. Make sure there is a substring of \"%s\" format.%n%s", "S{NUMBER}E{NUMBER}", ex.getMessage());
        }
    }

    private void initMatcher() {
        if (matcher == null) {
            matcher = pattern.matcher(seriesInfoString);
        }
        else {
            matcher.reset(seriesInfoString);
        }
    }

    private void resetFields() {
        seriesInfoString = "";
        season = -1;
        episode = -1;
    }

    @Override
    public String toString() {
        return seriesInfoString.isEmpty() ? 
            "<no information to display>": 
            String.format("{\"%s\": %d, \"%s\": %d}", "season", season, "episode", episode);
    }

    public static void main(String[] args){
        // Example movie info strings
        String[] episodesFromVariousSeries = {
            "Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew",
            "Galactic Wars - S01E02 - A dire development",
            "A.dotted.hell.season3episode15.when.enough.is.enough.XVID",
            "The_underscore_menace_-_The_horror_at_the_end!_[2012]_s05e02",
            "s05e01_-_The_underscore_menace_-_Terror_at_the_beginning_[2012]"
        };
        SeriesInfoMatcher seriesMatcher = SeriesInfoMatcher.getInstance();
        System.out.printf( "%-80s %-20s%n", "Episode Info", "Parsing Results" );
        for (String episode: episodesFromVariousSeries) {
            seriesMatcher.parse(episode);
            System.out.printf( "%-80s %-20s%n", episode, seriesMatcher );
        }
    }
}

The output of main() is:

Episode Info                                                                     Parsing Results     
Some.Cool.Series.S06E01.720p.HDTV.X264-Pewpew                                    {"season": 6, "episode": 1}
Galactic Wars - S01E02 - A dire development                                      {"season": 1, "episode": 2}
A.dotted.hell.season3episode15.when.enough.is.enough.XVID                        {"season": 3, "episode": 15}
The_underscore_menace_-_The_horror_at_the_end!_[2012]_s05e02                     {"season": 5, "episode": 2}
s05e01_-_The_underscore_menace_-_Terror_at_the_beginning_[2012]                  {"season": 5, "episode": 1}
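
One detail that is easy to miss: SEASON_EPISODE_PATTERN contains literal spaces between its parts, and it only works because the pattern is compiled with Pattern.COMMENTS, which tells the regex engine to ignore whitespace in the pattern. The sketch below uses a simplified, hypothetical pattern (not the one from the class) to show the difference the flag makes.

```java
import java.util.regex.Pattern;

public class CommentsFlagDemo {

    // Simplified pattern with spaces added purely for readability.
    static final String SPACED = "s (\\d+) e (\\d+)";

    /** Whitespace in the pattern is ignored, so "S06E01" can match. */
    public static boolean findsWithComments(String input) {
        return Pattern.compile(SPACED, Pattern.CASE_INSENSITIVE | Pattern.COMMENTS)
                      .matcher(input).find();
    }

    /** Whitespace in the pattern is literal, so the input must contain the spaces. */
    public static boolean findsWithoutComments(String input) {
        return Pattern.compile(SPACED, Pattern.CASE_INSENSITIVE)
                      .matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(findsWithComments("Show.S06E01.720p"));    // true
        System.out.println(findsWithoutComments("Show.S06E01.720p")); // false
        System.out.println(findsWithoutComments("Show s 06 e 01"));   // true
    }
}
```

If you drop Pattern.COMMENTS from the class above, remove the spaces from the pattern string as well.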
