
How to get the file name part from HTML src attribute of <script> tag using Regex pattern in Java

I need to get the file name from the src attribute of an HTML 'script' tag. I managed to get the value of the entire src attribute, but I am not sure how to get only the file name, including its extension. Below is the code with an example.

        String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";

        Pattern scriptPattern = Pattern.compile("<script[^>]+src\\s*=\\s*[\"'](.*?)[\"'][^>]*>");

        Matcher script = scriptPattern.matcher(javaScript);
        if (script.find()) {
            System.out.println(script.group(1));
        }

The above prints https://www.xxx.co.uk/rta2/v-0.52.min.js

Instead of the entire URL, I want only the file name, i.e.

v-0.52.min.js

It should also support both '/' and '\\' as path separators.

Please help.

String javaScript = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\" data-hosts=\"ted.xxx.co.uk\"></script>";
Pattern pattern = Pattern.compile("<script src=\"[^\"]+(?:/|\\\\)([^\"]+)\"");
Matcher matcher = pattern.matcher(javaScript);
if (matcher.find()) {
    String src = matcher.group(1);
    System.out.println(src);
}

The regular expression searches for the literal string <script src=
followed by a single double-quote character, i.e. "
followed by one or more characters that are not a double quote
followed by either a single forward slash, i.e. / , or a single backslash, i.e. \ (written as \\\\ in the Java string literal)
followed again by one or more characters that are not a double quote (these characters are placed in a capturing group)
and finally followed by another double-quote character.

Because the first run of non-quote characters is greedy, it extends as far as the last separator in the value, so the capturing group holds only the text after that final / or \.

The above code displays the following:

v-0.52.min.js
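The pattern above expects src to be the first attribute and to be double-quoted. If the markup may vary, a somewhat more tolerant variant along the following lines could be used. This is only a sketch: the class and method names (ScriptFileName, fileNameOf) are invented for illustration, and it assumes the src value contains no quote characters and no query string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptFileName {

    // <script, any attributes, then src = "..." or '...'.
    // The optional non-capturing group greedily consumes the path up to the
    // last '/' or '\'; group 1 is whatever follows that separator.
    private static final Pattern SCRIPT_SRC_FILE = Pattern.compile(
            "<script\\b[^>]*?\\ssrc\\s*=\\s*[\"'](?:[^\"']*[/\\\\])?([^\"'/\\\\]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the file name of the first <script src=...> found, if any.
    public static Optional<String> fileNameOf(String html) {
        Matcher m = SCRIPT_SRC_FILE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String doubleQuoted = "<script src=\"https://www.xxx.co.uk/rta2/v-0.52.min.js\" class=\"RTA2-loader\"></script>";
        String singleQuoted = "<script type='text/javascript' src='C:\\js\\app.js'></script>";

        System.out.println(fileNameOf(doubleQuoted).orElse("(none)")); // v-0.52.min.js
        System.out.println(fileNameOf(singleQuoted).orElse("(none)")); // app.js
        System.out.println(fileNameOf("<script>var x;</script>").orElse("(none)")); // (none)
    }
}
```

A src value with no separator at all, such as src="app.js", is returned unchanged because the path part of the pattern is optional.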

Nonetheless, I wish to point out that an HTML parser is preferable to regular expressions when it comes to parsing HTML.
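Once the src value is in hand, whether from the regex in the question or from an HTML parser, the file name can be taken with plain string operations instead of a second regex. A minimal sketch; the class and method names (SrcFileName, fileName) are made up for illustration, and any query string or fragment in the value is left untouched:

```java
public class SrcFileName {

    // Returns the part of src after the last '/' or '\'.
    // If neither separator occurs, the whole value is returned.
    public static String fileName(String src) {
        int lastSeparator = Math.max(src.lastIndexOf('/'), src.lastIndexOf('\\'));
        return src.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileName("https://www.xxx.co.uk/rta2/v-0.52.min.js")); // v-0.52.min.js
        System.out.println(fileName("C:\\js\\app.js"));                            // app.js
        System.out.println(fileName("app.js"));                                    // app.js
    }
}
```

lastIndexOf returns -1 when a separator is absent, so substring(0) naturally yields the full value in that case.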
