
Java regex pattern matcher - how to allow a choice?

In Java I have a block of code that processes an Apache web server log, and checks the URL extension type. It works well when the URL is in the format "/index.html", but occasionally the URL is "/", which breaks the code.

The code below works fine, but if "/index.html" in the input line is changed to "/", it breaks: line 19 of the code (\\.\\S*) looks for a dot followed by non-whitespace characters, and the URL "/" contains no dot for the regex to find.

How can I rewrite line 19 (\\.\\S*) to allow a choice between .extension and "/"?

In other words:
if URL=index.html, then extension is .html
if URL=index.php, then extension is .php
if URL=/, then extension is ""

import java.util.regex.*;

public class Test {

    public static void main(String[] args) {

        String log_input = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] \"GET /index.html HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\"";             
      //String log_input = "123.45.67.89 - - [27/Oct/2000:09:27:09 -0400] \"GET / HTTP/1.0\" 200 10450 \"-\" \"Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)\""; 

        //step 1 - split log line
        Pattern p = Pattern.compile("^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"");
        Matcher m = p.matcher(log_input);       
        m.matches();
        String request_ip = m.group(1);
        String request_resource = m.group(5);
        System.out.println("Input: " + m.group(5));

        //step 2 - check file extension
        Pattern p2 = Pattern.compile(".* .*(\\.\\S*) .*");
        Matcher m2 = p2.matcher(request_resource);  
        m2.matches();
        String request_resource_ext = m2.group(1);
        System.out.println("Extension: " + request_resource_ext);

        if(request_resource_ext.matches("\\.htm|\\.html|\\.php|^$")){ //^$ in case the URL is / which has no extension
            System.out.println("Write");
        }else{
            System.out.println("Do not write");
        }

    }

}

Use the following regex as the pattern string for p2:

.* (?:/|.*(\\.\\S*)) .*

The pipe character | is the alternation operator: the group matches either a bare / or a path that contains a dot.

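?: makes the outer group non-capturing, so the extension is still captured as m2.group(1). Note that when the / branch matches, the capturing group takes no part in the match, so m2.group(1) returns null rather than "". Convert null to "" before calling request_resource_ext.matches(...), otherwise that call throws a NullPointerException.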
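As a minimal sketch of how this pattern behaves, the snippet below wraps it in a small helper. The class name ExtensionDemo and the method extensionOf are hypothetical names introduced only for illustration. The helper checks the result of matches() before reading the group, and maps a null group to "", since Matcher.group returns null for a group that did not take part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtensionDemo {

    // Same pattern as in the answer: a bare "/" or a path containing ".ext".
    private static final Pattern EXT = Pattern.compile(".* (?:/|.*(\\.\\S*)) .*");

    // Hypothetical helper: returns the extension, "" for the URL "/",
    // or null when the request line does not match the pattern at all.
    public static String extensionOf(String requestLine) {
        Matcher m = EXT.matcher(requestLine);
        if (!m.matches()) {
            return null; // calling group() here would throw IllegalStateException
        }
        String ext = m.group(1); // null when the "/" alternative matched
        return ext == null ? "" : ext;
    }

    public static void main(String[] args) {
        System.out.println("[" + extensionOf("GET /index.html HTTP/1.0") + "]"); // prints [.html]
        System.out.println("[" + extensionOf("GET / HTTP/1.0") + "]");           // prints []
    }
}
```

With this pattern, a path that has no dot and is not exactly "/" (for example "GET /about HTTP/1.0") does not match at all, so the sketch returns null for it. Whether such URLs should count as having an empty extension is a separate decision that the original question does not cover.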
