Java - Regular Expressions matching one to another

I am trying to extract pieces of data using regular expressions, but I'm not very fluent with them. Consider this code:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class HTTP{

    private static String getServer(String httpresp){
        Pattern p = Pattern.compile("(\bServer)(.*[Server:-\r\n]"); //What RE syntax do I use here?
        Matcher m = p.matcher(httpresp);

        if (m.find()){
            return m.group(2);
        }
        return null; // no Server header found
    }

    public static void main(String[] args){
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n"; //Test data

        System.out.println(getServer(testdata));
    }
}

How would I capture everything from "Server:" up to the next "\r\n", so that the output is "Apache"? I googled around and tried it myself, but failed.

It's a one-liner:

private static String getServer(String httpresp) {
    // (?s) lets '.' match \r and \n, so the regex can span the whole response
    return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
}

The trick here is two-part:

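  • use .*? , which is a reluctant match (it consumes as little as possible while still allowing the overall match)
  • the regex matches the whole input, but the desired target is captured and returned via the back reference $1 in the replacement; since . does not match \r or \n by default, matching the whole multi-line input requires DOTALL mode (the (?s) flag)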
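
To make the two points above concrete, here is a small sketch. The class and method names (ReplaceAllDemo, getServerGreedy, getServerNoDotall) and the extra Date header are made up for illustration; they are not part of the original question. It compares the reluctant group with a greedy one, and shows what happens when DOTALL is left out.

```java
class ReplaceAllDemo {
    // Reluctant group plus (?s): the regex spans the whole response,
    // and the group stops at the first \r\n after "Server: ".
    static String getServer(String httpResp) {
        return httpResp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
    }

    // Greedy group, for comparison: it runs on to the LAST \r\n it can reach.
    static String getServerGreedy(String httpResp) {
        return httpResp.replaceAll("(?s).*Server: (.*)\r\n.*", "$1");
    }

    // Without (?s), '.' cannot cross line breaks, so only part of the
    // input is matched and the rest of the response is left in place.
    static String getServerNoDotall(String httpResp) {
        return httpResp.replaceAll(".*Server: (.*?)\r\n.*", "$1");
    }

    public static void main(String[] args) {
        // Hypothetical test data with one extra header after Server.
        String resp = "HTTP/1.1 302 Found\r\nServer: Apache\r\nDate: today\r\n\r\n";
        System.out.println("[" + getServer(resp) + "]");
        System.out.println("[" + getServerGreedy(resp) + "]");
        System.out.println("[" + getServerNoDotall(resp) + "]");
    }
}
```

One caveat with this approach: if the input contains no Server header at all, replaceAll finds nothing to replace and returns the whole input unchanged, so the caller cannot tell "no header" apart from a real value without an extra check.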

You could use a capturing group together with a positive lookahead.

Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");

Then print capturing group 1.

Example:

String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
Matcher matcher = Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}

OR

Matcher matcher = Pattern.compile("(?:\\bServer\\b\\S*\\s+)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}

Output:

Apache

Explanation:

  • (?:\\bServer:\\s*) A non-capturing group, written (?:...), matches without capturing anything. \\b (the Java string form of the regex \b) is a word boundary, which matches between a word character and a non-word character. Server: matches the literal text Server:, and \\s* matches the zero or more whitespace characters that follow it.

  • (.*?) Parentheses (...) form a capturing group, which captures whatever the pattern inside it matches. Here (.*?) captures characters non-greedily, that is, as few as possible, up to the point where the lookahead that follows succeeds.

  • (?=[\r\n]+) A positive lookahead, written (?=...), asserts that the match must be followed by text that matches the pattern inside it, without consuming that text. Here it requires one or more line-break characters.
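
Putting the pieces together, the pattern explained above could be wrapped in a small helper, roughly like this. It is only a sketch: the class name ServerHeader, the method name find, and the use of Optional for the no-match case are my own choices, not something from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServerHeader {
    // Compiled once and reused; the pattern is the one explained above.
    private static final Pattern SERVER =
            Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");

    // Returns the text after "Server:" up to the next line break,
    // or an empty Optional when the response has no such header.
    static Optional<String> find(String httpResp) {
        Matcher m = SERVER.matcher(httpResp);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
        System.out.println(find(testdata).orElse("(no Server header)"));
    }
}
```

Unlike the replaceAll approach, find() makes the "no header" case explicit, because the caller gets an empty Optional instead of the unmodified input.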
