
Java - Regular Expressions matching one to another

I am trying to retrieve bits of data using regular expressions (RE). The problem is that I'm not very fluent with RE. Consider the code:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class HTTP {

    private static String getServer(String httpresp) {
        Pattern p = Pattern.compile("(\bServer)(.*[Server:-\r\n]"); // What RE syntax do I use here?
        Matcher m = p.matcher(httpresp);

        if (m.find()) {
            return m.group(2);
        }
        return null; // no match found
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n"; // Test data

        System.out.println(getServer(testdata));
    }
}

How would I extract the text between "Server:" and the next "\r\n", which for the test data should output "Apache"? I googled around and tried it myself, but failed.

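It's a one-liner:

private static String getServer(String httpresp) {
    return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
}

The trick here is two-part:

  • use .*?, which is a reluctant match (it consumes as little as possible while still allowing a match)
  • the regex matches the whole input, and the desired target is captured and returned through the $1 group reference in the replacement

The (?s) flag is needed because . does not match line terminators by default, so without it the pattern cannot span the whole multi-line response. If the input contains no Server header, nothing is replaced and the input is returned unchanged.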
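A minimal sketch for checking how the one-liner behaves on multi-line input. The class name `ReplaceAllDemo` and both method names are hypothetical, chosen only for illustration. It compares the pattern with and without the `(?s)` flag, since `.` does not match `\r` or `\n` by default in `java.util.regex`:

```java
public class ReplaceAllDemo {

    // Hypothetical helper: (?s) lets '.' match line terminators,
    // so the regex can cover the whole response and leave only group 1.
    public static String withDotall(String httpresp) {
        return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
    }

    // Same pattern without (?s): '.' stops at \r and \n, so only the
    // Server line itself is matched and replaced.
    public static String withoutDotall(String httpresp) {
        return httpresp.replaceAll(".*Server: (.*?)\r\n.*", "$1");
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";

        // Prints only the header value.
        System.out.println(withDotall(testdata));

        // Still contains the status line, because only one line was replaced.
        System.out.println(withoutDotall(testdata).contains("HTTP/1.1"));
    }
}
```

Because `replaceAll` returns the input unchanged when nothing matches, a caller that needs to tell "no header" apart from a real value may prefer a `Matcher`-based approach.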

You could use capturing groups or positive lookbehind.

Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");

Then print group 1.

Example:

String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
Matcher matcher = Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}

OR

Matcher matcher = Pattern.compile("(?:\\bServer\\b\\S*\\s+)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
    System.out.println(matcher.group(1));
}

Output:

Apache
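The lookbehind alternative mentioned above is not shown in the examples. A minimal sketch of what it might look like, assuming the header is always written as `Server: ` with exactly one space, so the lookbehind stays fixed-width. The class name `LookbehindDemo`, the method name, and the pattern itself are illustrations, not taken from the original answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {

    // (?<=\bServer: ) asserts that the match is preceded by "Server: "
    // (?=[\r\n])      asserts that the match is followed by a line break
    // Neither assertion is part of the match, so group() is the value itself.
    private static final Pattern SERVER_VALUE =
            Pattern.compile("(?<=\\bServer: ).*?(?=[\r\n])");

    // Returns the header value, or null when no Server header is present.
    public static String getServer(String httpresp) {
        Matcher m = SERVER_VALUE.matcher(httpresp);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
        System.out.println(getServer(testdata));
    }
}
```

With lookarounds on both sides there is no group to pick, and the whole match is the value. The capturing-group version is more forgiving about the whitespace after the colon.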

Explanation:

  • (?:\\bServer:\\s*) In regex, a non-capturing group is written as (?:...); it groups and matches without capturing. \\b is a word boundary, which matches between a word character and a non-word character. Server: matches the literal string Server:, and \\s* matches the zero or more whitespace characters that follow.

  • (.*?) In regex, (...) is a capturing group, which captures the characters matched by the pattern inside it. In our case, (.*?) captures all characters non-greedily up to the point where

  • (?=[\r\n]+) one or more line breaks are detected. (?=...) is a positive lookahead, which asserts that the match must be followed by characters matching the pattern inside the lookahead, without consuming them.
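Putting the pieces together, here is a sketch of how the pattern explained above could be dropped into a `getServer` method like the one in the question. The class name `ServerHeader` and the use of `Optional` for the "header not found" case are assumptions made for illustration. The pattern string is the one from the answer:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServerHeader {

    // Pattern from the answer, compiled once and reused across calls.
    private static final Pattern SERVER =
            Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");

    // Hypothetical helper: an empty Optional signals that no Server
    // header was found, instead of returning null.
    public static Optional<String> getServer(String httpresp) {
        Matcher m = SERVER.matcher(httpresp);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
        System.out.println(getServer(testdata).orElse("(no Server header)"));
        System.out.println(getServer("HTTP/1.1 200 OK\r\n\r\n").orElse("(no Server header)"));
    }
}
```

Compiling the `Pattern` once as a constant avoids recompiling it on every call, which matters if many responses are parsed.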
