Java extract field delimited substring using regex

How do I extract the program name from a syslog message using a regex? I have a Java stream-processing module that accepts regexes to process syslog messages.

The log line could be:

2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10

The extraction should work like this: take the third space-delimited substring, then extract the part of it that ends at the first [, :, / or space.

So for the first four log samples the extracted string would be sshd, for the fifth SSHD, and for the sixth SSH.D. Is this possible with a regex?

Edit:

What I tried is ((?:[A-Za-z][A-Za-z0-9_.-]+)) and it seems to work. To be honest, though, I modified an example regex and tweaked it in an online tool until it fit my use case, and I am not sure exactly how it works.

A double split should do the job:

String token = data.split(" +")[2].split("[\\[:/]")[0];
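As a quick check, the sketch below runs that double split over the six sample lines from the question. The class name DoubleSplitDemo and the helper programName are made up for illustration, and the code assumes every line has at least three space-separated fields.

```java
public class DoubleSplitDemo {
    // Hypothetical helper: take the third space-delimited field,
    // then cut it at the first '[', ':' or '/'.
    static String programName(String line) {
        return line.split(" +")[2].split("[\\[:/]")[0];
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            System.out.println(programName(line));
        }
    }
}
```

A line with fewer than three fields would throw ArrayIndexOutOfBoundsException here, so real input probably needs a length check before indexing.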

Try something like this:

String str = line.split(" ")[2].replaceAll("(.+?)(\\[|:|/).*", "$1");

Haven't tested it.

The regex I think you're looking for is:

String regex = "([^\\[:/]+).*";

.* matches zero or more of any character. The pair of parentheses in front of the dot star, ().*, creates a capture group that can be read from a Matcher. Since it is the first set of parentheses, it is referenced as group number 1. Inside the parentheses is an expression that matches one or more characters from the negated character class [^]+, that is, any character other than the delimiters specified in the question: "[", ":", and "/". The space delimiter does not need to be in the class because the example below first splits the line on whitespace and applies the regex only to the third field.

Here is an example application testing the results:

package com.stackexchange.stackoverflow;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Question19370191 {
    public static void main(String[] args) {
        String regex = "([^\\[:/]+).*";
        Pattern pattern = Pattern.compile(regex);

        List<String> lines = new ArrayList<>();
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10");

        for(String line : lines) {
            String field = line.split("\\s")[2];
            String extraction = "";
            Matcher matcher = pattern.matcher(field);
            if(matcher.matches()) {
                extraction = matcher.group(1);
            }

            System.out.println(String.format("Field \"%-12s\" Extraction \"%s\"", field, extraction));
        }
    }
}

It outputs the following:

Field "sshd[6359]: " Extraction "sshd"
Field "sshd:3322   " Extraction "sshd"
Field "sshd/6359   " Extraction "sshd"
Field "sshd        " Extraction "sshd"
Field "SSHD[1133]  " Extraction "SSHD"
Field "SSH.D[6359]:" Extraction "SSH.D"

If your data will always look exactly like the examples you provided:

(?:.+?\s){2}([\w\.]+).+$

Explained:

(?:.+?\s){2} ...match up to and including the second space

([\w\.]+) ...capture the run of word characters and dots, which stops at the first ' ', '[', ':' or '/'

.+$ ...match the rest of the line

What you want will be in capture group 1. In a Java string literal, double each backslash.
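Since the module in the question accepts a regex rather than code, it may help to see the pattern above applied to a whole line in one step. This is only a sketch: the class name WholeLineRegexDemo and the helper programName are hypothetical, and the pattern is the one given above, written as a Java string literal.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeLineRegexDemo {
    // The pattern from the answer above, with each backslash doubled for Java.
    private static final Pattern PROGRAM =
            Pattern.compile("(?:.+?\\s){2}([\\w\\.]+).+$");

    // Hypothetical helper: returns capture group 1,
    // or an empty Optional if the line does not match.
    static Optional<String> programName(String line) {
        Matcher m = PROGRAM.matcher(line);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            System.out.println(programName(line).orElse("<no match>"));
        }
    }
}
```

One limitation to keep in mind: a hyphen is neither a word character nor a dot, so a program name such as systemd-logind would be cut off at systemd. Capturing with a negated class such as [^\s\[:/]+ instead would avoid that, if it suits your data.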
