Java extract field delimited substring using regex

How do I extract the program name from a syslog message using a regex? I have a Java stream processing module that accepts regexes to process syslog messages.

The log line could be any of the following:

2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10

The extraction process should be: take the third space-delimited substring, then extract the part of it that ends at [ , : , / or a space.

So for the first four log samples the extracted string would be sshd, for the fifth SSHD, and for the sixth SSH.D. Is this possible with a regex?

Edit:

What I tried is ((?:[A-Za-z][A-Za-z0-9_.-]+)) and it seems to work. To be honest, though, I modified an example regex and tweaked it in an online tool until it fit my use case, so I am not sure exactly how it works.

A double split should do the job:

String token = data.split(" +")[2].split("[\\[:/]")[0];
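As a rough check of the double split, here is a minimal sketch that runs it over the six sample lines from the question. The class name DoubleSplitDemo and the helper programName are hypothetical names introduced only for this illustration.

```java
public class DoubleSplitDemo {
    // Hypothetical helper: take the third space-delimited field,
    // then keep everything before the first '[', ':' or '/'.
    static String programName(String line) {
        String field = line.split(" +")[2];
        return field.split("[\\[:/]")[0];
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            System.out.println(programName(line));
        }
    }
}
```

Note that this sketch assumes every line has at least three fields and that the third field does not consist solely of delimiter characters; otherwise the array index would be out of bounds.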

Try something like this:

String str = line.split(" ")[2].replaceAll("(.+?)[\\[:/].*", "$1");

Haven't tested it.

The regex I think you're looking for is:

String regex = "([^\\[:/]+).*";

The .* says to match 0 or more of any character. Putting a pair of parentheses in front of the dot star, as in ().*, creates a group that can be selected from a Matcher. Since it is the first set of parentheses, it is referenced as group number 1. Inside the parentheses is an expression that matches one or more characters from the negated character class [^...]+, which excludes the characters specified in the OP: specifically "[", ":" and "/".

Here is an example application testing the results:

package com.stackexchange.stackoverflow;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Question19370191 {
    public static void main(String[] args) {
        String regex = "([^\\[:/]+).*";
        Pattern pattern = Pattern.compile(regex);

        List<String> lines = new ArrayList<>();
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10");

        for(String line : lines) {
            String field = line.split("\\s")[2];
            String extraction = "";
            Matcher matcher = pattern.matcher(field);
            if(matcher.matches()) {
                extraction = matcher.group(1);
            }

            System.out.println(String.format("Field \"%-12s\" Extraction \"%s\"", field, extraction));
        }
    }
}

It outputs the following:

Field "sshd[6359]: " Extraction "sshd"
Field "sshd:3322   " Extraction "sshd"
Field "sshd/6359   " Extraction "sshd"
Field "sshd        " Extraction "sshd"
Field "SSHD[1133]  " Extraction "SSHD"
Field "SSH.D[6359]:" Extraction "SSH.D"

If your example data will be exactly like what you have provided:

(?:.+?\s){2}([\w\.]+).+$

Explained:

(?:.+?\s){2} ...match up to and including the second space

([\w\.]+) ...match word characters and dots, i.e. everything up to the first ' ', '[', ':' or '/' in the sample data

.+$ ...match to EOL

What you want will be in capture group 1.
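To make the whole-line approach concrete, here is a minimal Java sketch that applies the pattern above with a Matcher and reads capture group 1. The class name WholeLineRegexDemo and the helper programName are hypothetical, and returning an empty string when nothing matches is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeLineRegexDemo {
    // Same regex as above, with backslashes doubled for a Java string literal.
    private static final Pattern PROGRAM =
            Pattern.compile("(?:.+?\\s){2}([\\w\\.]+).+$");

    // Hypothetical helper: returns capture group 1, or "" if the line does not match.
    static String programName(String line) {
        Matcher matcher = PROGRAM.matcher(line);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            System.out.println(programName(line));
        }
    }
}
```

Because the capture group only accepts word characters and dots, a program name containing a hyphen would be cut at the hyphen; that is fine for the sample data but worth keeping in mind for other logs.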
