Logstash Grok: match up to the last comma before the start of the UserAgent

I have this log message:

"sid-cmascioieiow89322&New*Sou,th%20Skvn%20and%20ir&o,n%20Age,Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"

And the pattern:

"(?<p1>[^&]*)&(?<p2>[^,]*),%{GREEDYDATA:User_Agent}"

The problem is that p2 sometimes contains zero, one, or more than one comma. I want to match up to the last comma before the UserAgent, because the UserAgent itself sometimes contains commas.

This is the grok debugger link: https://grokdebug.herokuapp.com/

Current result:

{
    "p1": [
        "sid-cmascioieiow89322"
    ],
    "p2": [
        "New*Sou"
    ],
    "User_Agent": [
        "th%20Skvn%20and%20ir&o,n%20Age,Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"
    ]
}

What I want:

{
    "p1": [
        "sid-cmascioieiow89322"
    ],
    "p2": [
        "New*Sou,th%20Skvn%20and%20ir&o,n%20Age"
    ],
    "User_Agent": [
        "Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"
    ]
}

Thank you for your help.

The part of the string that you want to capture into p2 contains no whitespace. So, instead of [^,]*, which matches zero or more characters other than a comma, you can use \S*, which matches zero or more non-whitespace characters, as many as possible. Because \S* is greedy, \S*, first runs to the end of the streak of non-whitespace characters and then backtracks to the last comma in that streak, which is the comma just before the UserAgent.

(?<p1>[^&]*)&(?<p2>\S*),%{GREEDYDATA:User_Agent}
             ^^^^^^^^^^

With this pattern, p2 captures everything up to the last comma before the UserAgent (New*Sou,th%20Skvn%20and%20ir&o,n%20Age), and User_Agent starts at Mozilla/5.0.
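The difference between the two patterns can also be checked outside Logstash. The following is only a minimal sketch in Python, not Grok itself, and it rests on two assumptions: Python's re module writes named groups as (?P<name>...) rather than Grok's (?<name>...), and %{GREEDYDATA:User_Agent} is approximated by a named group around .* (the definition of GREEDYDATA in the stock Grok patterns). The helper name parse is made up for illustration.

```python
import re

# Assumption: Grok's (?<name>...) is rewritten as Python's (?P<name>...),
# and %{GREEDYDATA:User_Agent} is approximated by (?P<User_Agent>.*).
ORIGINAL = re.compile(r"(?P<p1>[^&]*)&(?P<p2>[^,]*),(?P<User_Agent>.*)")
SUGGESTED = re.compile(r"(?P<p1>[^&]*)&(?P<p2>\S*),(?P<User_Agent>.*)")

# The log message from the question, split only for readability.
LOG = (
    "sid-cmascioieiow89322&New*Sou,th%20Skvn%20and%20ir&o,n%20Age,"
    "Mozilla/5.0 (Linux; Android 6.0; CHM-U01 Build/HonorCHM-U01) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.98 Mobile Safari/537.36"
)


def parse(pattern, line):
    """Return the named captures as a dict, or None if the line does not match."""
    m = pattern.search(line)  # unanchored search, like a Grok match
    return m.groupdict() if m else None


before = parse(ORIGINAL, LOG)   # [^,]* stops at the first comma
after = parse(SUGGESTED, LOG)   # \S* backtracks to the last comma before a space

print(before["p2"])
print(after["p2"])
print(after["User_Agent"])
```

This approach depends on the shape of the data: it assumes p2 never contains literal whitespace (spaces arrive encoded as %20) and that the first whitespace-free token of the UserAgent (here Mozilla/5.0) contains no comma. If either assumption fails, the split lands in the wrong place.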
