Regex capture single entry from log file when entry contains multiple line breaks

I'm attempting to parse an Elasticsearch log file, but I'm getting hung up: I can't get my pattern to stop before the next log entry begins. Here's a snippet of the log (I trimmed it down significantly):

[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] [my_index_name123][1], node[JLNzpIU9QRikVkfQyIFjCA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@769a0c28] lastShard [true]
org.elasticsearch.search.SearchParseException: [my_index_name123][1]: from[0],size[10],sort[<custom:"time_generated": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@7d4bfe4>!]: Parse Failure [Failed to parse source [{"from":0,"size":10,"sort":[{"time_generated":{"order":"desc"}}],"query":{"filtered":{"query":{"bool":{"must":[{"query_string":{"fields":["event_message"],"default_operator":"AND","query":null}}]}}}},"aggs":{"events_over_time":{"filter":{"range":{"time_generated":{"from":"2014-08-05T12:47:41.000Z","to":"2014-09-03T12:47:41.000Z"}}},"aggs":{"timeline":{"date_histogram":{"field":"time_generated","interval":"day"}}}}}}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [my_index_name123] query_string must be provided with a [query]
    at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:203)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more
[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] [my_index_name123][0], node[JLNzpIU9QRikVkfQyIFjCA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@769a0c28]
org.elasticsearch.search.SearchParseException: [my_index_name123][0]: from[0],size[10],sort[<custom:"time_generated": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@6392f9f7>!]: Parse Failure [Failed to parse source [{"from":0,"size":10,"sort":[{"time_generated":{"order":"desc"}}],"query":{"filtered":{"query":{"bool":{"must":[{"query_string":{"fields":["event_message"],"default_operator":"AND","query":null}}]}}}},"aggs":{"events_over_time":{"filter":{"range":{"time_generated":{"from":"2014-08-05T12:47:41.000Z","to":"2014-09-03T12:47:41.000Z"}}},"aggs":{"timeline":{"date_histogram":{"field":"time_generated","interval":"day"}}}}}}]]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [my_index_name123] query_string must be provided with a [query]
    at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:203)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more
[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] All shards failed for phase: [query]
[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] [my_index_name123][4], node[JLNzpIU9QRikVkfQyIFjCA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@769a0c28] lastShard [true]
org.elasticsearch.search.SearchParseException: [my_index_name123][4]: from[0],size[10],sort[<custom:"time_generated": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@6cb38cdb>!]: Parse Failure [Failed to parse source [{"from":0,"size":10,"sort":[{"time_generated":{"order":"desc"}}],"query":{"filtered":{"query":{"bool":{"must":[{"query_string":{"fields":["event_message"],"default_operator":"AND","query":null}}]}}}},"aggs":{"events_over_time":{"filter":{"range":{"time_generated":{"from":"2014-08-05T12:47:41.000Z","to":"2014-09-03T12:47:41.000Z"}}},"aggs":{"timeline":{"date_histogram":{"field":"time_generated","interval":"day"}}}}}}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [my_index_name123] query_string must be provided with a [query]
    at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more

Here's a regex pattern I've been testing:

(?<=(\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]))([\d\s\p{L}\p{P}\p{S}]+)(?=\n\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\])

The pattern matches every log entry but the last one. I assume that's because of the lookahead on the end. I've been staring at this for a while, but I can't seem to come up with a way to just get it to match a single entry. I'm using PCRE. Hopefully someone has stronger regex-fu than me.

I see two changes that need to be made.

1) Make the capture group's quantifier lazy, so that it stops at the next timestamp instead of consuming everything up to the last one:

([\d\s\p{L}\p{P}\p{S}]+?)

2) Add an alternation to the lookahead so that it also succeeds at the end of the string, which lets the final entry match:

(?=(\n\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]|$))

Here's the full regex:

(?<=(\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]))([\d\s\p{L}\p{P}\p{S}]+?)(?=(\n\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]|$))
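As a quick sanity check, here is a minimal sketch of the same idea in Python. It rests on some assumptions: Python's built-in re module does not support \p{L}, \p{P} or \p{S}, so the character class is swapped for [\s\S] (any character, including newlines); with PCRE the class from the question can stay as it is. The names TS and ENTRY and the shortened sample log are made up for illustration only.

```python
import re

# Timestamp such as [2014-09-03 07:47:40,088]. It has a fixed width,
# which Python's re module requires for a lookbehind.
TS = r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]"

# Assumption: [\s\S]+? stands in for the PCRE class [\d\s\p{L}\p{P}\p{S}]+?
# The lazy quantifier stops at the first place the lookahead succeeds:
# either a newline followed by a timestamp, or the end of the string.
ENTRY = re.compile(rf"(?<={TS})([\s\S]+?)(?=\n{TS}|$)")

# Hypothetical, heavily shortened log with three entries.
log = (
    "[2014-09-03 07:47:40,088][DEBUG][action.search.type] [Server1] Failed to execute\n"
    "org.elasticsearch.search.SearchParseException: Parse Failure\n"
    "    at java.lang.Thread.run(Thread.java:745)\n"
    "[2014-09-03 07:47:40,088][DEBUG][action.search.type] [Server1] All shards failed for phase: [query]\n"
    "[2014-09-03 07:47:41,001][DEBUG][action.search.type] [Server1] Failed to execute\n"
    "Caused by: org.elasticsearch.index.query.QueryParsingException: query_string must be provided with a [query]\n"
    "    ... 11 more"
)

# Each match is the text of one entry after its leading timestamp.
entries = [m.group(1) for m in ENTRY.finditer(log)]

for number, entry in enumerate(entries, start=1):
    print(f"--- entry {number} ---")
    print(entry)
```

Without the |$ alternative the third entry would be skipped, since no timestamp follows it; without the lazy quantifier the first match would run on to the last timestamp in the input.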
