
Extract specific JSON from log file in Python

I am trying to extract a particular JSON object from a log file that contains multiple JSON objects mixed with normal text. In this case, I want the JSON that follows the "Output payload" text. I have tried several approaches but have not been able to extract the required JSON. The file is in this format:

[2020-05-17 15:32:11.698000] INFO [worker-1] org.mule.api.processor.LoggerMessageProcessor [[cloudhub-us-claim-services-1-0-0-prod].post:/claims/{claimNumber}/predictionScores:experience-claims-predictionscore-api.config.7.771]: PredictionScoreAPILogger-7c506940-987d-11ea-9ef4-0a5226a8e24f:16634746: Initialization: Request successfully logged to mirror queue
[2020-05-17 15:32:12.190000] INFO [worker-1] org.mule.transformer.simple.MessagePropertiesTransformer [[cloudhub-us-claim-services-1-0-0-prod].experience-claims-predictionscore-api.prediction-details-claim-updates.stage1.839]: Property with key 'response', not found on message using 'null'. Since the value was marked optional, nothing was set on the message for this property
[2020-05-17 15:32:12.192000] DEBUG [worker-1] aiml.logging.debug [[cloudhub-us-claim-services-1-0-0-prod].experience-claims-predictionscore-api.prediction-details-claim-updates.stage1.839]: PredictionScoreAPILogger-7c506940-987d-11ea-9ef4-0a5226a8e24f:16634746:Datarobot API Call: Output payload received from Datarobot API: {
  "prediction": "N",
  "predictionScore": 0.0000629713,
  "predictionExplanations": "lineItem : 0|feature: ADJER_CANNOT_COMPUTE_TWG_SUGGESTED_TIME_ZERO|Value: Y|strength: -1.4469371757,\nlineItem : 1|feature: ADJER_CANNOT_COMPUTE_TWG_SUGGESTED_PRICE|Value: Y|strength: -1.1968554807,\nlineItem : 2|feature: MONTHS_DIFF_CLAIM_REPAIR_FACILITY_FIRST_CLAIM|Value: 61|strength: -1.0681064444"
}

You could read the file as text and then parse it with a regex. Something like this:

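import re

# logfilepath is the path to your log file; the with-block closes it for us.
with open(logfilepath, 'r') as logfile:
    log = logfile.read()

# Raw string avoids invalid-escape warnings. Group 1 is the "Output payload...:" prefix,
# group 2 is the JSON text up to the first closing brace.
objects = re.findall(r"(Output payload.*:\s?)(\{\s?[\s\S]+?\s?\})", log)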

I have tested the regex against your sample and it works, so this piece of code should work too. re.findall returns one (prefix, JSON text) tuple per match, so once you have them you can pass the second element of each tuple to json.loads and pick out the object you are looking for.
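To make the last step concrete, here is a minimal sketch that runs the same regex and turns each captured payload into a Python dict. The log string is a shortened stand-in for the sample in the question, and the names PAYLOAD_RE and extract_payloads are made up for this illustration.

```python
import json
import re

# Shortened stand-in for the log in the question (logger names trimmed).
log = '''[2020-05-17 15:32:11.698000] INFO [worker-1] Initialization: Request successfully logged to mirror queue
[2020-05-17 15:32:12.192000] DEBUG [worker-1] Datarobot API Call: Output payload received from Datarobot API: {
  "prediction": "N",
  "predictionScore": 0.0000629713
}
'''

# Same pattern as in the answer, compiled once (hypothetical name).
PAYLOAD_RE = re.compile(r"(Output payload.*:\s?)(\{\s?[\s\S]+?\s?\})")


def extract_payloads(text):
    """Return every 'Output payload' JSON object found in text as a dict."""
    payloads = []
    for _prefix, raw_json in PAYLOAD_RE.findall(text):
        try:
            payloads.append(json.loads(raw_json))
        except json.JSONDecodeError:
            # The lazy match ends at the first "}", so some payloads may be
            # captured incompletely; skip anything that is not valid JSON.
            continue
    return payloads


payloads = extract_payloads(log)
for payload in payloads:
    print(payload["prediction"], payload["predictionScore"])
```

Skipping invalid matches keeps the loop from crashing on a truncated capture, at the cost of silently dropping that payload; log or raise instead if you need to know about it.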

Happy hacking :)

Edit: Modified the regex to match the updated question. The regex now looks for an "Output payload" string.
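One caveat with the pattern: the lazy [\s\S]+? stops at the first closing brace, so a payload with a nested object, or with a "}" inside a string value, may not be captured whole. If your payloads can look like that, a possible alternative (not the regex approach above) is to find the marker text and let json.JSONDecoder.raw_decode work out where the object ends. This is only a sketch: the function name is hypothetical, and it assumes the first "{" after each "Output payload" marker starts the JSON you want.

```python
import json

MARKER = "Output payload"  # text that precedes the JSON we want


def extract_payloads_raw(text, marker=MARKER):
    """Decode the JSON object that follows each occurrence of marker."""
    decoder = json.JSONDecoder()
    results = []
    pos = text.find(marker)
    while pos != -1:
        # Assumption: the first "{" after the marker opens the payload.
        brace = text.find("{", pos)
        if brace == -1:
            break
        try:
            # raw_decode parses one JSON value starting at index `brace`
            # and returns it together with the index just past its end.
            obj, end = decoder.raw_decode(text, brace)
            results.append(obj)
        except json.JSONDecodeError:
            end = brace + 1  # not valid JSON here; keep scanning
        pos = text.find(marker, end)
    return results
```

Because the JSON parser decides where each object ends, nested braces and braces inside strings are handled without any extra regex work.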
