Extract and reformat text between string/time keys
I am having a problem extracting text between two strings.
I have a log file like this (example data):
2018-12-31 09:49:24 addData [data=data]</br>
2018-12-31 09:49:25 publishData .......................
2018-12-31 09:49:26 createDoc [xml=
<mail>
<recipent>doctor who</recipent>
</mail>]
<attempt>1</attempt>]
2018-12-31 09:49:26 createDoc [xml=
<clientHash>hash</clientHash>
<content>context</content>]
2018-12-31 09:51:27 exampleService [count=1]
My code:
perl -ne 'print if (/09:40/ .. /09:50/)' server.log | sed -n '/createDoc/,/]/p'
My output is:
2018-12-31 09:49:26 createDoc [xml=<mail><recipent>doctor who</recipent>
</mail>]
<attempt>1</attempt>]
2018-12-31 09:49:26 createDoc [xml=
<clientHash>hash</clientHash>
<content>context</content>]
But I want only the XML, like this:
<element>
<mail><recipent>doctor who</recipent>
</mail>
<attempt>1</attempt>
</element>
<element>
<mail><recipent>doctor who</recipent>
</mail>
<clientHash>hash</clientHash>
<content>context</content>
</element>
I would use Awk for this. If you have GNU Awk, you can even parse the time stamps easily.
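awk -v start="$(date -d "09:40" +%s)" \
    -v end="$(date -d "09:50" +%s)" '
  # A line that starts with a timestamp opens a new log record
  /^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} / {
    if ($0 ~ / createDoc \[xml=/) {
      split($1, ymd, /-/)
      split($2, hms, /:/)
      # mktime expects "YYYY MM DD HH MM SS"
      when = mktime(ymd[1] " " ymd[2] " " ymd[3] " " hms[1] " " hms[2] " " hms[3])
      p = (when >= start && when <= end)
      # Drop the 35-character "timestamp createDoc [xml=" prefix
      if (p) $0 = substr($0, 36)
    }
    else p = 0
  }
  # While inside a selected record, print each line minus a trailing "]"
  p { sub(/\]$/, ""); print }' file.log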
This is somewhat Linux-centric: in addition to GNU Awk (for the mktime function), the date syntax is specific to GNU date. (On OSX try date -j -f %H:%M:%S 09:40:00 +%s.) Note that date -d "09:40" means 09:40 today; for an older log, pass the full date as well, for example date -d "2018-12-31 09:40".
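If GNU Awk or GNU date is not available, one possible alternative is to compare the time column as a plain string, since fixed-width HH:MM:SS values sort chronologically. The sketch below is not the command above but a swapped-in variant: the helper name extract_docs, the temporary sample file, and the <element> wrapper lines are assumptions for illustration, and it assumes the log covers a single day because the date column is ignored.

```shell
#!/bin/sh
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
2018-12-31 09:49:24 addData [data=data]</br>
2018-12-31 09:49:25 publishData .......................
2018-12-31 09:49:26 createDoc [xml=
<mail>
<recipent>doctor who</recipent>
</mail>]
<attempt>1</attempt>]
2018-12-31 09:49:26 createDoc [xml=
<clientHash>hash</clientHash>
<content>context</content>]
2018-12-31 09:51:27 exampleService [count=1]
EOF

# extract_docs START END FILE (hypothetical helper; START and END are HH:MM:SS)
extract_docs() {
    awk -v start="$1" -v end="$2" '
    # A line starting with a date and time opens a new log record.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
        if (p) print "</element>"          # close the previous selected record
        # Appending "" forces string comparison of the fixed-width times.
        p = ($3 == "createDoc" && ($2 "") >= (start "") && ($2 "") <= (end ""))
        if (!p) next
        print "<element>"
        i = index($0, "[xml=")             # literal search, no regex escaping
        $0 = (i > 0) ? substr($0, i + 5) : ""
    }
    p {
        sub(/\]$/, "")                     # drop a trailing "]"
        if ($0 != "") print                # skip lines that end up empty
    }
    END { if (p) print "</element>" }
    ' "$3"
}

extract_docs 09:40:00 09:50:00 "$log"
```

Because only the time is compared, a log spanning several days would match the window on every day; the mktime approach is the safer choice in that case.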
Let's say we have tmp.log, which looks something like this:
2018-12-31 09:49:24 addData [data=data]</br>
2018-12-31 09:49:25 publishData .......................
2018-12-31 09:49:26 createDoc [xml=<mail><recipent>doctor who</recipent></mail>]<attempt>1</attempt>]
2018-12-31 09:49:26 createDoc [xml=<clientHash>hash</clientHash><content>context</content>]
2018-12-31 09:51:27 exampleService [count=1]
We can combine some basic commands to get the desired output.
cat tmp.log | grep xml | awk 'BEGIN { FS = "[" } ; { print $2 }'
This will produce something like this:
xml=<mail><recipent>doctor who</recipent></mail>]<attempt>1</attempt>]
xml=<clientHash>hash</clientHash><content>context</content>]
If you also want to get rid of the last character, which is ']', add one more awk:
cat tmp.log | grep xml | awk 'BEGIN { FS = "[" } ; { print $2 }'| awk 'BEGIN { FS = "]" } ; { print $1 }'
I know it's not the coolest way to do it, but at least it's easy to understand and it works.
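One thing to watch with FS = "]" is that print $1 keeps only the text before the first ']', so a line with a ']' in the middle is cut short. As a rough single-awk sketch of the same idea, the following drops everything up to and including "[xml=" and then removes every ']'. The helper name strip_xml and the temporary sample file are made up for illustration, and it assumes ']' never occurs inside the XML content itself.

```shell
#!/bin/sh
# Hypothetical sample file reproducing tmp.log from above.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
2018-12-31 09:49:24 addData [data=data]</br>
2018-12-31 09:49:25 publishData .......................
2018-12-31 09:49:26 createDoc [xml=<mail><recipent>doctor who</recipent></mail>]<attempt>1</attempt>]
2018-12-31 09:49:26 createDoc [xml=<clientHash>hash</clientHash><content>context</content>]
2018-12-31 09:51:27 exampleService [count=1]
EOF

# strip_xml FILE (hypothetical helper name)
strip_xml() {
    awk '
    {
        i = index($0, "[xml=")    # position of the literal marker, 0 if absent
        if (i == 0) next          # skip lines without an xml payload
        s = substr($0, i + 5)     # text after "[xml="
        gsub(/\]/, "", s)         # assumption: "]" is never part of the XML
        print s
    }' "$1"
}

strip_xml "$tmp"
```

This avoids the extra cat and grep processes and leaves the "xml=" prefix out of the output as well.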