awk (or sed/grep) to get occurrences of substring
I have a JSON string in a bash variable, something like this:
{
  "items": [
    {
      "foo": null,
      "timestamp": 1553703000,
      "bar": 123
    },
    {
      "foo": null,
      "timestamp": 1553703200,
      "bar": 456
    },
    {
      "foo": null,
      "timestamp": 1553703400,
      "bar": 789
    }
  ]
}
I want to know how many of those timestamps are after a given datetime, so if I have 1553703100 it'll return 2.
(Bonus imaginary points if you can get me just that number!)
As a step towards that, I want to get just the matches of "timestamp": \d+, in the string so that I can loop through them in a bash script.
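The pattern from the question can be tried out directly. Below is a minimal Python sketch (not shell, and not taken from any of the answers) that applies "timestamp": (\d+), to the whole JSON string, capturing the digits so each match becomes a number. The helper names timestamps() and count_matches_after() are hypothetical.

```python
import re

# Sample input copied from the question; the real text would come from the bash variable.
json_text = """{
  "items": [
    {
      "foo": null,
      "timestamp": 1553703000,
      "bar": 123
    },
    {
      "foo": null,
      "timestamp": 1553703200,
      "bar": 456
    },
    {
      "foo": null,
      "timestamp": 1553703400,
      "bar": 789
    }
  ]
}"""

# The question's pattern "timestamp": \d+, with the digits captured as group 1.
TIMESTAMP_RE = re.compile(r'"timestamp": (\d+),')


def timestamps(text):
    """Return every captured timestamp as an int, in document order."""
    return [int(digits) for digits in TIMESTAMP_RE.findall(text)]


def count_matches_after(text, cutoff):
    """Count captured timestamps strictly greater than cutoff."""
    return sum(1 for t in timestamps(text) if t > cutoff)


for t in timestamps(json_text):
    print(t)
print(count_matches_after(json_text, 1553703100))  # prints 2
```

Because the pattern requires the trailing comma, a "timestamp" that happened to be the last key in its object would be missed; dropping the comma from the pattern avoids that, since \d+ already consumes all the digits.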
I've used sed and grep a bit, but never awk, and from my reading it seems like awk might be the better match for the task.
Other info:
- The JSON is already pretty-printed, as above, so the timestamps will always be on separate lines.
- This is to run in Cygwin, so I have awk/gawk, sed, and grep/egrep, but probably no other tools.
- There could be any number of timestamps in the JSON.
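The question talks about a "given datetime", but 1553703100 looks like a Unix timestamp (seconds since 1970-01-01 UTC). If the cutoff starts out as a calendar date and time, it has to be converted before comparing. A small Python sketch, assuming the timestamps are epoch seconds in UTC (the question does not say so explicitly); to_epoch() and from_epoch() are hypothetical helper names.

```python
from datetime import datetime, timezone

# Assumption: the JSON timestamps are Unix epoch seconds, interpreted as UTC.


def to_epoch(dt):
    """Convert a timezone-aware datetime to integer epoch seconds."""
    return int(dt.timestamp())


def from_epoch(seconds):
    """Convert epoch seconds back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


cutoff = to_epoch(datetime(2019, 3, 27, 16, 11, 40, tzinfo=timezone.utc))
print(cutoff)                  # 1553703100
print(from_epoch(1553703200))  # 2019-03-27 16:13:20+00:00
```

On Cygwin, GNU date can do the same conversion in the shell: date -u -d '2019-03-27 16:11:40' +%s should print 1553703100.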
You didn't provide the expected output, so this is a guess, but is this what you're trying to do?
$ echo "$var" | jq '.items[].timestamp'
1553703000
1553703200
1553703400
or maybe:
$ echo "$var" | jq '.items[].timestamp | select(. > 1553703100)'
1553703200
1553703400
or:
$ echo "$var" | jq '[.items[].timestamp | select(. > 1553703100)] | length'
2
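jq parses the JSON instead of matching text, so it does not depend on the pretty-printing. As a rough cross-check, here are the same three filters written as Python comprehensions over json.loads(). The minified input string is a hypothetical variant of the question's data, chosen because a "timestamp": \d+, text pattern would not match it (there is no space after the colon).

```python
import json

# Hypothetical minified copy of the question's data: one line, no spaces.
minified = ('{"items":[{"foo":null,"timestamp":1553703000,"bar":123},'
            '{"foo":null,"timestamp":1553703200,"bar":456},'
            '{"foo":null,"timestamp":1553703400,"bar":789}]}')

data = json.loads(minified)
cutoff = 1553703100

# Roughly .items[].timestamp (jq prints one value per line; this builds a list)
all_ts = [item["timestamp"] for item in data["items"]]
# Roughly .items[].timestamp | select(. > 1553703100)
after = [t for t in all_ts if t > cutoff]
# Roughly [ ... ] | length
count = len(after)

print(all_ts)  # [1553703000, 1553703200, 1553703400]
print(after)   # [1553703200, 1553703400]
print(count)   # 2
```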
WARNING: I'm just learning jq, so there may be better ways to do the above!
edit: The second approach listed below has serious problems that were very helpfully outlined by @EdMorton. I've elected to keep the old code for educational purposes.
Avoided substr() and handled the case where i is never set, so the END block prints 0 instead of an empty string:
$ awk -v dt=1553703100 '
/timestamp/ && $2+0>dt {i++}
END {print i+0}
' <<< "$var"
2
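To see why the first awk program works, here is a rough Python emulation of it; it illustrates the logic under stated assumptions and is not how awk is implemented. With awk's default whitespace field splitting, a line such as "timestamp": 1553703000, has $1 equal to "timestamp": and $2 equal to 1553703000, (with the comma). Adding 0 converts the leading digits to a number and ignores the trailing comma, and i+0 in the END block prints 0 when no line matched. leading_number() is a hypothetical helper that handles plain integers only, unlike awk's full numeric conversion.

```python
import re


def leading_number(field):
    """Rough stand-in for awk's string-to-number conversion: the leading
    integer prefix of the field, or 0 if there is none (integers only)."""
    m = re.match(r'\s*[-+]?\d+', field)
    return int(m.group()) if m else 0


def awk_count(text, dt):
    """Mimic: /timestamp/ && $2+0>dt {i++}  END {print i+0}"""
    i = 0  # awk starts i unset; i+0 in END turns that into 0
    for line in text.splitlines():
        fields = line.split()  # like awk's default whitespace splitting
        second = fields[1] if len(fields) > 1 else ""  # $2 is "" if absent
        if "timestamp" in line and leading_number(second) > dt:
            i += 1
    return i


# The non-bracket lines of the question's input.
sample = """\
"foo": null,
"timestamp": 1553703000,
"bar": 123
"foo": null,
"timestamp": 1553703200,
"bar": 456
"foo": null,
"timestamp": 1553703400,
"bar": 789
"""

print(awk_count(sample, 1553703100))  # 2
```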
WARNING: PROBLEMATIC CODE
Here I used substr(string, index, [characters]) to trim the comma off your second field. The /timestamp/ regex is not complex; it could be improved if your JSON became more intricate.
$ awk -v dt=1553703100 '
/timestamp/ && substr($2, 0, length($2)) > dt {i++}
END {print i}
' <<< "$var"
2
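One hazard with the substr() version (not necessarily one of the problems raised in the comments): substr() returns a string, and under POSIX awk's comparison rules a plain string compared with a number is compared as a string, character by character. The Python snippet below only illustrates the difference between the two orderings using made-up values; it does not run awk.

```python
# Made-up values of different lengths, compared as text and as numbers.
cutoff = "1553703100"
values = ["1553703200", "999999999", "10000000000"]

# Lexicographic ordering, like a string comparison in awk
as_text = [v for v in values if v > cutoff]
# Numeric ordering, like $2+0 > dt
as_numbers = [v for v in values if int(v) > int(cutoff)]

print(as_text)     # ['1553703200', '999999999']
print(as_numbers)  # ['1553703200', '10000000000']
```

With the question's data every timestamp has ten digits, so the two orderings give the same answer there; the difference only shows up when the lengths differ.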
You can also quickly implement a Python solution:
input:
$ cat data.json
{
  "items": [
    {
      "foo": null,
      "timestamp": 1553703000,
      "bar": 123
    },
    {
      "foo": null,
      "timestamp": 1553703200,
      "bar": 456
    },
    {
      "foo": null,
      "timestamp": 1553703400,
      "bar": 789
    }
  ]
}
code:
$ cat extract_value2.py
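import json

tLimit = 1553703100  # count only timestamps strictly after this value

with open('data.json') as f:
    data = json.load(f)

# Keep each item's timestamp if it is later than tLimit
print([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit])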
output:
$ python extract_value2.py
[1553703200, 1553703400]
count code:
$ cat extract_value2.py
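import json

tLimit = 1553703100  # count only timestamps strictly after this value

with open('data.json') as f:
    data = json.load(f)

# Same filter as above, but print how many timestamps passed it
print(len([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit]))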
output:
$ python extract_value2.py
2
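Both scripts read data.json from disk, while the question's JSON lives in a bash variable. A minimal variant that reads standard input instead is sketched below. The file name count_after.py, passing the cutoff as the first argument, and skipping items without a timestamp are all assumptions, not part of the answer above.

```python
import json
import sys


def count_after(data, limit):
    """Count items whose "timestamp" is strictly greater than limit.

    Items with a missing or null timestamp are skipped (an assumption;
    every item in the question has one).
    """
    return sum(
        1
        for item in data.get("items", [])
        if item.get("timestamp") is not None and item["timestamp"] > limit
    )


def main(argv, stream):
    # argv[1] is the cutoff in epoch seconds; the JSON arrives on the stream.
    limit = int(argv[1])
    print(count_after(json.load(stream), limit))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print('usage: echo "$var" | python count_after.py CUTOFF', file=sys.stderr)
    else:
        main(sys.argv, sys.stdin)
```

Usage, with the question's data in $var: echo "$var" | python count_after.py 1553703100 should print 2. Using sum() over a generator counts matches without building the intermediate list.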