awk (or sed / grep) to get occurrences of a substring

Question

I have a JSON string in a bash variable, which looks something like this:

{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}

I want to know how many of those timestamps are after a given datetime, so for 1553703100 it should return 2.

(Bonus imaginary points if you can get me just that number!)

As a step towards that, I want to get just the matches of "timestamp": \d+, in the string so that I can loop through them in a bash script.

I've used sed and grep a bit, but never awk, and from my reading it seems like that might be the better match for the task.

Other info:
- The JSON is already pretty-printed, as above, so the timestamps will always be on separate lines.
- This is to run in Cygwin, so I have awk/gawk, sed, and grep/egrep, but probably not other tools.
- There could be any number of timestamps in the JSON.

Answer 1

You didn't provide the expected output, so this is a guess, but is this what you're trying to do?

$ echo "$var" | jq '.items[].timestamp'
1553703000
1553703200
1553703400

or maybe:

$ echo "$var" | jq '.items[].timestamp | select(. > 1553703100)'
1553703200
1553703400

or:

$ echo "$var" | jq '[.items[].timestamp | select(. > 1553703100)] | length'
2

WARNING: I'm just learning jq, so there may be better ways to do the above!
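If the cutoff lives in a bash variable rather than being typed into the filter, one option is jq's --argjson flag, which binds a JSON value to a jq variable. The following is only a sketch: it assumes jq is installed (the question suggests it may not be on the asker's Cygwin), and the names cutoff and dt are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample data shaped like the JSON in the question (compacted to save space).
var='{"items":[{"foo":null,"timestamp":1553703000,"bar":123},{"foo":null,"timestamp":1553703200,"bar":456},{"foo":null,"timestamp":1553703400,"bar":789}]}'

cutoff=1553703100   # hypothetical variable name holding the threshold

# --argjson binds $dt inside the filter as a JSON number, not a string,
# so the comparison in select() is numeric.
count=$(jq --argjson dt "$cutoff" \
  '[.items[].timestamp | select(. > $dt)] | length' <<< "$var")

echo "$count"
```

With the sample data this should agree with the hard-coded filter above. Using --arg instead would pass the cutoff as a string, which jq orders differently from numbers.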

Answer 2

Edit: The second approach listed below has serious problems that were very helpfully outlined by @EdMorton. I've elected to keep the old code for educational purposes.

This version avoids substr() and handles the case where i is never set (i+0 prints 0 instead of an empty string):

$ awk -v dt=1553703100 '
  /timestamp/ && $2+0>dt {i++}
  END {print i+0}
' <<< "$var"

2
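The question also asks for the matching timestamps themselves, so they can be looped over in bash. A minimal sketch of that, built on the same awk idea: the function name list_timestamps_after is hypothetical, and it relies on the pretty-printed layout from the question, where each key sits on its own line.

```shell
#!/usr/bin/env bash
# Sample data in the same pretty-printed shape as the question.
var='{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}'

# list_timestamps_after is a hypothetical helper, not part of the answer.
# Reads JSON on stdin and prints each timestamp above the cutoff, one per line.
list_timestamps_after() {
  local cutoff=$1
  awk -v dt="$cutoff" '
    # Require the first field to be exactly the "timestamp" key,
    # which is stricter than matching /timestamp/ anywhere on the line.
    $1 == "\"timestamp\":" && $2+0 > dt {
      v = $2
      sub(/,$/, "", v)   # drop the trailing comma
      print v
    }
  '
}

count=0
while IFS= read -r ts; do
  count=$((count + 1))
  echo "after cutoff: $ts"
done < <(list_timestamps_after 1553703100 <<< "$var")
echo "count: $count"
```

The loop uses process substitution rather than a pipe so that count is still set after the loop ends; piping into while would run the loop in a subshell.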

WARNING: PROBLEMATIC CODE

Here I used substr(string, index, [characters]) to trim the comma off the second field. The /timestamp/ regex is not complex; it could be improved if your JSON became more intricate.

$ awk -v dt=1553703100 '
  /timestamp/ && substr($2, 0, length($2)) > dt {i++} 
  END {print i}
' <<< "$var"

2
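Two differences between the versions can be checked in isolation. This sketch assumes a POSIX-conforming awk and does not claim to reproduce @EdMorton's comments: it only shows that substr() returns a string, so comparing it with > is a text comparison, and that an unset counter prints as an empty line unless 0 is added to it. The shell variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# substr() yields a string, so ">" compares text: "10" sorts before "9".
string_cmp=$(awk -v dt=9 'BEGIN {
  if (substr("10,", 1, 2) > dt) print "greater"; else print "not greater"
}')

# Adding 0 forces a numeric comparison: 10 > 9.
numeric_cmp=$(awk -v dt=9 'BEGIN {
  if (substr("10,", 1, 2) + 0 > dt) print "greater"; else print "not greater"
}')

# With no matching lines, i is never set: "print i" emits an empty line,
# while "print i+0" emits 0.
unset_plain=$(awk 'END { print i }' < /dev/null)
unset_plus0=$(awk 'END { print i+0 }' < /dev/null)

echo "string compare:  $string_cmp"
echo "numeric compare: $numeric_cmp"
echo "print i:   [$unset_plain]"
echo "print i+0: [$unset_plus0]"
```

The text comparison happens to give the right answer in the example above only because all the timestamps have the same number of digits as the cutoff.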

Answer 3

You can also quickly implement a Python solution:

Input:

$ cat data.json 
{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}

Code:

$ cat extract_value2.py 
import json

tLimit = 1553703100
with open('data.json') as f:
    data = json.load(f)
    print([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit])

Output:

$ python extract_value2.py 
[1553703200, 1553703400]

Count code:

$ cat extract_value2.py 
import json

tLimit = 1553703100
with open('data.json') as f:
    data = json.load(f)
    print(len([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit]))

Output:

$ python extract_value2.py
2

Answer 1: score 4, 2019-04-10 21:47:00
Answer 2: score 3, accepted, 2019-04-10 21:35:08
Answer 3: score 0, 2019-04-11 00:18:24
