

awk (or sed/grep) to get occurrences of substring

I have a JSON string in a bash variable, which is something like this:

{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}

I want to know how many of those timestamps are after a given datetime, so if I have 1553703100 it'll return 2.

(Bonus imaginary points if you can get me just that number!)

As a step towards that, I want to get just the matches of "timestamp": \d+, in the string so that I can loop through them in a bash script.

I've used sed and grep a bit, but never used awk, and from my reading it seems like that might be the better match for the task.

Other info:
- The JSON is already pretty-printed, as above, so the timestamps will always be on separate lines.
- This is to run in Cygwin, so I have awk/gawk, sed, and grep/egrep, but probably not other tools.
- There could be any number of timestamps in the JSON.
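The line-matching idea described in the question (find each "timestamp": followed by digits, then compare) can be sketched in Python just to make the logic concrete, even though the question targets awk/sed/grep. This assumes the key is spelled exactly "timestamp" and holds an integer; TIMESTAMP_RE, SAMPLE and count_timestamps_after are hypothetical names. Like any regex over JSON, it would also match that text inside string values or nested objects, so a real JSON parser is safer when one is available.

```python
import re

# Matches the asker's pattern: "timestamp": followed by one or more digits.
# Assumption: the key is spelled exactly "timestamp" and its value is an integer.
TIMESTAMP_RE = re.compile(r'"timestamp":\s*(\d+)')

# The JSON from the question, as it would sit in the bash variable.
SAMPLE = """{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}"""


def count_timestamps_after(text: str, cutoff: int) -> int:
    """Count "timestamp" values strictly greater than cutoff (hypothetical helper)."""
    # findall returns just the captured digit groups, as strings.
    values = [int(digits) for digits in TIMESTAMP_RE.findall(text)]
    # Compare as integers so numbers of different lengths are ordered correctly.
    return sum(1 for v in values if v > cutoff)


if __name__ == "__main__":
    print(count_timestamps_after(SAMPLE, 1553703100))  # prints 2
```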

You didn't provide the expected output, so it's a guess, but is this what you're trying to do?

$ echo "$var" | jq '.items[].timestamp'
1553703000
1553703200
1553703400

or maybe:

$ echo "$var" | jq '.items[].timestamp | select(. > 1553703100)'
1553703200
1553703400

or:

$ echo "$var" | jq '[.items[].timestamp | select(. > 1553703100)] | length'
2

WARNING: I'm just learning jq, so there may be better ways to do the above!

Edit: The second approach listed below has serious problems that were very helpfully outlined by @EdMorton. I've elected to keep the old code for educational purposes.

Updated version: it avoids substr() and handles the case where i is never incremented (printing i+0 gives 0 instead of an empty line):

$ awk -v dt=1553703100 '
  /timestamp/ && $2+0>dt {i++}
  END {print i+0}
' <<< "$var"

2
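To make the two tricks in this version concrete, here is a rough Python emulation of what the awk program does line by line: split each line on whitespace (approximating awk's default field splitting), take the second field, and convert it the way $2+0 does, by reading its leading numeric prefix so that "1553703000," becomes 1553703000. The awk_num and count_like_awk helpers are hypothetical, and awk_num is a simplified stand-in: real awk conversion also accepts decimals, exponents and leading blanks. Starting the counter at 0 gives the same result as i+0 in the END block when nothing matches.

```python
import re

# Simplified stand-in for awk's string-to-number conversion: take the
# leading integer prefix of the string, or 0 if there is none.
# (Real awk also handles decimals, exponents and leading blanks.)
LEADING_INT = re.compile(r"[+-]?\d+")

# The pretty-printed JSON from the question.
SAMPLE = """{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}"""


def awk_num(s: str) -> int:
    """Approximate awk's s+0 for integer-looking prefixes (hypothetical helper)."""
    m = LEADING_INT.match(s)
    return int(m.group()) if m else 0


def count_like_awk(text: str, dt: int) -> int:
    """Emulate: /timestamp/ && $2+0>dt {i++}  END {print i+0}"""
    i = 0  # same result as awk's i+0 when no line ever matches
    for line in text.splitlines():
        fields = line.split()  # roughly awk's default whitespace splitting
        second = fields[1] if len(fields) > 1 else ""  # a missing $2 is ""
        # /timestamp/ matches the substring anywhere on the line.
        if "timestamp" in line and awk_num(second) > dt:
            i += 1
    return i


if __name__ == "__main__":
    print(count_like_awk(SAMPLE, 1553703100))  # prints 2
```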

WARNING: PROBLEMATIC CODE

Here I used substr(string, index, [characters]) to trim the comma off your second field. The /timestamp/ regex is not complex; it could be improved if your JSON became more intricate.

$ awk -v dt=1553703100 '
  /timestamp/ && substr($2, 0, length($2)) > dt {i++} 
  END {print i}
' <<< "$var"

2
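One likely pitfall, based on POSIX awk's comparison rules and not necessarily what @EdMorton raised: substr() returns a plain string, and when one operand of > is a plain string, awk compares the two as text rather than as numbers. That still gives the right answer here because every timestamp has ten digits, but it would not if the values differed in length. A small Python sketch of the two kinds of comparison (string_greater and numeric_greater are hypothetical names):

```python
def string_greater(a: str, b: str) -> bool:
    # Character-by-character comparison; awk falls back to this when one
    # operand is a plain string, such as the value returned by substr().
    return a > b


def numeric_greater(a: str, b: str) -> bool:
    # Numeric comparison, which is what "after a given datetime" needs.
    return int(a) > int(b)


# Same number of digits (as in the question): both comparisons agree.
print(string_greater("1553703200", "1553703100"), numeric_greater("1553703200", "1553703100"))
# prints: True True

# Different number of digits: the string comparison gets it wrong.
print(string_greater("999", "1000"), numeric_greater("999", "1000"))
# prints: True False
```

Forcing a numeric context, as the $2+0>dt version does, sidesteps this.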

You can also quickly implement a Python solution:

input:

$ cat data.json 
{
    "items": [
      {
        "foo": null,
        "timestamp": 1553703000,
        "bar": 123
      },
      {
        "foo": null,
        "timestamp": 1553703200,
        "bar": 456
      },
      {
        "foo": null,
        "timestamp": 1553703400,
        "bar": 789
      }
    ]
}

code:

$ cat extract_value2.py 
import json

tLimit = 1553703100
with open('data.json') as f:
    data = json.load(f)
    print([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit])

output:

$ python extract_value2.py 
[1553703200, 1553703400]

count code:

$ cat extract_value2.py 
import json

tLimit = 1553703100
with open('data.json') as f:
    data = json.load(f)
    print(len([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit]))

output:

$ python extract_value2.py
2 
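The scripts above read a hard-coded data.json and threshold. Since the question keeps the JSON in a bash variable, a variant can read it from standard input and take the cutoff as an argument. This is a sketch assuming a Python interpreter is installed under Cygwin (the question suggests it may not be); the file name count_after.py and its command-line interface are hypothetical.

```python
import json
import sys


def count_after(data: dict, limit: int) -> int:
    """Count items whose "timestamp" is strictly greater than limit."""
    # A generator inside sum() counts matches without building a list.
    return sum(1 for item in data["items"] if item["timestamp"] > limit)


def main(limit: int) -> None:
    # Read the JSON document from standard input, e.g. piped from a bash variable.
    data = json.load(sys.stdin)
    print(count_after(data, limit))


if __name__ == "__main__":
    # Hypothetical interface: the cutoff is the single command-line argument.
    if len(sys.argv) != 2:
        print("usage: count_after.py CUTOFF < data.json", file=sys.stderr)
    else:
        main(int(sys.argv[1]))
```

For example, echo "$var" | python count_after.py 1553703100 should print 2 for the JSON in the question.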

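Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.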

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM