Fuzzy match string with jq

Let's say I have some JSON in a file. It is a subset of the data extracted from a larger JSON file, which is why I use --stream in my attempted solution below. It looks like this:

[
{"_id":"1","@":{},"article":false,"body":"Hello world","comments":"3","createdAt":"20201007200628","creator":{"id":"4a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"mkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"},
{"_id":"2","@":{},"article":false,"body":"Goodbye world","comments":"3","createdAt":"20201007200628","creator":{"id":"4a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"mkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"}
],
[
{"_id":"55","@":{},"article":false,"body":"Hello world","comments":"3","createdAt":"20201007200628","creator":{"id":"3a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"jkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"},
{"_id":"56","@":{},"article":false,"body":"Goodbye world","comments":"3","createdAt":"20201007200628","creator":{"id":"3a7ba8fd719d43598b977dd548eed6aa","bio":"","blocked":false,"followed":false,"human":false,"integration":false,"joined":"20201007200628","muted":false,"name":"mkscott","rss":false,"private":false,"username":"jkscott","verified":false,"verifiedComments":false,"badges":[],"score":"0","interactions":258,"state":1},"depth":"0","depthRaw":0,"hashtags":[],"id":"2d4126e342ed46509b55facb49b992a5","impressions":"3","links":[],"sensitive":false,"state":4,"upvotes":"0"}
]

It describes 4 posts written by 2 different authors, with a unique _id field for each post. Each author wrote 2 posts: one says "Hello world" and the other says "Goodbye world".

I want to match on the word "Hello" and return the _id only for posts whose body contains "Hello". The expected result is:

1
55

The closest I could come in my attempt was:

jq -nr --stream '
fromstream(1|truncate_stream(inputs))
| select(.body %like% "Hello")
| ._id
' <input_file

Assuming the input is modified slightly so that it is a stream of the arrays shown in the question:

jq -nr --stream '
  fromstream(1|truncate_stream(inputs))
  | select(.body | test("Hello"))
  | ._id
'

produces the desired output.

test uses regex matching. In your case, it seems you could use simple substring matching instead.
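As a minimal sketch of the substring alternative, the following uses jq's contains on the body string instead of test. The sample data is a simplified, hypothetical stand-in for the question's posts (only _id and body are kept), and it is a single valid JSON array, so the streaming parser is not involved here.

```shell
# Hypothetical sample: one valid JSON array of simplified posts.
posts='[{"_id":"1","body":"Hello world"},{"_id":"2","body":"Goodbye world"}]'

# contains/1 applied to a string checks for a literal substring,
# so regex metacharacters in the search term have no special meaning.
ids=$(printf '%s\n' "$posts" | jq -r '.[] | select(.body | contains("Hello")) | ._id')
printf '%s\n' "$ids"
```

The same select(.body | contains("Hello")) filter can be dropped into the streaming pipeline in place of the test call.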

Handling extraneous commas

Assuming the input has commas between a stream of valid JSON values exactly as shown, you could presumably use sed to remove them first.
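One way the sed approach might look, assuming the stray comma always sits on a line of its own as "]," exactly as in the question. The file name posts_with_commas.json and the reduced objects are hypothetical; real data with a different layout would need a different pattern.

```shell
# Simplified stand-in for the question's data: two arrays separated by a stray comma.
cat > posts_with_commas.json <<'EOF'
[
{"_id":"1","body":"Hello world"},
{"_id":"2","body":"Goodbye world"}
],
[
{"_id":"55","body":"Hello world"},
{"_id":"56","body":"Goodbye world"}
]
EOF

# Turn a line that is exactly "]," into "]", leaving a plain stream of JSON arrays.
cleaned=$(sed 's/^],$/]/' posts_with_commas.json)

# The cleaned text is a sequence of valid arrays, so jq can filter each one in turn.
ids=$(printf '%s\n' "$cleaned" | jq -r '.[] | select(.body | test("Hello")) | ._id')
printf '%s\n' "$ids"
```

For a file too large to hold in memory, the sed output could instead be piped into the --stream invocation shown earlier.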

Or, if you want a jq-only solution, use the following in conjunction with the -n, -r and --stream command-line options:

def iterate:
  fromstream(1|truncate_stream(inputs?))
  | select(.body | test("Hello"))
  | ._id,
    iterate;


iterate

(Notice the "?" after inputs.)

The streaming parser (invoked with --stream) is usually not needed for the kind of task you describe, so in this response I'm going to assume that the following (or a variant thereof) will suffice:

.[]
| select( .body | test("Hello") )._id

This of course assumes that the input is valid JSON.
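As a rough end-to-end illustration, the filter can be run against a small valid JSON file. The file name posts.json and its contents are hypothetical (a third post with a lowercase "hello" is added purely to show the difference); the second command uses the "i" flag of test for a case-insensitive match, which may be closer to what "fuzzy" is meant to cover.

```shell
# Hypothetical, simplified input: a single valid JSON array.
cat > posts.json <<'EOF'
[
{"_id":"1","body":"Hello world"},
{"_id":"2","body":"Goodbye world"},
{"_id":"55","body":"hello again"}
]
EOF

# Case-sensitive regex match on the body field; -r prints the ids without quotes.
ids=$(jq -r '.[] | select(.body | test("Hello")) | ._id' posts.json)
printf '%s\n' "$ids"

# Case-insensitive variant: the second argument to test is a string of regex flags.
ids_nocase=$(jq -r '.[] | select(.body | test("hello"; "i")) | ._id' posts.json)
printf '%s\n' "$ids_nocase"
```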

Handling comma-delimited JSON

If your input is a comma-delimited stream of JSON as shown in the question, you could use the following in conjunction with the -n command-line option (add -r to print the ids without quotes):

# This is a variant of the built-in `recurse/1`:
def iterate(f): def r: f | (., r); r;

iterate( inputs? | .[] | select( .body | test("Hello") )._id )

Please note that this assumes that whatever occurs on a line after a delimiting comma can be ignored.
