
Select entries based on multiple values in jq

I'm working with jq and I absolutely love it so far. I'm running into an issue I've yet to find a solution to anywhere else, though, and wanted to see if the community had a way to solve it.

Let's presume we have a file of JSON objects, one per line, that looks like so:

{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Barry", "text": "Even more Blah"}
{"author": "Teri", "text": "Text on text on text"}
{"author": "Bob", "text": "Another thing to say"}

Now, we want to select rows where the value of author is either "Gary" or "Larry", and nothing else. In reality, I have several thousand names I'm checking against, so simply spelling out a direct or conditional (e.g. cat blah.json | jq -r 'select(.author == "Gary" or .author == "Larry")') isn't practical. I'm trying to do this via the inside function like so, but I get an error:

cat blah.json | jq -r 'select(.author | inside(["Gary", "Larry"]))'
jq: error (at <stdin>:1): array (["Gary","La...) and string ("Gary") cannot have their containment checked

What would be the best method for doing something like this?
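For reference, the error comes from how inside is defined. In jq's builtins, inside(xs) is . as $x | xs | contains($x), so the filter above ends up asking whether the array ["Gary", "Larry"] contains the string "Gary", and contains refuses to compare an array with a string. A minimal sketch that reproduces the failure on a single input row, assuming jq 1.5 or later is on the PATH:

```shell
#!/bin/sh
# Assumes jq 1.5+ is installed. One input row is piped in instead of blah.json.
# inside(xs) expands to: . as $x | xs | contains($x)
# so jq evaluates ["Gary","Larry"] | contains("Gary"), mixing array and string.
err=$(printf '%s\n' '{"author": "Gary", "text": "Blah"}' \
  | jq -c 'select(.author | inside(["Gary", "Larry"]))' 2>&1)
status=$?

echo "$err"                  # the "cannot have their containment checked" error
echo "exit status: $status"  # non-zero because of the runtime error
```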

IRC user gnomon answered this on the jq channel as follows:

jq 'select([.author] | inside(["Larry", "Gary", "Jerry"]))'

The intuition behind this approach, as the user put it: "Literally your idea, only wrapping .author as [.author] to coerce it into being a single-item array so inside() will work on it." This produces the desired result: it filters the input for any of the names given in the list.
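One caveat worth testing before relying on this: when contains (and therefore inside) compares two strings, it checks for substring containment rather than equality. Wrapping .author in an array fixes the type error but keeps that behaviour, so an author whose name is a substring of a listed name also passes. A minimal sketch, assuming jq 1.5+ and using the hypothetical authors "Jer" and "arr" (not from the question) to show the effect:

```shell
#!/bin/sh
# Assumes jq 1.5+ is installed.
# "Jer" and "arr" are hypothetical authors chosen to expose substring matching:
# "Jer" is inside "Jerry" and "arr" is inside "Larry".
out=$(printf '%s\n' \
  '{"author": "Gary"}' \
  '{"author": "Jer"}' \
  '{"author": "arr"}' \
  '{"author": "Bob"}' \
  | jq -r 'select([.author] | inside(["Larry", "Gary", "Jerry"])) | .author')

echo "$out"   # Gary, Jer and arr are kept; only Bob is rejected
```

If exact matches are required, the index/1, any/2 or dictionary approaches below compare names with equality instead.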

inside and contains are a bit weird. Here are some more straightforward solutions:

index/1

select( .author as $a | ["Gary", "Larry"] | index($a) )
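index returns the position of the first match, or null when there is none, and select keeps an input whenever its condition is anything other than false or null. A match at position 0 therefore still counts, because 0 is truthy in jq. A minimal sketch running the filter above on a few of the question's rows (inlined here in place of blah.json, assuming jq 1.5+):

```shell
#!/bin/sh
# Assumes jq 1.5+ is installed; rows are a subset of the question's sample.
rows='{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Bob", "text": "Another thing to say"}'

# index($a) yields 0 for "Gary", 1 for "Larry" and null for "Bob".
positions=$(printf '%s\n' "$rows" | jq -c '.author as $a | ["Gary", "Larry"] | index($a)')

# Only false and null are falsy in jq, so both 0 and 1 pass select().
matches=$(printf '%s\n' "$rows" | jq -r 'select( .author as $a | ["Gary", "Larry"] | index($a) ) | .author')

echo "$positions"   # 0, 1, null
echo "$matches"     # Gary, Larry
```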

any/2

["Gary", "Larry"] as $whitelist
| select( .author as $a | any( $whitelist[]; . == $a) )
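any(generator; condition) is true if condition holds for at least one value produced by generator, so here it asks whether any name in $whitelist equals the current author. A minimal sketch of a complete invocation, assuming jq 1.5+ and using inline rows in place of blah.json:

```shell
#!/bin/sh
# Assumes jq 1.5+ is installed; rows are a subset of the question's sample.
rows='{"author": "Gary", "text": "Blah"}
{"author": "Teri", "text": "Text on text on text"}
{"author": "Larry", "text": "More Blah"}'

# any(GENERATOR; CONDITION): true if CONDITION holds for at least one
# value produced by GENERATOR (here, each name in $whitelist).
matches=$(printf '%s\n' "$rows" | jq -r '
  ["Gary", "Larry"] as $whitelist
  | select( .author as $a | any( $whitelist[]; . == $a) )
  | .author')

echo "$matches"   # Gary, then Larry
```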

Using a dictionary

If performance is an issue and if "author" is always a string, then a solution along the lines suggested by @JeffMercado should be considered. Here is a variant (to be used with the -n command-line option):

["Gary", "Larry"] as $whitelist
| ($whitelist | map( {(.): true} ) | add) as $dictionary
| inputs
| select($dictionary[.author])
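With -n, jq does not read the first input on its own, so the program builds $dictionary once and then pulls every row itself through inputs. A minimal sketch that first prints the dictionary on its own and then runs the full filter, assuming jq 1.5+ and inline rows in place of blah.json:

```shell
#!/bin/sh
# Assumes jq 1.5+ is installed; rows are a subset of the question's sample.
rows='{"author": "Gary", "text": "Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Larry", "text": "More Blah"}'

# map({(.): true}) turns each name into a one-key object; add merges them.
dict=$(jq -cn '["Gary", "Larry"] | map( {(.): true} ) | add')
echo "$dict"   # {"Gary":true,"Larry":true}

# -n: the dictionary is built once, then inputs streams every row.
matches=$(printf '%s\n' "$rows" | jq -cn '
  ["Gary", "Larry"] as $whitelist
  | ($whitelist | map( {(.): true} ) | add) as $dictionary
  | inputs
  | select($dictionary[.author])')

echo "$matches"   # the Gary and Larry rows
```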

You can use objects as if they're sets to test for membership. Methods operating on arrays search the whole array for every input, which becomes inefficient when the array is huge.

You can build up a set of values prior to reading your input, then use the set to filter your inputs.

$ jq -n --argjson names '["Larry","Gary","Jerry"]' '
(reduce $names[] as $name ({}; .[$name] = true)) as $set
    | inputs | select($set[.author])
' blah.json
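With several thousand names, typing them into --argjson quickly becomes unwieldy. jq's --slurpfile option binds a variable to an array of every JSON value in a file, so the names can live in a file of their own. A minimal sketch, assuming jq 1.5+ and a hypothetical names.json holding one JSON string per line (both files are created in a temporary directory just for the demo):

```shell
#!/bin/sh
# Assumes jq 1.5+ is installed. names.json and blah.json are hypothetical
# demo files written to a temporary directory.
dir=$(mktemp -d)
printf '%s\n' '"Gary"' '"Larry"' > "$dir/names.json"
printf '%s\n' \
  '{"author": "Gary", "text": "Blah"}' \
  '{"author": "Barry", "text": "Even more Blah"}' \
  '{"author": "Larry", "text": "More Blah"}' > "$dir/blah.json"

# --slurpfile names FILE sets $names to an array of all JSON values in FILE,
# here ["Gary","Larry"]; reduce builds the lookup object once before inputs.
matches=$(jq -rn --slurpfile names "$dir/names.json" '
  (reduce $names[] as $name ({}; .[$name] = true)) as $set
  | inputs | select($set[.author]) | .author
' "$dir/blah.json")

echo "$matches"   # Gary, then Larry
rm -r "$dir"
```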

