
How to pipe multi-line JSON Objects into separate python invocations

I know the basics of piping stdin to downstream processes in the shell, and as long as each line is treated individually, or the whole input as a single unit, I can get my pipelines to work.

But when I want to read 4 lines of stdin, do some processing, then read 6 more lines and do the same, my limited understanding of pipelines becomes an issue.
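(For a fixed number of lines, this kind of grouping is easy enough in Python with itertools.islice; the sketch below assumes line-based batches, whereas my real case has an unknown number of lines per object:)

```python
import io
from itertools import islice

def take(stream, n):
    """Read up to n lines from a stream and return them as a list."""
    return list(islice(stream, n))

stream = io.StringIO("a\nb\nc\nd\ne\nf\ng\nh\ni\nj\n")
first = take(stream, 4)   # first group of 4 lines
second = take(stream, 6)  # next group of 6 lines
```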

For example, in the pipeline below, each curl invocation produces an unknown number of lines of output that together constitute one JSON object:

cat geocodes.txt \
  | xargs  -I% -n 1 curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
  | python -c "import json,sys;obj=json.load(sys.stdin);print(obj['results'][0]['address_components'][3]['short_name'])"

How can I consume exactly one JSON object per python invocation? Note that I have negligible experience in Python; I have more experience with Node.js (would it be better to use Node.js to process the JSON output of curl?)

geocodes.txt would be something like:

51.5035705555556,-3.15153263888889
51.5035400277778,-3.15153477777778
51.5035285833333,-3.15150258333333
51.5033861111111,-3.15140833333333
51.5034980555556,-3.15146016666667
51.5035285833333,-3.15155505555556
51.5035362222222,-3.15156338888889
51.5035362222222,-3.15156338888889

EDIT: I have a nasty feeling that the answer is that you need to read line by line and check whether you have a complete object before parsing. Is there a function that will do the hard work for me?
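(For what it's worth, the standard library does have such a function: json.JSONDecoder.raw_decode parses one object from the front of a string and reports where it ended, so a stream of concatenated objects can be split without any line counting. A sketch, assuming the whole input fits in memory:)

```python
import json
import sys

def iter_json_objects(text):
    """Yield each complete JSON object from a string that contains
    several objects concatenated back to back (possibly multi-line)."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # skip whitespace between objects
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the parsed object and the index just past it
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

if __name__ == "__main__":
    for obj in iter_json_objects(sys.stdin.read()):
        print(obj['results'][0]['address_components'][3]['short_name'])
```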

I believe this approach would accomplish what you want. First, save your python script in a file, my_script.py for example. Then do the following:

cat geocodes.txt \
  | xargs  -I% sh -c "curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' | python my_script.py"

Where my_script.py is:

import json
import sys

obj = json.load(sys.stdin)
print(obj['results'][0]['address_components'][3]['short_name'])
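(A response with no results, e.g. status ZERO_RESULTS, or with fewer than four address components would make that script raise an IndexError; a slightly more defensive variant might look like this. This is a sketch; the field names just follow the geocode response used above:)

```python
import json
import sys

def short_name(obj, index=3):
    """Extract results[0].address_components[index].short_name from a
    geocode response, returning None when the structure is missing."""
    results = obj.get('results') or []
    if not results:
        return None
    components = results[0].get('address_components') or []
    if index >= len(components):
        return None
    return components[index].get('short_name')

if __name__ == '__main__':
    name = short_name(json.load(sys.stdin))
    print(name if name is not None else 'N/A')
```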

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff

Seems a bit hacky, I'll admit.


ORIGINAL ANSWER

I am no bash wizard, so my instinct is to simply do everything in Python. The following script illustrates that approach in Python 3:

import urllib.request as request
import urllib.parse as parse
import json

serviceurl = "http://maps.googleapis.com/maps/api/geocode/json?"

with open("geocodes.txt") as f:
    for line in f:
        # strip the trailing newline so it is not percent-encoded
        # into the query string as %0A
        url = (serviceurl +
               parse.urlencode({'latlng': line.strip(), 'sensor': 'true'}))
        with request.urlopen(url) as response:
            bytes_data = response.read()
        obj = json.loads(bytes_data.decode('utf-8'))
        print(obj['results'][0]['address_components'][3]['short_name'])
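(One detail worth knowing: parse.urlencode percent-encodes the values, so the comma in the coordinate pair becomes %2C, and an unstripped trailing newline from the file would become %0A, which is why stripping the line can matter. A small illustration:)

```python
from urllib.parse import urlencode

# The comma in the coordinate pair is percent-encoded;
# a trailing newline would be encoded too.
qs = urlencode({'latlng': '51.5035,-3.1515', 'sensor': 'true'})
print(qs)  # latlng=51.5035%2C-3.1515&sensor=true
```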

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff

Have a look at:

http://trentm.com/json/#FEATURE-Grouping

Grouping can be helpful for "one JSON object per line" formats or for things such as:

$ cat *.json | json -g ...

To install:

sudo npm install -g json

I haven't tried this myself so I can't verify that it works, but it might be the missing link to do what you want (group JSON).

You don't need Python or Node.js. jq is designed specifically for filtering JSON, UNIX-style:

sudo apt-get install jq

Then:

cat geocodes.txt  \
  | xargs  -I% curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true'  \
  | jq --unbuffered '.results[0].formatted_address'

Or, if you want to do this on all your JPG files:

find . -iname "*.jpg" \
  | xargs -n 1 -d '\n' exiftool -q -n -p '$GPSLatitude,$GPSLongitude' \
  | xargs -I% curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
  | jq --unbuffered '.results[0].formatted_address'
