
How to pipe multi-line JSON objects into separate python invocations

I know the basics of piping stdin to downstream processes in the shell, and as long as each line is treated individually, or the whole input is treated as a single unit, I can get my pipelines to work.

But when I want to read 4 lines of stdin, do some processing, read 6 more lines, and do the same, my limited understanding of pipelines becomes an issue.

For example, in the pipeline below, each curl invocation produces an unknown number of lines of output that together constitute one JSON object:

cat geocodes.txt \
  | xargs  -I% -n 1 curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
  | python -c "import json,sys;obj=json.load(sys.stdin);print obj['results'][0]['address_components'][3]['short_name'];"

How can I consume exactly one JSON object per python invocation? Note that I have negligible experience with Python. I have more experience with Node.js (would it be better to use Node.js to process the JSON curl output?)

geocodes.txt would be something like:

51.5035705555556,-3.15153263888889
51.5035400277778,-3.15153477777778
51.5035285833333,-3.15150258333333
51.5033861111111,-3.15140833333333
51.5034980555556,-3.15146016666667
51.5035285833333,-3.15155505555556
51.5035362222222,-3.15156338888889
51.5035362222222,-3.15156338888889

EDIT: I have a nasty feeling that the answer is that you need to read line by line and check whether you have a complete object before parsing. Is there a function which will do the hard work for me?

I believe this approach would accomplish what you want. First, save your python script in a file, my_script.py for example. Then do the following:

cat geocodes.txt \
  | xargs  -I% sh -c "curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' | python my_script.py"

Where my_script.py is:

import json, sys
print(json.load(sys.stdin)['results'][0]['address_components'][3]['short_name'])

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff

Seems a bit hacky, I'll admit.
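
The script above assumes that every response contains at least one result with at least four address components; a response with an empty results list would raise an exception. Below is a minimal sketch of a more defensive variant in Python 3, assuming the same response shape used above (results[0]['address_components'][3]['short_name']). The helper name short_name_from_text is hypothetical and not part of any library:

```python
import json


def short_name_from_text(text, index=3):
    """Parse one JSON document and return the short_name of one address component.

    Returns None instead of raising when the expected data is missing.
    The default index of 3 mirrors the one-liner above; it is an assumption
    about where the town name sits in the response.
    """
    obj = json.loads(text)
    results = obj.get("results") or []
    if not results:
        return None
    components = results[0].get("address_components") or []
    if index >= len(components):
        return None
    return components[index].get("short_name")
```

In my_script.py this could be called as print(short_name_from_text(sys.stdin.read())) after importing sys, so a lookup with no results prints None rather than a traceback.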


ORIGINAL ANSWER

I am no bash wizard, so my instinct is to simply do everything in Python. The following script illustrates that approach in Python 3:

import urllib.request as request
import urllib.parse as parse
import json

serviceurl = "http://maps.googleapis.com/maps/api/geocode/json?"

with open("geocodes.txt") as f:
    for line in f:
        latlng = line.strip()  # drop the trailing newline before encoding
        url = (serviceurl +
               parse.urlencode({'latlng': latlng, 'sensor': 'true'}))
        with request.urlopen(url) as response:
            bytes_data = response.read()
        obj = json.loads(bytes_data.decode('utf-8'))
        print(obj['results'][0]['address_components'][3]['short_name'])

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
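
When coordinates are read from a file, each line still carries its trailing newline, which urlencode would encode as %0A unless it is stripped first. Here is a minimal sketch of the URL-building step in isolation, using the same service URL as the script above. The helper name build_url is hypothetical:

```python
from urllib.parse import urlencode

SERVICE_URL = "http://maps.googleapis.com/maps/api/geocode/json?"


def build_url(line):
    """Build the request URL for one line of geocodes.txt.

    Returns None for blank lines so the caller can skip them.
    """
    latlng = line.strip()  # remove the trailing newline and stray spaces
    if not latlng:
        return None
    # urlencode percent-encodes the comma between latitude and longitude.
    return SERVICE_URL + urlencode({"latlng": latlng, "sensor": "true"})
```

Keeping this step separate makes it easy to check the generated URLs without issuing any network requests.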

Have a look at:

http://trentm.com/json/#FEATURE-Grouping

Grouping can be helpful for "one JSON object per line" formats or for things such as:

$ cat *.json | json -g ...

To install:

sudo npm install -g json

I haven't tried this myself so can't verify it works, but it might be the missing link for what you want (grouping JSON).
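
The same idea, treating the pipe as a stream of concatenated JSON objects rather than one object per process, can also be sketched with Python's standard library alone. json.JSONDecoder.raw_decode parses one value starting at a given position and returns the index where it stopped, which removes the need to check line by line whether an object is complete. The function name iter_json_objects is hypothetical, and this sketch reads the whole input into memory before parsing:

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found in a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

A single python process at the end of the curl pipeline could then loop with for obj in iter_json_objects(sys.stdin.read()) and print one field per object.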

You don't need python or node.js. jq is designed specifically for filtering JSON, UNIX style:

sudo apt-get install jq

Then:

cat geocodes.txt  \
  | xargs  -I% curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true'  \
  | jq --unbuffered '.results[0].formatted_address'

Or, if you want to do this on all your JPG files:

find -iname "**jpg" \
  | xargs -n 1 -d'\n' exiftool -q -n -p '$GPSLatitude,$GPSLongitude' \
  | xargs  -I% curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
  | jq --unbuffered  '.results[0].formatted_address'
