Passing a JSON argument as a string to a Python Hadoop streaming application

I want to pass a JSON string as a command-line argument to my reducer.py file, but I'm unable to do so.

The command I run is:

hadoop jar contrib/streaming/hadoop-streaming.jar -file /home/hadoop/mapper.py -mapper 'mapper.py' -file /home/hadoop/reducer.py -reducer 'reducer.py {"abc":"123"}' -input /user/abc.txt -output /user/output/

When I print the argv array in reducer.py, the output is:

['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0057/container_1423459215008_0057_01_000004/./reducer.py', '{', 'abc', ':', '123', '}']

The first argument is the path of reducer.py, but my second argument gets split at the double quotes into separate arguments.

I want the second argument to arrive as one complete JSON string, so that I can load it as JSON in reducer.py. For example: ['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0057/container_1423459215008_0057_01_000004/./reducer.py','{"abc":"123"}']

Any help is appreciated. Thanks!

EDIT: I tried escaping the JSON with this command:

hadoop jar contrib/streaming/hadoop-streaming.jar -file /home/hadoop/mapper.py -mapper 'mapper.py' -file /home/hadoop/reducer.py -reducer 'reducer.py "{\\"abc\\":\\"123\\"}"' -input /user/abc.txt -output /user/output/

It gives this output:

['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0058/container_1423459215008_0058_01_000004/./redu.py', '{\\\\', 'abc\\\\', ':\\\\', '123\\\\', '}']

You need to put your JSON inside double quotes with proper escaping: "{\\"abc\\":\\"123\\"}". However, chances are that your input will be processed by Hadoop before being passed to your script.

If this doesn't work, you can try passing your arguments through the environment with -cmdenv name=value. See "How do I pass a parameter to a python Hadoop streaming job?" for more details.
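
A minimal sketch of the -cmdenv approach on the reducer.py side, assuming the job is launched with something like -cmdenv 'REDUCER_CONFIG={"abc":"123"}'. The variable name REDUCER_CONFIG and the helper load_config are hypothetical, chosen only for illustration. It also assumes the JSON is compact (no spaces), since a value containing spaces may not reach the task intact.

```python
import json
import os

# Hypothetical variable name; it must match the name passed to -cmdenv.
ENV_VAR = "REDUCER_CONFIG"


def load_config(environ=os.environ, name=ENV_VAR):
    """Parse the JSON stored in an environment variable.

    Returns an empty dict when the variable is unset or empty, so the
    reducer can still run without any configuration.
    """
    raw = environ.get(name)
    if not raw:
        return {}
    return json.loads(raw)
```

In reducer.py, config = load_config() would then replace the json.loads(sys.argv[1]) call, and -reducer 'reducer.py' no longer needs to carry the JSON itself.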
