I want to pass a JSON string as command line argument to my reducer.py file but I'm unable to do so.
The command I execute is:
hadoop jar contrib/streaming/hadoop-streaming.jar -file /home/hadoop/mapper.py -mapper 'mapper.py' -file /home/hadoop/reducer.py -reducer 'reducer.py {"abc":"123"}' -input /user/abc.txt -output /user/output/
When I print argv array in reducer.py, it shows output as:
['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0057/container_1423459215008_0057_01_000004/./reducer.py', '{', 'abc', ':', '123', '}']
The first argument is the path of reducer.py, but my second argument is split apart at the double quotes and spaces.
I want to achieve second argument as a complete JSON string. For example like: ['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0057/container_1423459215008_0057_01_000004/./reducer.py','{"abc":"123"}']
So that I can load that argument as Json in reducer.py
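To illustrate the goal, here is a minimal sketch of how reducer.py would consume the argument if it arrived intact as a single JSON string (the `parse_config` helper is just for illustration, not part of the original script):

```python
import json
import sys


def parse_config(argv):
    """Parse the JSON config passed as the first command-line argument.

    argv[0] is the script path; argv[1] must be the intact JSON string,
    e.g. '{"abc":"123"}'.
    """
    return json.loads(argv[1])


if __name__ == "__main__":
    config = parse_config(sys.argv)
    print(config.get("abc"))
```

This only works if the shell and Hadoop Streaming deliver `{"abc":"123"}` as one unbroken argument, which is exactly the problem described above.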
Any help is appreciated. Thanks !
EDIT: Tried escaping JSON using command:
hadoop jar contrib/streaming/hadoop-streaming.jar -file /home/hadoop/mapper.py -mapper 'mapper.py' -file /home/hadoop/reducer.py -reducer 'reducer.py "{\\"abc\\":\\"123\\"}"' -input /user/abc.txt -output /user/output/
Gives output as:
['/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1423459215008_0058/container_1423459215008_0058_01_000004/./redu.py', '{\\\\', 'abc\\\\', ':\\\\', '123\\\\', '}']
You need to put your JSON inside double quotes with proper escaping: `"{\"abc\":\"123\"}"`. However, chances are that your argument will be processed by Hadoop Streaming before being passed to your script, so it may still get mangled.

If this doesn't work, you can try passing your arguments via the environment instead, using `-cmdenv name=value`. See "How do I pass a parameter to a python Hadoop streaming job?" for more details.
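The environment-variable route avoids command-line splitting entirely. A minimal sketch, assuming the job is launched with `-cmdenv MYCONF='{"abc":"123"}'` (MYCONF is a hypothetical variable name chosen for this example):

```python
import json
import os


def load_config(var_name="MYCONF"):
    """Read a JSON config from an environment variable set via -cmdenv.

    Returns None if the variable is unset, otherwise the parsed object.
    """
    raw = os.environ.get(var_name)
    return json.loads(raw) if raw else None


if __name__ == "__main__":
    config = load_config()
    if config is not None:
        print(config.get("abc"))
```

Because the value never passes through the reducer's command line, no quoting or escaping gymnastics are needed; the shell quoting only has to survive once, on the `hadoop jar` invocation.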