I created a Python mapper that I run as a Hadoop streaming MapReduce job. It validates the input and writes a message to output if the input is invalid.
...
# input from STDIN
for line in sys.stdin:
    indata = json.loads(line)
    try:
        jsonschema.validate(indata, schema)
    except jsonschema.ValidationError as error:
        # validation against schema failed
        print(error.message)
    except:
        # other exceptions
        raise
My question: The mapper writes the message for invalid input as expected, but it also creates empty "part-0000x" files for valid input.
I would like to omit the empty output files. How can I achieve this?
To omit the empty output files, use the LazyOutputFormat class. It creates a part file only when at least one record is written to that file.
Note that LazyOutputFormat belongs to the Java API, so for a Python streaming job you will need to find the corresponding option.
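For streaming jobs, the corresponding switch appears to be the -lazyOutput command option listed in the Hadoop Streaming documentation. It is worth checking that your Hadoop version supports it. The sketch below only assembles such a command line in Python. The function name build_streaming_command and every path in it are hypothetical placeholders, not values from the question.

```python
import os


def build_streaming_command(streaming_jar, input_path, output_path, mapper_script):
    """Build a Hadoop Streaming command line that requests lazy output creation.

    All arguments are placeholders supplied by the caller; nothing here is
    taken from the original job configuration.
    """
    mapper_name = os.path.basename(mapper_script)
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -files must come before streaming options.
        "-files", mapper_script,
        "-input", input_path,
        "-output", output_path,
        # The shipped script is referenced by its file name on the task nodes.
        "-mapper", mapper_name,
        # Assumption: this Hadoop version supports -lazyOutput, which asks
        # streaming to create a part file only on the first record written.
        "-lazyOutput",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_streaming_command(
        "/path/to/hadoop-streaming.jar",
        "/data/input",
        "/data/output",
        "/home/user/mapper.py",
    )
    print(" ".join(command))
```

The resulting list can be passed to subprocess.check_call to submit the job, provided the hadoop executable is on the PATH and the mapper script is executable.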