
Pydoop mapreduce “AttributeError: module 'wordcount_minimal' has no attribute '__main__'”

I installed Pydoop and am trying to run MapReduce jobs. As a dry run, I tried executing the word count examples wordcount_minimal.py and wordcount_full.py. Both of them hang at the map phase. At the end of the stderr, I find this message, depending on which script I ran:

module 'wordcount_minimal' has no attribute '__main__'

or

module 'wordcount_full' has no attribute '__main__'

I executed the job using the command:

pydoop submit --upload-file-to-cache wordcount_full.py wordcount_full hdfs_input_dir hdfs_output_dir

I am unable to find the reason behind this. Any idea what could be causing it?

I was able to execute the example with pydoop script using the map and reduce functions, and it completed successfully (a sketch of that form is shown below). But with the pydoop submit option I have this issue. I'm not sure if I am missing something.
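For reference, this is a minimal sketch of the function-based form that pydoop script expects; the function signatures and the writer.emit interface are assumed from the Pydoop documentation, not taken from the original scripts:

def mapper(_, text, writer):
    # emit each word in the input line with a count of 1
    for word in text.split():
        writer.emit(word, "1")

def reducer(word, counts, writer):
    # sum the per-word counts emitted by the mappers
    writer.emit(word, sum(map(int, counts)))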

PS: I have a cluster with 2 nodes running Hortonworks HDP 2.6.5. Pydoop is installed on both of them.

By default, pydoop submit expects an entry point called __main__, but you can modify this via --entry-point. For instance, if your code is:

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper): ...
class Reducer(api.Reducer): ...

def run():
    pipes.run_task(pipes.Factory(Mapper, Reducer))

You can run it via pydoop submit --entry-point run ...
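Alternatively, you can keep the default and name the entry point __main__, so that no --entry-point flag is needed. The following is a minimal word-count sketch in that style, assuming the Mapper/Reducer classes and the pipes.run_task/pipes.Factory calls from the Pydoop MapReduce API; it is an illustration, not the exact contents of the original scripts:

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper):
    def map(self, context):
        # emit each word in the input line with a count of 1
        for word in context.value.split():
            context.emit(word, 1)

class Reducer(api.Reducer):
    def reduce(self, context):
        # sum the counts collected for each word
        context.emit(context.key, sum(context.values))

def __main__():
    # the default entry point that pydoop submit looks for
    pipes.run_task(pipes.Factory(Mapper, Reducer))

With an entry point named __main__ in place, the original command, pydoop submit --upload-file-to-cache wordcount_full.py wordcount_full hdfs_input_dir hdfs_output_dir, should locate it without any extra flags.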
