
How to run Apache Beam locally?

I'm trying to run a Python Apache Beam script on my local machine to do some simulation. I have put 'DirectRunner' in my options. However, p.run() gives me the error "TypeError: Receiver() takes no arguments".

Any ideas why this would occur? I'm using Spyder as my IDE.

EDIT: Here is an example of the code. It takes a list of messages of the form:

{ "Val_1": 1, "Val_2": 56, "date": "2019-04-01T15:00:04.340778" }

splits each one and puts it in the form

(1, 56, 2019-04-01T15:00:04.340778)

then saves it to a text file.

p = beam.Pipeline('DirectRunner')
(p | 'ReadMessage' >>  beam.io.textio.ReadFromTextWithFilename('input/inputs.json')
                    | 'Processing' >> beam.ParDo(Split())
                    | 'Write' >> beam.io.WriteToText('input/results.txt'))
p.run().wait_until_finish() 

Error:

"TypeError: Receiver() takes no arguments"

You do not need to pass 'DirectRunner' as an argument. If you do not specify a runner, i.e. leave it blank, the pipeline defaults to the DirectRunner. This should run fine.

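    import apache_beam as beam

    def run():
        # No runner argument: Beam falls back to the DirectRunner.
        p = beam.Pipeline()
        # Split is the DoFn from the question; it is not defined in this snippet.
        (p | 'ReadMessage' >> beam.io.textio.ReadFromTextWithFilename('input/inputs.json')
           | 'Processing' >> beam.ParDo(Split())
           | 'Write' >> beam.io.WriteToText('input/results.txt'))
        result = p.run()
        result.wait_until_finish()

    if __name__ == "__main__":
        run()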

You execute your Python Beam file just like any regular Python file, provided the pipeline is set to use the DirectRunner, which you did with

p = beam.Pipeline('DirectRunner')

At the time of writing, Apache Beam has only limited support for Python 3.x. If you try to run the word count example, it will yield the same error. This should be fixed in the future, as full support for Python 3 is being worked on.

If you want to deploy your Python Beam code on Google Cloud Platform, I highly recommend switching to Python 2.7.

You can track issues here
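
Since the error depends on the Python version, it may help to confirm which interpreter Spyder is actually running before changing anything else. The following is a minimal sketch using only the standard library; `interpreter_summary` is a hypothetical helper name chosen for illustration and is not part of Beam.

```python
import sys


def interpreter_summary(version_info=None):
    """Return a short 'Python X.Y' label for the given (or current) interpreter."""
    # Hypothetical helper: defaults to the interpreter running this script.
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return "Python {}.{}".format(major, minor)


# Run this in the same Spyder console that runs the pipeline.
print(interpreter_summary())
print(sys.executable)  # path of the interpreter binary in use
```

If the label shows a 3.x interpreter, that matches the situation described above.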

However, I cannot say what exactly your Split function does, so here is a minimal working example that you can use to test your Beam installation.

import ast

import apache_beam as beam


# The DoFn to perform on each element in the input PCollection.
class Split(beam.DoFn):
    def process(self, element):
        # ReadFromTextWithFilename yields (filename, line) pairs, so the
        # message text is element[1]. Assumes one message per line.
        val = ast.literal_eval(element[1])
        # Join the values into a string such as "(1,56,2019-04-01T15:00:04.340778)".
        output = '(' + ','.join(map(str, val.values())) + ')'
        return [output]


def run():
    p = beam.Pipeline('DirectRunner')
    (p | 'ReadMessage' >> beam.io.textio.ReadFromTextWithFilename('input/inputs.json')
       | 'Processing' >> beam.ParDo(Split())
       | 'Write' >> beam.io.WriteToText('input/results.txt'))
    result = p.run()
    result.wait_until_finish()


if __name__ == "__main__":
    run()
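
To tell a problem in the per-element logic apart from a problem in the Beam installation, the formatting step can be tried in plain Python first. This is only a sketch: `format_record` is a hypothetical stand-in for the body of `Split.process`, and it uses `json.loads` rather than `ast.literal_eval`, which is an assumption that the input lines are valid JSON.

```python
import json


def format_record(line):
    """Turn one JSON message into a '(v1,v2,...)' string."""
    # Hypothetical helper mirroring the logic of Split.process, without Beam.
    record = json.loads(line)
    return '(' + ','.join(str(v) for v in record.values()) + ')'


sample = '{ "Val_1": 1, "Val_2": 56, "date": "2019-04-01T15:00:04.340778" }'

# Mimic the (filename, line) pair that ReadFromTextWithFilename produces.
element = ('input/inputs.json', sample)
print(format_record(element[1]))
```

If this prints the expected tuple-like string but the pipeline still fails, the error most likely comes from the Beam setup rather than from the parsing code.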
