
ImportError: No module named main in GAE Flexible

I am using Apache Beam for Python (Python 2.7) and I always get the error ImportError: No module named main when I deploy my code to Google App Engine Flexible. I can see this error in my Dataflow console when I call my endpoint /server. When I run my code locally, the pipeline works perfectly on gcloud Dataflow, but when I run it on GAE Flex I get the error above.

This is my code:

import apache_beam as beam
import logging
logging.basicConfig(level=logging.DEBUG)

from flask import Flask

from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import StandardOptions, SetupOptions
from apache_beam.options.pipeline_options import GoogleCloudOptions

from apache_beam.io import WriteToText
from apache_beam.io import ReadFromText


PROJECT_ID = 'PROJECT_ID'
JOB_NAME = 'test-job-name-l'
BUCKET_URL = 'gs://backup-bucket'
app = Flask(__name__)

@app.route('/')
def start():
    return "Welcome to datamigration"

@app.route('/server')
def start1():
    run()
    return "It works"

class FindWords(beam.DoFn):
    def process(self, element):
        import re as regex
        return regex.findall(r"[A-Za-z\']+", element)

class CountWordsTransform(beam.PTransform):
    def expand(self, p_collection):
        return (p_collection
                | "Split" >> (beam.ParDo(FindWords()).with_input_types(unicode))
                | "PairWithOne" >> beam.Map(lambda word: (word, 1))
                | "GroupBy" >> beam.GroupByKey()
                | "AggregateGroups" >> beam.Map(lambda (word, ones): (word, sum(ones))))

def run():
    pipeline_options = PipelineOptions()
    # Save the main session so globals defined at module level are
    # pickled and restored on the Dataflow workers
    pipeline_options.view_as(SetupOptions).save_main_session = True
    # Stage requirements.txt so workers install the same dependencies
    pipeline_options.view_as(SetupOptions).requirements_file = "requirements.txt"
    google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    google_cloud_options.project = PROJECT_ID
    google_cloud_options.job_name = JOB_NAME
    google_cloud_options.staging_location = BUCKET_URL + '/staging'
    google_cloud_options.temp_location = BUCKET_URL + '/temp'
    pipeline_options.view_as(StandardOptions).runner = 'DataflowRunner'
    pipeline = beam.Pipeline(options=pipeline_options)

    (pipeline
     | "Load" >> ReadFromText(BUCKET_URL + "/file.txt")
     | "Count Words" >> CountWordsTransform()
     | "Save" >> WriteToText(BUCKET_URL + '/result/test')
     )

    pipeline.run()

if __name__ == '__main__':
    app.run(port=8080, debug=True)

And this is the full error that I always get:

Error:
Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 156, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 351, in apache_beam.runners.worker.operations.DoOperation.start
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 352, in apache_beam.runners.worker.operations.DoOperation.start
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 357, in apache_beam.runners.worker.operations.DoOperation.start
    pickler.loads(self.spec.serialized_fn))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 232, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 423, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named main

My app.yaml:

runtime: python
env: flex
service: ms-somename
threadsafe: true

entrypoint: gunicorn -b :$PORT main:app

runtime_config:
  python_version: 2

manual_scaling:
  instances: 1

resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10

And my requirements.txt:

google-cloud-datastore==1.3.0
google-cloud-dataflow==2.5.0
google-apitools==0.5.16
googledatastore==7.0.1
apache-beam==2.5.0
apache-beam[gcp]==2.5.0
Flask==0.12.2
gunicorn==19.9.0

I actually had this issue a couple of days ago, but I was trying to run it on GAE Standard with Python 3.7. That said, I resolved my issue by including gunicorn in my requirements.txt file. Originally I didn't, because I had misunderstood this line from the docs:

Do not include gunicorn in your requirements.txt file unless you are specifying the entrypoint.

https://cloud.google.com/appengine/docs/standard/python3/runtime

Again, this was for GAE Standard.
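
For illustration, this is how the two files need to line up when you specify a custom entrypoint — a minimal sketch, assuming the same gunicorn entrypoint as in the question (the versions are only examples):

# app.yaml (excerpt): a custom entrypoint means the runtime will not
# add gunicorn for you
entrypoint: gunicorn -b :$PORT main:app

# requirements.txt (excerpt): so gunicorn must be installed by pip
Flask==0.12.2
gunicorn==19.9.0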

Is your code in a main.py file?

In your yaml file,

entrypoint: gunicorn -b :$PORT main:app

tells gunicorn to look for the app variable in the main module (more info in the gunicorn documentation). If you don't have a main.py, it will throw an error.
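
For reference, a minimal sketch of what that entrypoint expects — a file named main.py at the project root (the route and return value are just placeholders):

# main.py -- gunicorn resolves "main:app" by importing the module
# named main and looking up its module-level attribute app
from flask import Flask

app = Flask(__name__)  # this is the "app" in "main:app"

@app.route('/')
def index():
    return "OK"

If the file is named something else, say server.py, the entrypoint has to match it: gunicorn -b :$PORT server:app.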
