
My Flask app does not run on Heroku, but it runs locally

I have a machine learning app built with Flask that loads a machine learning model which uses a custom tokenization function. It runs perfectly when I start it with python app.py, but it does not run on Heroku, where it shows the error AttributeError: module '__main__' has no attribute 'tokenize'.

I have already read another post, "Why does my Flask app work when executing with python app.py but not when using heroku local web or flask run?", which seems to describe the same problem, but I do not understand the solution it proposes. The error comes from model = joblib.load("models/adaboost_model.pkl"). I have already put the custom tokenizer function into a separate file, tokenizer_function.py, but it still does not work.
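For context, a minimal sketch (not from the question) of why this particular pickle breaks: pickle stores functions by reference, as a (module, name) pair, not as code. A function defined at the top level of a script that is run directly is therefore recorded as living in __main__, and any other process that unpickles the payload must also have a __main__.tokenize:

```python
import pickle

def tokenize(text):
    # stand-in for the custom tokenizer; pickle only records where to find it
    return text.split()

payload = pickle.dumps(tokenize)
# The pickle contains the module and the name, not the function body.
# When this file runs as a script, the module recorded is "__main__",
# so unpickling elsewhere fails with exactly the AttributeError above.
print(pickle.whichmodule(tokenize, "tokenize"))  # "__main__" when run as a script
print(b"tokenize" in payload)
```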

My app.py file looks like this:

import json
import plotly
import pandas as pd
import re
from collections import Counter

# import NLP libraries
from tokenizer_function import tokenize

from flask import Flask
from flask import render_template, request, jsonify
from plotly.graph_objs import Bar
from sklearn.externals import joblib
from sqlalchemy import create_engine


app = Flask(__name__)


@app.before_first_request
def load_model_data():
    global df
    global model
    # load data

    engine = create_engine('sqlite:///data/DisasterResponse.db')
    df = pd.read_sql_table('DisasterResponse', engine)

    # load model
    model = joblib.load("models/adaboost_model.pkl")

# index webpage displays cool visuals and receives user input text for model
@app.route('/')
@app.route('/index')
def index():

    # extract data needed for visuals
    # Message counts of different genres
    genre_counts = df.groupby('genre').count()['message']
    genre_names = list(genre_counts.index)

    # Message counts for different categories
    cate_counts_df = df.iloc[:, 4:].sum().sort_values(ascending=False)
    cate_counts = list(cate_counts_df)
    cate_names = list(cate_counts_df.index)

    # Top keywords in Social Media in percentages
    social_media_messages = ' '.join(df[df['genre'] == 'social']['message'])
    social_media_tokens = tokenize(social_media_messages)
    social_media_wrd_counter = Counter(social_media_tokens).most_common()
    social_media_wrd_cnt = [i[1] for i in social_media_wrd_counter]
    social_media_wrd_pct = [i/sum(social_media_wrd_cnt) *100 for i in social_media_wrd_cnt]
    social_media_wrds = [i[0] for i in social_media_wrd_counter]

    # Top keywords in Direct in percentages
    direct_messages = ' '.join(df[df['genre'] == 'direct']['message'])
    direct_tokens = tokenize(direct_messages)
    direct_wrd_counter = Counter(direct_tokens).most_common()
    direct_wrd_cnt = [i[1] for i in direct_wrd_counter]
    direct_wrd_pct = [i/sum(direct_wrd_cnt) * 100 for i in direct_wrd_cnt]
    direct_wrds = [i[0] for i in direct_wrd_counter]

    # create visuals

    graphs = [
    # Histogram of the message genres
        {
            'data': [
                Bar(
                    x=genre_names,
                    y=genre_counts
                )
            ],

            'layout': {
                'title': 'Distribution of Message Genres',
                'yaxis': {
                    'title': "Count"
                },
                'xaxis': {
                    'title': "Genre"
                }
            }
        },
        # histogram of social media messages top 50 keywords
        {
            'data': [
                Bar(
                    x=social_media_wrds[:50],
                    y=social_media_wrd_pct[:50]
                )
            ],

            'layout':{
                'title': "Top 50 Keywords in Social Media Messages",
                'xaxis': {'tickangle':60
                },
                'yaxis': {
                    'title': "% Total Social Media Messages"    
                }
            }
        }, 

        # histogram of direct messages top 50 keywords
        {
            'data': [
                Bar(
                    x=direct_wrds[:50],
                    y=direct_wrd_pct[:50]
                )
            ],

            'layout':{
                'title': "Top 50 Keywords in Direct Messages",
                'xaxis': {'tickangle':60
                },
                'yaxis': {
                    'title': "% Total Direct Messages"    
                }
            }
        }, 



        # histogram of messages categories distributions
        {
            'data': [
                Bar(
                    x=cate_names,
                    y=cate_counts
                )
            ],

            'layout':{
                'title': "Distribution of Message Categories",
                'xaxis': {'tickangle':60
                },
                'yaxis': {
                    'title': "Count"
                }
            }
        },     

    ]

    # encode plotly graphs in JSON
    ids = ["graph-{}".format(i) for i, _ in enumerate(graphs)]
    graphJSON = json.dumps(graphs, cls=plotly.utils.PlotlyJSONEncoder)

    # render web page with plotly graphs
    return render_template('master.html', ids=ids, graphJSON=graphJSON)


# web page that handles user query and displays model results
@app.route('/go')
def go():
    # save user input in query
    query = request.args.get('query', '') 

    # use model to predict classification for query
    classification_labels = model.predict([query])[0]
    classification_results = dict(zip(df.columns[4:], classification_labels))

    # This will render the go.html page. Please see that file.
    return render_template(
        'go.html',
        query=query,
        classification_result=classification_results
    )


def main():
    app.run()


if __name__ == '__main__':
    main()

My tokenizer_function.py looks like this:

import re
import nltk 
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer


def tokenize(text):
    """
        Tokenize the message into word level features. 
        1. replace urls
        2. convert to lower cases
        3. remove stopwords
        4. strip white spaces
    Args: 
        text: input text messages
    Returns: 
        cleaned tokens(List)
    """   
    # Define url pattern
    url_re = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

    # Detect and replace urls
    detected_urls = re.findall(url_re, text)
    for url in detected_urls:
        text = text.replace(url, "urlplaceholder")

    # tokenize sentences
    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()

    # save cleaned tokens
    clean_tokens = [lemmatizer.lemmatize(tok).lower().strip() for tok in tokens]

    # remove stopwords
    STOPWORDS = list(set(stopwords.words('english')))
    clean_tokens = [token for token in clean_tokens if token not in STOPWORDS]

    return clean_tokens
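As an aside, the URL-replacement step of this function can be exercised on its own, without the NLTK downloads; a small check using the same regex (the sample text is made up):

```python
import re

# Same url pattern as in tokenizer_function.py
url_re = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

text = "Flood reported near https://example.com/map"
# Detect and replace urls, as tokenize() does before word_tokenize
for url in re.findall(url_re, text):
    text = text.replace(url, "urlplaceholder")
print(text)  # Flood reported near urlplaceholder
```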

Why is the error still returned? I have already separated the tokenizer function into its own file. Any help or explanation is greatly appreciated!

Edit:

The full traceback:

2018-12-26T19:54:46.098725+00:00 app[web.1]: model = joblib.load("models/rf_model.pkl")
2018-12-26T19:54:46.098727+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 598, in load
2018-12-26T19:54:46.098728+00:00 app[web.1]: obj = _unpickle(fobj, filename, mmap_mode)
2018-12-26T19:54:46.098730+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 526, in _unpickle
2018-12-26T19:54:46.098732+00:00 app[web.1]: obj = unpickler.load()
2018-12-26T19:54:46.098733+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1050, in load
2018-12-26T19:54:46.098734+00:00 app[web.1]: dispatch[key[0]](self)
2018-12-26T19:54:46.098736+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1338, in load_global
2018-12-26T19:54:46.098738+00:00 app[web.1]: klass = self.find_class(module, name)
2018-12-26T19:54:46.098739+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1392, in find_class
2018-12-26T19:54:46.098741+00:00 app[web.1]: return getattr(sys.modules[module], name)
2018-12-26T19:54:46.098752+00:00 app[web.1]: AttributeError: module '__main__' has no attribute 'tokenize'
2018-12-26T19:54:46.103665+00:00 app[web.1]: [2018-12-26 19:54:46 +0000] [11] [INFO] Worker exiting (pid: 11)
2018-12-26T19:54:46.253217+00:00 app[web.1]: [2018-12-26 19:54:46 +0000] [10] [ERROR] Exception in worker process
2018-12-26T19:54:46.253221+00:00 app[web.1]: Traceback (most recent call last):
2018-12-26T19:54:46.253222+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
2018-12-26T19:54:46.253228+00:00 app[web.1]: worker.init_process()
2018-12-26T19:54:46.253230+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/workers/base.py", line 129, in init_process
2018-12-26T19:54:46.253231+00:00 app[web.1]: self.load_wsgi()
2018-12-26T19:54:46.253233+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
2018-12-26T19:54:46.253234+00:00 app[web.1]: self.wsgi = self.app.wsgi()
2018-12-26T19:54:46.253236+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
2018-12-26T19:54:46.253237+00:00 app[web.1]: self.callable = self.load()
2018-12-26T19:54:46.253239+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
2018-12-26T19:54:46.253240+00:00 app[web.1]: return self.load_wsgiapp()
2018-12-26T19:54:46.253242+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
2018-12-26T19:54:46.253243+00:00 app[web.1]: return util.import_app(self.app_uri)
2018-12-26T19:54:46.253245+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/gunicorn/util.py", line 350, in import_app
2018-12-26T19:54:46.253246+00:00 app[web.1]: __import__(module)
2018-12-26T19:54:46.253248+00:00 app[web.1]: File "/app/app.py", line 59, in <module>
2018-12-26T19:54:46.253249+00:00 app[web.1]: model = joblib.load("models/rf_model.pkl")
2018-12-26T19:54:46.253251+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 598, in load
2018-12-26T19:54:46.253253+00:00 app[web.1]: obj = _unpickle(fobj, filename, mmap_mode)
2018-12-26T19:54:46.253254+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 526, in _unpickle
2018-12-26T19:54:46.253256+00:00 app[web.1]: obj = unpickler.load()
2018-12-26T19:54:46.253257+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1050, in load
2018-12-26T19:54:46.253259+00:00 app[web.1]: dispatch[key[0]](self)
2018-12-26T19:54:46.253261+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1338, in load_global
2018-12-26T19:54:46.253262+00:00 app[web.1]: klass = self.find_class(module, name)
2018-12-26T19:54:46.253264+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1392, in find_class
2018-12-26T19:54:46.253265+00:00 app[web.1]: return getattr(sys.modules[module], name)
2018-12-26T19:54:46.253273+00:00 app[web.1]: AttributeError: module '__main__' has no attribute 'tokenize'
2018-12-26T19:54:46.254747+00:00 app[web.1]: [2018-12-26 19:54:46 +0000] [10] [INFO] Worker exiting (pid: 10)
2018-12-26T19:54:46.769738+00:00 heroku[web.1]: State changed from up to crashed
2018-12-26T19:54:46.750723+00:00 heroku[web.1]: Process exited with status 3
2018-12-26T19:55:27.000000+00:00 app[api]: Build started by user chenbowen184@gmail.com
2018-12-26T19:57:21.400346+00:00 heroku[web.1]: State changed from crashed to starting
2018-12-26T19:57:20.489748+00:00 app[api]: Deploy 87045c08 by user chenbowen184@gmail.com
2018-12-26T19:57:20.489748+00:00 app[api]: Release v4 created by user chenbowen184@gmail.com
2018-12-26T19:57:59.712871+00:00 heroku[web.1]: State changed from starting to up
2018-12-26T19:57:59.469112+00:00 app[web.1]: [2018-12-26 19:57:59 +0000] [4] [INFO] Starting gunicorn 19.9.0
2018-12-26T19:57:59.470292+00:00 app[web.1]: [2018-12-26 19:57:59 +0000] [4] [INFO] Listening at: http://0.0.0.0:53013 (4)
2018-12-26T19:57:59.470476+00:00 app[web.1]: [2018-12-26 19:57:59 +0000] [4] [INFO] Using worker: sync
2018-12-26T19:57:59.480025+00:00 app[web.1]: [2018-12-26 19:57:59 +0000] [10] [INFO] Booting worker with pid: 10
2018-12-26T19:57:59.579014+00:00 app[web.1]: [2018-12-26 19:57:59 +0000] [11] [INFO] Booting worker with pid: 11
2018-12-26T19:57:59.000000+00:00 app[api]: Build succeeded
2018-12-26T20:05:53.000000+00:00 app[api]: Build started by user chenbowen184@gmail.com
2018-12-26T20:08:05.985733+00:00 heroku[web.1]: State changed from up to starting
2018-12-26T20:08:05.760144+00:00 app[api]: Deploy 30844f53 by user chenbowen184@gmail.com
2018-12-26T20:08:05.760144+00:00 app[api]: Release v5 created by user chenbowen184@gmail.com
2018-12-26T20:08:07.063086+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2018-12-26T20:08:08.397480+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2018-12-26T20:08:08.219918+00:00 app[web.1]: [2018-12-26 20:08:08 +0000] [4] [INFO] Shutting down: Master
2018-12-26T20:08:31.717772+00:00 heroku[web.1]: Starting process with command `gunicorn app:app`
2018-12-26T20:08:33.821758+00:00 app[web.1]: [2018-12-26 20:08:33 +0000] [4] [INFO] Starting gunicorn 19.9.0
2018-12-26T20:08:33.822253+00:00 app[web.1]: [2018-12-26 20:08:33 +0000] [4] [INFO] Listening at: http://0.0.0.0:46765 (4)
2018-12-26T20:08:33.822350+00:00 app[web.1]: [2018-12-26 20:08:33 +0000] [4] [INFO] Using worker: sync
2018-12-26T20:08:33.826441+00:00 app[web.1]: [2018-12-26 20:08:33 +0000] [10] [INFO] Booting worker with pid: 10
2018-12-26T20:08:33.894893+00:00 app[web.1]: [2018-12-26 20:08:33 +0000] [11] [INFO] Booting worker with pid: 11
2018-12-26T20:08:35.332075+00:00 heroku[web.1]: State changed from starting to up
2018-12-26T20:08:38.891790+00:00 heroku[router]: at=info method=GET path="/" host=disaster-response-app184.herokuapp.com request_id=37e89d80-429a-4aa7-a356-781238ebecce fwd="76.90.60.254" dyno=web.1 connect=0ms service=2923ms status=500 bytes=456 protocol=https
2018-12-26T20:08:40.489401+00:00 app[web.1]: [2018-12-26 20:08:40,488] ERROR in app: Exception on /favicon.ico [GET]
2018-12-26T20:08:40.489413+00:00 app[web.1]: Traceback (most recent call last):
2018-12-26T20:08:40.489415+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
2018-12-26T20:08:40.489417+00:00 app[web.1]: response = self.full_dispatch_request()
2018-12-26T20:08:40.489418+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1808, in full_dispatch_request
2018-12-26T20:08:40.489420+00:00 app[web.1]: self.try_trigger_before_first_request_functions()
2018-12-26T20:08:40.489422+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1855, in try_trigger_before_first_request_functions
2018-12-26T20:08:40.489424+00:00 app[web.1]: func()
2018-12-26T20:08:40.489426+00:00 app[web.1]: File "/app/app.py", line 31, in load_model_data
2018-12-26T20:08:40.489427+00:00 app[web.1]: model = joblib.load("models/adaboost_model.pkl")
2018-12-26T20:08:40.489429+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 598, in load
2018-12-26T20:08:40.489430+00:00 app[web.1]: obj = _unpickle(fobj, filename, mmap_mode)
2018-12-26T20:08:40.489432+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 526, in _unpickle
2018-12-26T20:08:40.489434+00:00 app[web.1]: obj = unpickler.load()
2018-12-26T20:08:40.489435+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1050, in load
2018-12-26T20:08:40.489437+00:00 app[web.1]: dispatch[key[0]](self)
2018-12-26T20:08:40.489438+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1338, in load_global
2018-12-26T20:08:40.489440+00:00 app[web.1]: klass = self.find_class(module, name)
2018-12-26T20:08:40.489441+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.6/pickle.py", line 1392, in find_class
2018-12-26T20:08:40.489443+00:00 app[web.1]: return getattr(sys.modules[module], name)
2018-12-26T20:08:40.489451+00:00 app[web.1]: AttributeError: module '__main__' has no attribute 'tokenize'
2018-12-26T20:08:40.490172+00:00 app[web.1]: 10.97.234.47 - - [26/Dec/2018:20:08:40 +0000] "GET /favicon.ico HTTP/1.1" 500 291 "https://disaster-response-app184.herokuapp.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36"
2018-12-26T20:42:17.957629+00:00 heroku[web.1]: Idling
2018-12-26T20:42:17.961855+00:00 heroku[web.1]: State changed from up to down
2018-12-26T20:42:19.098318+00:00 heroku[web.1]: Stopping all processes with SIGTERM
2018-12-26T20:42:19.160640+00:00 app[web.1]: [2018-12-26 20:42:19 +0000] [4] [INFO] Handling signal: term
2018-12-26T20:42:19.189166+00:00 app[web.1]: [2018-12-26 20:42:19 +0000] [10] [INFO] Worker exiting (pid: 10)
2018-12-26T20:42:19.205573+00:00 app[web.1]: [2018-12-26 20:42:19 +0000] [11] [INFO] Worker exiting (pid: 11)
2018-12-26T20:42:21.004227+00:00 heroku[web.1]: Process exited with status 0

Procfile:

web: gunicorn app:app
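The Procfile is part of the puzzle: python app.py executes the file as the __main__ module, while gunicorn app:app imports it as a module named app. A self-contained illustration of the difference (it writes a throwaway stand-in for app.py to a temporary directory; the file contents are hypothetical):

```python
import importlib.util
import os
import tempfile

# A stand-in for app.py: it just records the name it was loaded under.
src = "modname = __name__\n"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "app.py")
    with open(path, "w") as f:
        f.write(src)

    # Importing the file, as gunicorn does, gives it its real module name.
    spec = importlib.util.spec_from_file_location("app", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)

print(mod.modname)  # app
# Running `python app.py` instead would set __name__ to "__main__",
# which is why a pickle made under one mode can break under the other.
```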

Did the post you mentioned, "Why does my Flask app work when executing with python app.py but not when using heroku local web or flask run?", really solve your problem?

Are you also using the same import format as when loading it in the training script, i.e. from tokenizer_function import tokenize in the training script?

I managed to solve this by redefining the pipeline defined in train_classifier.py. I defined a custom transformer called Tokenizer. tokenizer_function.py now looks like this:

import re
import nltk 
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class Tokenizer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        def tokenize(text):
            """
                Tokenize the message into word level features. 
                1. replace urls
                2. convert to lower cases
                3. remove stopwords
                4. strip white spaces
            Args: 
                text: input text messages
            Returns: 
                cleaned tokens(List)
            """   
            # Define url pattern
            url_re = 'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

            # Detect and replace urls
            detected_urls = re.findall(url_re, text)
            for url in detected_urls:
                text = text.replace(url, "urlplaceholder")

            # tokenize sentences
            tokens = word_tokenize(text)
            lemmatizer = WordNetLemmatizer()

            # save cleaned tokens
            clean_tokens = [lemmatizer.lemmatize(tok).lower().strip() for tok in tokens]

            # remove stopwords
            STOPWORDS = list(set(stopwords.words('english')))
            clean_tokens = [token for token in clean_tokens if token not in STOPWORDS]

            return ' '.join(clean_tokens)

        return pd.Series(X).apply(tokenize).values

With the transformer above, I was able to wrap the tokenizer into the pipeline, which solved the problem I was having.
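For illustration, the wrapping pattern looks roughly like this. The real Tokenizer above needs the NLTK corpora at transform time, so this sketch substitutes a trivial whitespace/lowercase tokenizer; the vectorizer choice is an assumption, not the actual train_classifier.py pipeline:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


class SimpleTokenizer(BaseEstimator, TransformerMixin):
    """Whitespace/lowercase stand-in for the NLTK-based Tokenizer above."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.Series(X).apply(lambda t: " ".join(t.lower().split())).values


# In the real fix, Tokenizer lives in tokenizer_function.py, so the pickled
# pipeline references tokenizer_function.Tokenizer instead of __main__.tokenize.
pipeline = Pipeline([
    ("tokenize", SimpleTokenizer()),
    ("vect", CountVectorizer()),
])

features = pipeline.fit_transform(["Water needed NOW", "food and water"])
print(features.shape)  # (2, 5)
```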
