
module google.cloud has no attribute storage

I'm trying to run a Beam script in Python on GCP, following this tutorial:

https://levelup.gitconnected.com/scaling-scikit-learn-with-apache-beam-251eb6fcf75b

but I keep getting the following error:

AttributeError: module 'google.cloud' has no attribute 'storage'

I have google-cloud-storage in my requirements.txt, so I'm really not sure what I'm missing here.

My full script:

import apache_beam as beam
import json

query = """
    SELECT 
    year, 
    plurality, 
    apgar_5min, 
    mother_age, 
    father_age,
    gestation_weeks,
    ever_born,
    case when mother_married = true then 1 else 0 end as mother_married,
    weight_pounds as weight,
    current_timestamp as time,
    GENERATE_UUID() as guid
    FROM `bigquery-public-data.samples.natality` 
    order by rand()
    limit 100    
""" 

class ApplyDoFn(beam.DoFn):
    def __init__(self):
        self._model = None
        from google.cloud import storage
        import pandas as pd
        import pickle as pkl
        self._storage = storage
        self._pkl = pkl
        self._pd = pd
    
    def process(self, element):
        if self._model is None:
            bucket = self._storage.Client().get_bucket('bqr_dump')
            blob = bucket.get_blob('natality/sklearn-linear')
            self._model = self._pkl.loads(blob.download_as_string())
            
        new_x = self._pd.DataFrame.from_dict(element,
                                            orient='index').transpose().fillna(0)
        pred_weight = self._model.predict(new_x.iloc[:, 1:8])[0]
        return [ {'guid': element['guid'],
                 'predicted_weight': pred_weight,
                 'time': str(element['time'])}]



# set up pipeline options
options = {'project': 'my-project-name',
           'runner': 'DataflowRunner',
           'temp_location': 'gs://bqr_dump/tmp',
           'staging_location': 'gs://bqr_dump/tmp'
           }

pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)

with beam.Pipeline(options=pipeline_options) as pipeline:
    (
        pipeline
        | 'ReadTable' >> beam.io.Read(beam.io.BigQuerySource(
            query=query,
            use_standard_sql=True))
        | 'Apply Model' >> beam.ParDo(ApplyDoFn())
        | 'Save to BigQuery' >> beam.io.WriteToBigQuery(
            'pzn-pi-sto:beam_test.weight_preds', 
            schema='guid:STRING,weight:FLOAT64,time:STRING', 
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))

and my requirements.txt:

google-cloud==0.34.0
google-cloud-storage==1.30.0
apache-beam[GCP]==2.20.0

This issue is usually related to one of two main causes: either the module was not installed correctly, meaning something broke during installation, or the module is not being imported correctly.
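A quick way to tell the two causes apart is a minimal check like the one below (a rough sketch; run it in the same Python environment your pipeline uses):

import google.cloud
print(google.cloud.__path__)      # shows where the google.cloud namespace resolves from

from google.cloud import storage  # raises ImportError if the installation is broken
print(storage.__version__)        # prints the installed version if the import works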

To fix the issue, in case the cause is a broken module, the solution is to reinstall it, ideally in a clean virtual environment. As indicated here, in a case similar to yours, this should fix your problem.

For the second cause, try changing your code to import all the modules at the beginning of the file, as demonstrated in this official example here. Your code should look something like this:

import apache_beam as beam
import json
import pandas as pd
import pickle as pkl

from google.cloud import storage
...
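With the imports at module level, the DoFn no longer needs to stash the modules on self. Here is a sketch of how the class could then look, keeping the same logic and the same bucket and blob names from your script:

class ApplyDoFn(beam.DoFn):
    def __init__(self):
        self._model = None  # model is loaded lazily on the first element

    def process(self, element):
        if self._model is None:
            # download the pickled model from GCS once per worker
            bucket = storage.Client().get_bucket('bqr_dump')
            blob = bucket.get_blob('natality/sklearn-linear')
            self._model = pkl.loads(blob.download_as_string())

        new_x = pd.DataFrame.from_dict(
            element, orient='index').transpose().fillna(0)
        pred_weight = self._model.predict(new_x.iloc[:, 1:8])[0]
        yield {'guid': element['guid'],
               'predicted_weight': pred_weight,
               'time': str(element['time'])}

Note that when running on Dataflow, you may also need the save_main_session pipeline option so that module-level imports defined in your main script are available on the workers.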

Let me know if this information helped you!
