
module google.cloud has no attribute storage

I'm trying to run a Beam script in Python on GCP, following this tutorial:

[https://levelup.gitconnected.com/scaling-scikit-learn-with-apache-beam-251eb6fcf75b][1]

But I keep getting the following error:

AttributeError: module 'google.cloud' has no attribute 'storage'

I have google-cloud-storage in my requirements.txt, so I'm really not sure what I'm missing here.

My full script:

import apache_beam as beam
import json

query = """
    SELECT 
    year, 
    plurality, 
    apgar_5min, 
    mother_age, 
    father_age,
    gestation_weeks,
    ever_born,
    case when mother_married = true then 1 else 0 end as mother_married,
    weight_pounds as weight,
    current_timestamp as time,
    GENERATE_UUID() as guid
    FROM `bigquery-public-data.samples.natality` 
    order by rand()
    limit 100    
""" 

class ApplyDoFn(beam.DoFn):
    def __init__(self):
        self._model = None
        from google.cloud import storage
        import pandas as pd
        import pickle as pkl
        self._storage = storage
        self._pkl = pkl
        self._pd = pd
    
    def process(self, element):
        if self._model is None:
            bucket = self._storage.Client().get_bucket('bqr_dump')
            blob = bucket.get_blob('natality/sklearn-linear')
            self._model = self._pkl.loads(blob.download_as_string())
            
        new_x = self._pd.DataFrame.from_dict(element,
                                            orient='index').transpose().fillna(0)
        pred_weight = self._model.predict(new_x.iloc[:, 1:8])[0]
        return [ {'guid': element['guid'],
                 'predicted_weight': pred_weight,
                 'time': str(element['time'])}]



# set up pipeline options
options = {'project': 'my-project-name',  # project ID must be a string
           'runner': 'DataflowRunner',
           'temp_location': 'gs://bqr_dump/tmp',
           'staging_location': 'gs://bqr_dump/tmp'
           }

pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)

with beam.Pipeline(options=pipeline_options) as pipeline:
    (
        pipeline
        | 'ReadTable' >> beam.io.Read(beam.io.BigQuerySource(
            query=query,
            use_standard_sql=True))
        | 'Apply Model' >> beam.ParDo(ApplyDoFn())
        | 'Save to BigQuery' >> beam.io.WriteToBigQuery(
            'pzn-pi-sto:beam_test.weight_preds', 
            schema='guid:STRING,weight:FLOAT64,time:STRING', 
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))

And my requirements.txt:

google-cloud==0.34.0
google-cloud-storage==1.30.0
apache-beam[GCP]==2.20.0

This issue is usually related to two main causes: the module was not installed properly, meaning something went wrong during installation; the second cause is that the module's import was not done correctly.

To fix the issue, if the cause is a broken module, reinstalling it, or checking it inside a virtual environment, would be the solution. As shown here, in a case similar to yours, this should resolve your situation.
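A quick way to check whether the install is the problem is to ask the interpreter whether the package can be found at all. This is a generic sketch; `google.cloud.storage` is the module from the question's requirements.txt:

```python
import importlib.util

def is_importable(name: str) -> bool:
    """Return True if `name` resolves to an installed module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. no `google.cloud` at all)
        # also means the module is not importable.
        return False

# Run this in the same (virtual) environment that submits the pipeline;
# False means google-cloud-storage needs to be (re)installed there.
print(is_importable("google.cloud.storage"))
```

If this prints `False` in your virtual environment, reinstall with `pip install google-cloud-storage` before submitting the pipeline again.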

For the second cause, try changing your code to import all the modules at the beginning of the file, as shown in the official examples here. Your code should look like this:

import apache_beam as beam
import json
import pandas as pd
import pickle as pkl

from google.cloud import storage
...
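Beyond moving imports to the top of the file, Beam's `DoFn.setup()` hook is a common place to load heavy dependencies, since it runs once per worker after the packages from requirements.txt have been installed there. A sketch (not the tutorial's exact code) of the question's `ApplyDoFn` restructured this way, reusing the bucket and blob names from the original script:

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Lets the sketch be read and imported even without Beam installed.
    _DoFnBase = object

class ApplyDoFn(_DoFnBase):
    def setup(self):
        # Runs once per worker, after staged dependencies are available.
        from google.cloud import storage
        import pickle as pkl
        blob = (storage.Client()
                .get_bucket('bqr_dump')
                .get_blob('natality/sklearn-linear'))
        self._model = pkl.loads(blob.download_as_string())

    def process(self, element):
        import pandas as pd
        new_x = (pd.DataFrame.from_dict(element, orient='index')
                 .transpose().fillna(0))
        yield {'guid': element['guid'],
               'predicted_weight': self._model.predict(new_x.iloc[:, 1:8])[0],
               'time': str(element['time'])}
```

This also avoids pickling the loaded model with the DoFn when the pipeline is submitted, since the model is only created on the worker.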

Let me know if this information was helpful!

