[英]ImportError phonenumbers with google cloud dataflow python
我试图在Python中相对简单地导入模块phonenumbers。
我已经在一个单独的python文件上测试了模块,没有任何其他导入,它完全正常。
这些是我安装的软件包:
from __future__ import absolute_import
from __future__ import print_function
import argparse
import csv
import logging
import os
import phonenumbers
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
这是我的错误信息:
Traceback (most recent call last):
File "clean.py", line 114, in <module>
run()
File "clean.py", line 109, in run
| 'WriteOutputText' >> beam.io.WriteToText(known_args.output))
File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 389, in __exit__
self.run().wait_until_finish()
File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 996, in wait_until_finish
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 733, in run
self._load_main_session(self.local_staging_directory)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 472, in _load_main_session
pickler.load_session(session_file)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
return dill.load_session(file_path)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
module = unpickler.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 766, in _import_module
return __import__(import_name)
ImportError: No module named phonenumbers
任何帮助将不胜感激,谢谢!
编辑:我已经用pip安装了phonenumbers
$ pip install phonenumbers
Requirement already satisfied: phonenumbers in c:\python27\lib\site-packages (8.9.7)
gax-google-logging-v2 0.8.3 has requirement google-gax<0.13.0,>=0.12.5, but you'll have google-gax 0.15.16 which is incompatible.
gcloud 0.18.3 has requirement google-gax<0.13dev,>=0.12.3, but you'll have google-gax 0.15.16 which is incompatible.
google-cloud-vision 0.29.0 has requirement requests<3.0dev,>=2.18.4, but you'll have requests 2.18.2 which is incompatible.
gax-google-pubsub-v1 0.8.3 has requirement google-gax<0.13.0,>=0.12.5, but you'll have google-gax 0.15.16 which is incompatible.
google-cloud-spanner 0.29.0 has requirement requests<3.0dev,>=2.18.4, but you'll have requests 2.18.2 which is incompatible.
这是点子冻结输出
$ pip freeze
adal==1.0.1
apache-beam==2.4.0
asn1crypto==0.22.0
avro==1.8.2
azure==3.0.0
azure-batch==4.1.3
azure-common==1.1.12
azure-cosmosdb-nspkg==2.0.2
azure-cosmosdb-table==1.0.3
azure-datalake-store==0.0.22
azure-eventgrid==0.1.0
azure-graphrbac==0.40.0
azure-keyvault==0.3.7
azure-mgmt==2.0.0
azure-mgmt-advisor==1.0.1
azure-mgmt-applicationinsights==0.1.1
azure-mgmt-authorization==0.30.0
azure-mgmt-batch==5.0.1
azure-mgmt-batchai==0.2.0
azure-mgmt-billing==0.1.0
azure-mgmt-cdn==2.0.0
azure-mgmt-cognitiveservices==2.0.0
azure-mgmt-commerce==1.0.1
azure-mgmt-compute==3.0.1
azure-mgmt-consumption==2.0.0
azure-mgmt-containerinstance==0.3.1
azure-mgmt-containerregistry==1.0.1
azure-mgmt-containerservice==3.0.1
azure-mgmt-cosmosdb==0.3.1
azure-mgmt-datafactory==0.4.0
azure-mgmt-datalake-analytics==0.3.0
azure-mgmt-datalake-nspkg==2.0.0
azure-mgmt-datalake-store==0.3.0
azure-mgmt-devtestlabs==2.2.0
azure-mgmt-dns==1.2.0
azure-mgmt-eventgrid==0.4.0
azure-mgmt-eventhub==1.2.0
azure-mgmt-hanaonazure==0.1.1
azure-mgmt-iothub==0.4.0
azure-mgmt-iothubprovisioningservices==0.1.0
azure-mgmt-keyvault==0.40.0
azure-mgmt-loganalytics==0.1.0
azure-mgmt-logic==2.1.0
azure-mgmt-machinelearningcompute==0.4.1
azure-mgmt-managementpartner==0.1.0
azure-mgmt-marketplaceordering==0.1.0
azure-mgmt-media==0.2.0
azure-mgmt-monitor==0.4.0
azure-mgmt-msi==0.1.0
azure-mgmt-network==1.7.1
azure-mgmt-notificationhubs==1.0.0
azure-mgmt-nspkg==2.0.0
azure-mgmt-powerbiembedded==1.0.0
azure-mgmt-rdbms==0.1.0
azure-mgmt-recoveryservices==0.2.0
azure-mgmt-recoveryservicesbackup==0.1.1
azure-mgmt-redis==5.0.0
azure-mgmt-relay==0.1.0
azure-mgmt-reservations==0.1.0
azure-mgmt-resource==1.2.2
azure-mgmt-scheduler==1.1.3
azure-mgmt-search==1.0.0
azure-mgmt-servermanager==1.2.0
azure-mgmt-servicebus==0.4.0
azure-mgmt-servicefabric==0.1.0
azure-mgmt-sql==0.8.6
azure-mgmt-storage==1.5.0
azure-mgmt-subscription==0.1.0
azure-mgmt-trafficmanager==0.40.0
azure-mgmt-web==0.34.1
azure-nspkg==2.0.0
azure-servicebus==0.21.1
azure-servicefabric==6.1.2.9
azure-servicemanagement-legacy==0.20.6
azure-storage-blob==1.1.0
azure-storage-common==1.1.0
azure-storage-file==1.1.0
azure-storage-nspkg==3.0.0
azure-storage-queue==1.1.0
CacheControl==0.12.5
cachetools==2.1.0
certifi==2017.7.27.1
cffi==1.10.0
chardet==3.0.4
click==6.7
configparser==3.5.0
crcmod==1.7
cryptography==2.0.3
deprecation==2.0.3
dill==0.2.6
docopt==0.6.2
entrypoints==0.2.3
enum34==1.1.6
fasteners==0.14.1
firebase-admin==2.11.0
Flask==0.12.2
funcsigs==1.0.2
future==0.16.0
futures==3.2.0
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
gapic-google-cloud-pubsub-v1==0.15.4
gax-google-logging-v2==0.8.3
gax-google-pubsub-v1==0.8.3
gcloud==0.18.3
google-api-core==0.1.4
google-apitools==0.5.20
google-auth==1.5.0
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.2.0
google-cloud==0.33.1
google-cloud-bigquery==0.28.0
google-cloud-bigquery-datatransfer==0.1.1
google-cloud-bigtable==0.28.1
google-cloud-container==0.1.1
google-cloud-core==0.28.1
google-cloud-dataflow==2.4.0
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==1.0.2
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.1
google-cloud-pubsub==0.30.1
google-cloud-resource-manager==0.28.1
google-cloud-runtimeconfig==0.28.1
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.17.0
google-cloud-translate==1.3.1
google-cloud-videointelligence==1.0.1
google-cloud-vision==0.29.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
googledatastore==7.0.1
grpc-google-iam-v1==0.11.4
grpc-google-logging-v2==0.8.1
grpc-google-pubsub-v1==0.8.1
grpcio==1.12.0
gunicorn==19.7.1
hdfs==2.1.0
httplib2==0.9.2
idna==2.5
ipaddress==1.0.18
iso8601==0.1.12
isodate==0.6.0
itsdangerous==0.24
Jinja2==2.9.6
jmespath==0.9.3
keyring==12.2.1
keystoneauth1==3.8.0
linecache2==1.0.0
MarkupSafe==1.0
mock==2.0.0
monotonic==1.5
msgpack==0.5.6
msrest==0.5.0
msrestazure==0.4.32
ndg-httpsclient==0.4.2
nelson==0.4.0
oauth2client==3.0.0
oauthlib==2.1.0
os-service-types==1.2.0
packaging==17.1
pathlib2==2.3.2
pbr==4.0.3
phonenumbers==8.9.7
ply==3.8
proto-google-cloud-datastore-v1==0.90.4
proto-google-cloud-error-reporting-v1beta1==0.15.3
proto-google-cloud-logging-v2==0.91.3
proto-google-cloud-pubsub-v1==0.15.4
protobuf==3.5.2.post1
psutil==5.4.6
psycopg2==2.7.3.2
pyasn1==0.4.3
pyasn1-modules==0.2.1
pycparser==2.18
pyjwt==1.5.0
pyOpenSSL==17.2.0
pyparsing==2.2.0
pyreadline==2.1
python-dateutil==2.7.3
pytz==2018.3
PyVCF==0.6.8
pywin32-ctypes==0.1.2
PyYAML==3.12
rackspaceauth==0.2.0
requests==2.18.2
requests-oauthlib==0.8.0
requests-toolbelt==0.8.0
rsa==3.4.2
scandir==1.7
six==1.10.0
SQLAlchemy==1.1.14
stevedore==1.28.0
traceback2==1.4.0
twilio==6.5.0
typing==3.6.4
unittest2==1.1.0
urllib3==1.22
virtualenv==16.0.0
Werkzeug==0.12.2
编辑:代码==
from __future__ import absolute_import
from __future__ import print_function
import argparse
import csv
import logging
import os
from collections import OrderedDict
import phonenumbers
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
class ParseCSVFn(beam.DoFn):
"""Parses the raw CSV data into a Python dictionary."""
def process(self, elem):
try:
row = list(csv.reader([elem]))[0]
month, day, year = row[2].split('/')
birth_dict = {
'day': day,
'month': month,
'year': year,
}
order_dict = OrderedDict(birth_dict)
data_dict = {
'phoneNumber': row[4],
'firstName': row[0],
'lastName': row[1],
'birthDate': order_dict,
'voterId': row[3],
}
order_data_dict = OrderedDict(data_dict)
yield order_data_dict
except:
pass
def run(argv=None):
"""Pipeline entry point, runs the all the necessary processes"""
parser = argparse.ArgumentParser()
parser.add_argument('--input',
type=str,
dest='input',
default='gs://wordcount_project/demo-contacts-small*.csv',
help='Input file to process.')
parser.add_argument('--output',
dest='output',
# CHANGE 1/5: The Google Cloud Storage path is required
# for outputting the results.
default='gs://wordcount_project/cleaned.csv',
help='Output file to write results to.')
known_args, pipeline_args = parser.parse_known_args(argv)
pipeline_args.extend([
# CHANGE 2/5: (OPTIONAL) Change this to DataflowRunner to
# run your pipeline on the Google Cloud Dataflow Service.
'--runner=DataflowRunner',
# CHANGE 3/5: Your project ID is required in order to run your pipeline on
# the Google Cloud Dataflow Service.
'--project=--------',
# CHANGE 4/5: Your Google Cloud Storage path is required for staging local
# files.
# '--dataset=game_dataset',
'--staging_location=gs://wordcount_project/staging',
# CHANGE 5/5: Your Google Cloud Storage path is required for temporary
# files.
'--temp_location=gs://wordcount_project/temp',
'--job_name=cleaning-jobs',
])
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
with beam.Pipeline(options=pipeline_options) as p:
(p
| 'ReadInputText' >> beam.io.ReadFromText(known_args.input)
| 'ParseDataFn' >> beam.ParDo(ParseCSVFn())
# | 'JsonBirthDay' >> beam.ParDo(JsonBirthDay())
# | 'MatchNumber' >> beam.ParDo(MatchNumber('phoneNumber'))
# | 'MapData' >> beam.Map(lambda elem: (elem['phoneNumber'], elem['firstName'], elem['lastName'],
# elem['birthDate'], elem['voterId']))
| 'WriteOutputText' >> beam.io.WriteToText(known_args.output))
if __name__ == '__main__':
logging.getLogger().setLevel(logging.INFO)
run()
我还尝试安装google-gax和请求的特定包,但它似乎没有帮助
编辑:新的编码错误:
File "new_clean.py", line 226, in <module>
run()
File "new_clean.py", line 219, in run
| 'WriteToText' >> beam.io.WriteToText(known_args.output)
File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 389, in __exit__
self.run().wait_until_finish()
File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 369, in run
self.to_runner_api(), self.runner, self._options).run(False)
File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 382, in run
return self.runner.run_pipeline(self)
File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 324, in run_pipeline
self.dataflow_client.create_job(self.job), self)
File "C:\Python27\lib\site-packages\apache_beam\utils\retry.py", line 180, in wrapper
return fun(*args, **kwargs)
File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 461, in create_job
self.create_job_description(job)
File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 491, in create_job_description
job.options, file_copy=self._gcs_file_copy)
File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\dependency.py", line 328, in stage_job_resources
setup_options.requirements_file, requirements_cache_path)
File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\dependency.py", line 262, in _populate_requirements_cache
processes.check_call(cmd_args)
File "C:\Python27\lib\site-packages\apache_beam\utils\processes.py", line 44, in check_call
return subprocess.check_call(*args, **kwargs)
File "C:\Python27\lib\subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\\Python27\\python.exe', '-m', 'pip', 'download', '--dest', 'c:\\users\\james\\appdata\\local\\temp\\dataflow-requirements-cache', '-r', 'requirements.txt', '--no-binary', ':all:']' returned non-zero exit status 1
可能是Dataflow没有接收具有管道额外依赖性的文件。 要安装它们,你可以这样做:
pip freeze > requirements.txt
然后,您将要编辑requirements.txt
文件,只保留从PyPI安装并在管道中使用的软件包。
运行管道时,请传递以下命令行选项:
--requirements_file requirements.txt
这在Apache Beam的Python Pipeline Dependencies文档中有记录 。
希望有所帮助。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.