
Unable to download the pipeline provided by the spark-nlp library

I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library.

I tried installing different versions of the pyspark and spark-nlp libraries.

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

#create or get Spark Session

spark = sparknlp.start()

sparknlp.version()
spark.version

#download, load, and annotate a text by pre-trained pipeline

pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')

2.1.0
recognize_entities_dl download started this may take some time.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-b71a0f77e93a> in <module>
     11 #download, load, and annotate a text by pre-trained pipeline
     12 
---> 13 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
     14 result = pipeline.annotate('Harry Potter is a great movie')

d:\python36\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc)
     89 
     90     def __init__(self, name, lang='en', remote_loc=None):
---> 91         self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
     92         self.light_model = LightPipeline(self.model)
     93 

d:\python36\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
     50     def downloadPipeline(name, language, remote_loc=None):
     51         print(name + " download started this may take some time.")
---> 52         file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
     53         if file_size == "-1":
     54             print("Can not find the model to download please check the name!")

AttributeError: module 'sparknlp.internal' has no attribute '_GetResourceSize'

Thanks for confirming your Apache Spark version. The pre-trained pipelines and models are tied to specific Apache Spark and Spark NLP versions. You need at least Apache Spark 2.4.x to download the pre-trained models/pipelines. On any earlier version, you need to train your own models/pipelines.
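As a rough illustration of the version requirement above, the check below compares a Spark version string against the 2.4 minimum. The helper names `parse_major_minor` and `can_auto_download` are hypothetical and not part of Spark NLP; in a live session you would pass in the value of `spark.version`.

```python
def parse_major_minor(version):
    """Return (major, minor) as integers from a string such as '2.3.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_auto_download(spark_version, minimum="2.4"):
    """True if the Spark version meets the minimum stated in the answer.

    Hypothetical helper: it only compares version numbers and does not
    contact any server or inspect the installed libraries.
    """
    return parse_major_minor(spark_version) >= parse_major_minor(minimum)


# In a real session: can_auto_download(spark.version)
print(can_auto_download("2.3.1"))
print(can_auto_download("2.4.3"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "2.10" would sort before "2.4".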

This is the list of all pipelines, and they are all for Apache Spark 2.4.x: https://nlp.johnsnowlabs.com/docs/en/pipelines

If you look at the URL of any model or pipeline, you can see this information:

recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip

  • Name: recognize_entities_dl
  • Lang: en
  • Spark NLP: must be 2.1.0 or greater
  • Apache Spark: must be 2.4.x or greater
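The naming scheme described above can be pulled apart with a small parser. This is only a sketch: the regular expression is inferred from the single example file name in this answer, `parse_resource_filename` is a hypothetical helper, and the trailing number is kept as an opaque string because the answer does not say what it means.

```python
import re

# Pattern inferred from one example; other archives may not follow it exactly.
RESOURCE_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<spark_nlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)"
    r"_(?P<suffix>\d+)\.zip$"
)


def parse_resource_filename(filename):
    """Split a model/pipeline archive name into its labelled parts."""
    match = RESOURCE_PATTERN.match(filename)
    if match is None:
        raise ValueError("unrecognized resource file name: " + filename)
    return match.groupdict()


info = parse_resource_filename(
    "recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip"
)
print(info["name"], info["lang"], info["spark_nlp"], info["spark"])
```

Reading the two version fields this way makes it easy to see which Spark NLP and Apache Spark versions an archive was built for before trying to load it.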

NOTE: The Spark NLP library is built and compiled against Apache Spark 2.4.x. That is why models and pipelines are only available for the 2.4.x version.

NOTE 2: Since you are using Windows, you need to use the _noncontrib models and pipelines, which are compatible with Windows. See: Do Spark-NLP pretrained pipelines only work on linux systems?

I hope this answer helps and solves your issue.

UPDATE April 2020: Apparently the models and pipelines trained and uploaded on Apache Spark 2.4.x are compatible with Apache Spark 2.3.x as well. So if you are on Apache Spark 2.3.x, even though you cannot use pretrained() for auto-download, you can download the model or pipeline manually and use .load() instead.
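A minimal sketch of the manual route, assuming the downloaded zip holds the saved pipeline's folders at its top level (not confirmed in this answer) and that a Spark session has already been started with sparknlp.start(). The helpers `extract_pipeline` and `load_pipeline` are hypothetical names; `PipelineModel.load` comes from pyspark.ml and `LightPipeline` from sparknlp.base.

```python
import zipfile
from pathlib import Path


def extract_pipeline(zip_path, target_root):
    """Unzip a manually downloaded archive into target_root/<archive name>."""
    zip_path = Path(zip_path)
    target = Path(target_root) / zip_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def load_pipeline(pipeline_dir):
    """Load the extracted pipeline from disk instead of auto-downloading it."""
    # Imported here so the extraction helper works without Spark installed.
    from pyspark.ml import PipelineModel
    from sparknlp.base import LightPipeline

    model = PipelineModel.load(str(pipeline_dir))
    return LightPipeline(model)


# Example usage (requires pyspark, spark-nlp, and the downloaded zip):
# import sparknlp
# spark = sparknlp.start()
# pipeline_dir = extract_pipeline(
#     "recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip", "models")
# light = load_pipeline(pipeline_dir)
# result = light.annotate("Harry Potter is a great movie")
```

Depending on the Spark configuration, a local path may need a file:/// prefix before PipelineModel.load can resolve it.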

Full list of all models and pipelines with download links: https://github.com/JohnSnowLabs/spark-nlp-models

Update: After the 2.4.0 release, all the models and pipelines are cross-platform, and there is no need to choose a different model/pipeline for any specific OS: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/2.4.0

For newer releases: https://github.com/JohnSnowLabs/spark-nlp/releases
