Unable to download the pipeline provided by the spark-nlp library

Question

I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library.

I tried installing different versions of the pyspark and spark-nlp libraries.

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Create or get a Spark session
spark = sparknlp.start()

sparknlp.version()
spark.version

# Download, load, and annotate a text with a pretrained pipeline
pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')

2.1.0
recognize_entities_dl download started this may take some time.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-b71a0f77e93a> in <module>
     11 #download, load, and annotate a text by pre-trained pipeline
     12 
---> 13 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
     14 result = pipeline.annotate('Harry Potter is a great movie')

d:\python36\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc)
     89 
     90     def __init__(self, name, lang='en', remote_loc=None):
---> 91         self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
     92         self.light_model = LightPipeline(self.model)
     93 

d:\python36\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
     50     def downloadPipeline(name, language, remote_loc=None):
     51         print(name + " download started this may take some time.")
---> 52         file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
     53         if file_size == "-1":
     54             print("Can not find the model to download please check the name!")

AttributeError: module 'sparknlp.internal' has no attribute '_GetResourceSize'

Answer 1

Thanks for confirming your Apache Spark version. The pretrained pipelines and models are tied to specific Apache Spark and Spark NLP versions. Apache Spark must be at least 2.4.x to download the pretrained models/pipelines. For any earlier version, you need to train your own models/pipelines.
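A quick way to check this before calling PretrainedPipeline is to compare the session's Spark version against the 2.4 threshold. This is a minimal sketch: `spark_major_minor` and `can_auto_download` are hypothetical helpers, not part of spark-nlp, and only the 2.4 minimum comes from this answer. In a live session you would pass `spark.version`.

```python
import re


def spark_major_minor(version: str) -> tuple:
    """Extract (major, minor) as integers from a version string such as '2.3.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_auto_download(spark_version: str, minimum: tuple = (2, 4)) -> bool:
    """Return True if the Spark version meets the minimum for pretrained() downloads.

    The (2, 4) default is the requirement stated in this answer (hypothetical helper).
    """
    return spark_major_minor(spark_version) >= minimum


# In a real session (requires pyspark and spark-nlp):
#   spark = sparknlp.start()
#   if not can_auto_download(spark.version): download manually instead
print(can_auto_download("2.3.2"))
print(can_auto_download("2.4.4"))
```

The comparison uses Python tuple ordering, so (2, 3) sorts below (2, 4) while patch versions like 2.4.4 still pass.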

Here is the list of all pipelines; all of them are for Apache Spark 2.4.x: https://nlp.johnsnowlabs.com/docs/en/pipelines

If you look at the URL of any model or pipeline, you can see this information:

recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip

Name: recognize_entities_dl
Lang: en
Spark NLP: must be equal to 2.1.0 or greater
Apache Spark: must be equal to 2.4.x or greater
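The archive name can be split into these fields mechanically. This is a minimal sketch assuming every archive follows the `<name>_<lang>_<sparknlp>_<spark>_<timestamp>.zip` layout of the example above; `parse_archive_name` and `ModelArchive` are hypothetical, and reading the last field as an upload timestamp is an assumption.

```python
from typing import NamedTuple


class ModelArchive(NamedTuple):
    name: str
    lang: str
    sparknlp_version: str
    spark_version: str
    timestamp: str  # assumption: an upload timestamp in milliseconds


def parse_archive_name(filename: str) -> ModelArchive:
    """Split '<name>_<lang>_<sparknlp>_<spark>_<timestamp>.zip' into its fields.

    The model name itself can contain underscores, so split from the right.
    """
    stem = filename[: -len(".zip")] if filename.endswith(".zip") else filename
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"Unexpected archive name: {filename!r}")
    return ModelArchive(*parts)


info = parse_archive_name("recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip")
print(info.name, info.lang, info.sparknlp_version, info.spark_version)
```

Splitting from the right with a limit of 4 keeps `recognize_entities_dl` intact as the name.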

NOTE: The Spark NLP library is built and compiled against Apache Spark 2.4.x. That is why models and pipelines are only available for the 2.4.x version.

NOTE 2: Since you are using Windows, you need to use the _noncontrib models and pipelines, which are compatible with Windows: Do Spark-NLP pretrained pipelines only work on linux systems?

I hope this answer helps and solves your issue.

UPDATE April 2020: Apparently the models and pipelines trained and uploaded on Apache Spark 2.4.x are compatible with Apache Spark 2.3.x as well. So if you are on Apache Spark 2.3.x, even though you cannot use pretrained() for auto-download, you can download the model or pipeline manually and just use .load() instead.
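A minimal sketch of the manual route, assuming the downloaded .zip unpacks to a saved Spark ML pipeline folder (with `metadata/` and `stages/` at its root; if it unpacks into a subfolder, point `load_pipeline` there instead). `extract_archive` and `load_pipeline` are hypothetical helpers; `PipelineModel.load` is the standard Spark ML loader and `LightPipeline` is the wrapper the `PretrainedPipeline` code in the traceback also uses. The Spark imports are deferred so the unzip step runs without pyspark installed.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, target_dir: str) -> Path:
    """Unzip a manually downloaded model/pipeline archive and return the target folder."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(str(target))
    return target


def load_pipeline(folder: str):
    """Load an extracted pipeline from disk instead of calling pretrained().

    Needs an active Spark session with the spark-nlp jar on the classpath.
    """
    from pyspark.ml import PipelineModel
    from sparknlp.base import LightPipeline

    model = PipelineModel.load(folder)
    return LightPipeline(model)


# Usage (file and folder names are placeholders):
#   spark = sparknlp.start()
#   folder = extract_archive("recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip",
#                            "models/recognize_entities_dl")
#   light = load_pipeline(str(folder))
#   light.annotate("Harry Potter is a great movie")
```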

Full list of all models and pipelines with download links: https://github.com/JohnSnowLabs/spark-nlp-models

Update: Since the 2.4.0 release, all models and pipelines are cross-platform, and there is no need to choose a different model/pipeline for any specific OS: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/2.4.0
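The Windows rule from NOTE 2 and this 2.4.0 change can be combined into one name choice. This is a sketch under an assumption: that the Windows variant is named by appending `_noncontrib` to the base pipeline name (check the model list for the exact name). `pipeline_name_for` is a hypothetical helper, not a spark-nlp API.

```python
import platform
import re
from typing import Optional


def major_minor(version: str) -> tuple:
    """Extract (major, minor) as integers from a version string such as '2.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pipeline_name_for(base_name: str, sparknlp_version: str,
                      os_name: Optional[str] = None) -> str:
    """Choose which pipeline name to request.

    Before Spark NLP 2.4.0, Windows needed the _noncontrib variant (assumed suffix);
    from 2.4.0 on, the same name is used on every OS.
    """
    os_name = os_name or platform.system()
    if os_name == "Windows" and major_minor(sparknlp_version) < (2, 4):
        return base_name + "_noncontrib"
    return base_name


print(pipeline_name_for("recognize_entities_dl", "2.1.0", "Windows"))
print(pipeline_name_for("recognize_entities_dl", "2.4.0", "Windows"))
```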

For newer releases: https://github.com/JohnSnowLabs/spark-nlp/releases


Answer 1 (score: 4) 2019-11-11 11:23:04
