Unable to download the pipeline provided by the spark-nlp library
I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library. I have tried installing different versions of pyspark and the spark-nlp library.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
#create or get Spark Session
spark = sparknlp.start()
sparknlp.version()
spark.version
#download, load, and annotate a text by pre-trained pipeline
pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Harry Potter is a great movie')
2.1.0
recognize_entities_dl download started this may take some time.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-13-b71a0f77e93a> in <module>
11 #download, load, and annotate a text by pre-trained pipeline
12
---> 13 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
14 result = pipeline.annotate('Harry Potter is a great movie')
d:\python36\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc)
89
90 def __init__(self, name, lang='en', remote_loc=None):
---> 91 self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
92 self.light_model = LightPipeline(self.model)
93
d:\python36\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
50 def downloadPipeline(name, language, remote_loc=None):
51 print(name + " download started this may take some time.")
---> 52 file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
53 if file_size == "-1":
54 print("Can not find the model to download please check the name!")
AttributeError: module 'sparknlp.internal' has no attribute '_GetResourceSize'
Thanks for confirming your Apache Spark version. The pre-trained pipelines and models depend on both the Apache Spark and Spark NLP versions. The lowest Apache Spark version must be 2.4.x to be able to download the pre-trained models/pipelines; otherwise, you need to train your own models/pipelines for any earlier version.
This is the list of all pipelines, and they are all for Apache Spark 2.4.x: https://nlp.johnsnowlabs.com/docs/en/pipelines
If you take a look at the URL of any model or pipeline, you can see this information:
recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip
- recognize_entities_dl: the pipeline name
- en: the language
- 2.1.0: requires Spark NLP 2.1.0 or greater
- 2.4: requires Apache Spark 2.4.x or greater
NOTE: The Spark NLP library is built and compiled against Apache Spark 2.4.x. That is why models and pipelines are only available for the 2.4.x version.
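The naming convention above can be sketched as a small parser. This is a minimal sketch based solely on the example filename in this answer (name, language, Spark NLP version, Apache Spark version, timestamp, separated by underscores); the helper name is hypothetical, not part of the Spark NLP API:

```python
def parse_pipeline_filename(filename):
    """Split a pretrained-pipeline archive name into its fields.

    Field order (name_lang_nlpver_sparkver_timestamp) is inferred from
    the example 'recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip'.
    """
    stem = filename[:-len(".zip")] if filename.endswith(".zip") else filename
    parts = stem.split("_")
    # The last four fields are language, versions, and timestamp;
    # everything before them is the (possibly underscored) pipeline name.
    timestamp, spark_version, nlp_version, lang = (
        parts[-1], parts[-2], parts[-3], parts[-4])
    name = "_".join(parts[:-4])
    return {"name": name, "lang": lang, "spark_nlp": nlp_version,
            "spark": spark_version, "timestamp": timestamp}

info = parse_pipeline_filename(
    "recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip")
# -> {'name': 'recognize_entities_dl', 'lang': 'en', 'spark_nlp': '2.1.0',
#     'spark': '2.4', 'timestamp': '1562946909722'}
```

Checking these fields against your own Spark and Spark NLP versions before downloading is a quick way to spot a mismatch like the one in the traceback.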
NOTE 2: Since you are using Windows, you need to use the _noncontrib models and pipelines, which are compatible with Windows: Do Spark-NLP pretrained pipelines only work on linux systems?
I hope this answer helps and solves your issue.
UPDATE April 2020: Apparently the models and pipelines trained and uploaded on Apache Spark 2.4.x are compatible with Apache Spark 2.3.x as well. So if you are on Apache Spark 2.3.x, even though you cannot use pretrained() for auto-download, you can download a model manually and just use .load() instead.
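The manual-download fallback might look like the sketch below, assuming you have already downloaded and unzipped an archive such as recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip from the spark-nlp-models repository. The base directory and both helper names are hypothetical; only PipelineModel.load() is the actual pyspark API:

```python
def local_pipeline_path(base_dir, archive_name):
    """Directory .load() expects: the unzipped archive, minus '.zip'."""
    stem = archive_name[:-4] if archive_name.endswith(".zip") else archive_name
    return base_dir.rstrip("/") + "/" + stem

def load_offline_pipeline(path):
    # Requires a running Spark session with Spark NLP on the classpath,
    # e.g. one created via sparknlp.start().
    from pyspark.ml import PipelineModel
    return PipelineModel.load(path)

path = local_pipeline_path(
    "/opt/models",  # hypothetical download location
    "recognize_entities_dl_en_2.1.0_2.4_1562946909722.zip")
# path == "/opt/models/recognize_entities_dl_en_2.1.0_2.4_1562946909722"
# pipeline = load_offline_pipeline(path)
```

The loaded PipelineModel can then be used with .transform() on a DataFrame, in place of the PretrainedPipeline wrapper's annotate().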
Full list of all models and pipelines with download links: https://github.com/JohnSnowLabs/spark-nlp-models
Update: After the 2.4.0 release, all models and pipelines are cross-platform, and there is no need to choose a different model/pipeline for any specific OS: https://github.com/JohnSnowLabs/spark-nlp/releases/tag/2.4.0
For newer releases, see: https://github.com/JohnSnowLabs/spark-nlp/releases