
How to use ktrain for NER offline?

I have trained my English model following this notebook ( https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-06-sequence-tagging.ipynb ). I am able to save my trained model and run it without any problems.

However, I need to run it again OFFLINE, and that is not working. I understand that I need to download the model files and do something similar to what is done here:

https://github.com/huggingface/transformers/issues/136

However, I am not able to figure out where I need to change the settings in ktrain.

I run this:

ktrain.load_predictor('Functions/my_english_nermodel')

and this is the error I get:

Traceback (most recent call last):
  File "Z:\Functions\NER.py", line 155, in load_bert
    reloaded_predictor= ktrain.load_predictor('Z:/Functions/my_english_nermodel')
  File "C:\Program Files\Python37\lib\site-packages\ktrain\core.py", line 1316, in load_predictor
    preproc = pickle.load(f)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 76, in __setstate__
    if self.te_model is not None: self.activate_transformer(self.te_model, layers=self.te_layers)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 100, in activate_transformer
    self.te = TransformerEmbedding(model_name, layers=layers)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\preprocessor.py", line 1095, in __init__
    self.tokenizer = self.tokenizer_type.from_pretrained(model_name)
  File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 903, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 1008, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-uncased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'bert-base-dutch-cased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Process finished with exit code 1

More generally, the transformers-based pretrained models are downloaded to <home_directory>/.cache/torch/transformers. For instance, on Linux, this will be /home/<user_name>/.cache/torch/transformers.

As indicated in the answer above, to reload the ktrain predictor on a machine with no internet access (for ktrain models that use models from the transformers library), you'll need to copy the model files in that folder to the same location on the new machine.
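
For example, something along these lines could be used to copy the cache to a location that can be moved to the offline machine. This is just a minimal sketch; the destination path Z:/transformers_cache_backup is an illustrative placeholder, and the cache path assumes the default location mentioned above.

import os
import shutil

# Default transformers cache location at the time
# (<home_directory>/.cache/torch/transformers).
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "torch", "transformers")

# Illustrative destination, e.g. a network share or USB drive (hypothetical path).
backup_dir = "Z:/transformers_cache_backup"

# Copy the cached model/tokenizer files so they can be restored at the
# same location on the offline machine.
shutil.copytree(cache_dir, backup_dir)
print("Copied", cache_dir, "->", backup_dir)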

I found a solution. When ktrain is run with an internet connection, it creates a folder: C:\Users\lemolina\.cache\torch\transformers. I needed to copy that same folder onto the machine that does not have access to the internet.
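
For reference, a quick sanity check on the offline machine once the cache folder and the saved predictor have both been copied over (paths are the ones from the question; adjust as needed, and the sample sentence is just an illustration):

import ktrain

# Reload the saved predictor; this should now work offline because the
# transformers cache has been copied to the same location on this machine.
reloaded_predictor = ktrain.load_predictor('Z:/Functions/my_english_nermodel')

# Run NER on a sample sentence; predict returns (word, tag) pairs.
print(reloaded_predictor.predict('Paris is the capital of France.'))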
