I am trying to convert my script that uses transformers to an exe file. It's a small, single-file script that performs token classification:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
# download once to save locally
# tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER-uncased")
# model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER-uncased")
# save model locally
# tokenizer.save_pretrained("./model")
# model.save_pretrained("./model")
# now just load from local file
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForTokenClassification.from_pretrained('./model')
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
example = """00:00:02 Speaker 1: hi john, it's nice to see you again. how was your weekend? do anything special? 00:00:06 Speaker 2: yep, all good thanks. i was with my sister in derby. We saw, you know, that james bond film. what's it called? then got a couple of drinks at the pitcher and piano, back in nottingham. """
ner_results = nlp(example)
print(ner_results)
for i in range(0, len(ner_results)):
    start = ner_results[i]['start']
    end = ner_results[i]['end']
    example = example.replace(ner_results[i]['word'], ner_results[i]['entity_group'])
print(example)
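As a side note on the replacement loop above: str.replace() substitutes every occurrence of the matched word, which can corrupt unrelated text elsewhere in the transcript. A sketch of an offset-based alternative is below; the ner_results list here is a mocked pipeline output for illustration (real entries from the pipeline carry the same start/end/entity_group keys). Working backwards through the entities keeps earlier offsets valid as the string changes length:

```python
example = "hi john, i was with my sister in derby."

# Mocked aggregation_strategy="simple" output (illustrative, not real model output)
ner_results = [
    {"entity_group": "PER", "word": "john", "start": 3, "end": 7},
    {"entity_group": "LOC", "word": "derby", "start": 33, "end": 38},
]

# Replace by character span, from the end of the string backwards, so that
# substitutions never shift the offsets of entities yet to be processed.
for ent in sorted(ner_results, key=lambda e: e["start"], reverse=True):
    example = example[:ent["start"]] + ent["entity_group"] + example[ent["end"]:]

print(example)  # hi PER, i was with my sister in LOC.
```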
The online models are downloaded only once and then saved locally so that they can be packaged with PyInstaller. I am using the command below to build the exe file (which, from reading similar questions on SO, adds all the required libraries that PyInstaller misses):
pyinstaller --windowed --add-data ./model/config.json;./model/ --add-data ./model/pytorch_model.bin;./model/ --add-data ./model/special_tokens_map.json;./model/ --add-data ./model/tokenizer.json;./model/ --add-data ./model/tokenizer_config.json;./model/ --add-data ./model/vocab.txt;./model/ --collect-data tensorflow --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers --copy-metadata importlib_metadata --hidden-import="sklearn.utils._cython_blas" --hidden-import="sklearn.neighbors.typedefs" --hidden-import="sklearn.neighbors.quad_tree" --hidden-import="sklearn.tree" --hidden-import="sklearn.tree._utils" deidentify.py
This generates the following .spec file:
# -*- mode: python ; coding: utf-8 -*-
from PyInstaller.utils.hooks import collect_data_files
from PyInstaller.utils.hooks import copy_metadata
datas = [('./model/config.json', './model/'), ('./model/pytorch_model.bin', './model/'), ('./model/special_tokens_map.json', './model/'), ('./model/tokenizer.json', './model/'), ('./model/tokenizer_config.json', './model/'), ('./model/vocab.txt', './model/')]
datas += collect_data_files('tensorflow')
datas += collect_data_files('torch')
datas += copy_metadata('torch')
datas += copy_metadata('tqdm')
datas += copy_metadata('regex')
datas += copy_metadata('sacremoses')
datas += copy_metadata('requests')
datas += copy_metadata('packaging')
datas += copy_metadata('filelock')
datas += copy_metadata('numpy')
datas += copy_metadata('tokenizers')
datas += copy_metadata('importlib_metadata')
block_cipher = None
a = Analysis(['deidentify.py'],
pathex=[],
binaries=[],
datas=datas,
hiddenimports=['sklearn.utils._cython_blas', 'sklearn.neighbors.typedefs', 'sklearn.neighbors.quad_tree', 'sklearn.tree', 'sklearn.tree._utils'],
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
[],
exclude_binaries=True,
name='deidentify',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
console=False,
disable_windowed_traceback=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None )
coll = COLLECT(exe,
a.binaries,
a.zipfiles,
a.datas,
strip=False,
upx=True,
upx_exclude=[],
name='deidentify')
As can be seen, all model files and libraries are included.
The console output from generating the exe file has been omitted here due to the maximum character limit.
I don't know why so many modules are reported missing during the build, as I have them installed both on my system and in my local environment; they should be picked up. I even asked for them to be included in the .spec file.
After the process completes, the error I receive when I run the exe file is:
Traceback (most recent call last):
File "transformers\utils\versions.py", line 105, in require_version
File "importlib_metadata\__init__.py", line 631, in version
File "importlib_metadata\__init__.py", line 604, in distribution
File "importlib_metadata\__init__.py", line 229, in from_name
importlib_metadata.PackageNotFoundError: No package metadata was found for dataclasses
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "deidentify.py", line 1, in <module>
File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
File "transformers\__init__.py", line 43, in <module>
File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
File "transformers\dependency_versions_check.py", line 41, in <module>
File "transformers\utils\versions.py", line 120, in require_version_core
File "transformers\utils\versions.py", line 108, in require_version
importlib_metadata.PackageNotFoundError: No package metadata was found for The 'dataclasses' distribution was not found and is required by this application.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git master
importlib_metadata is installed via pip, so it should not be missing.
Update
After the comment by @0x26res and updating to python 3.8 I am presented with a new error:
Traceback (most recent call last):
File "torch\_sources.py", line 21, in get_source_lines_and_file
sourcelines, file_lineno = inspect.getsourcelines(obj)
File "inspect.py", line 979, in getsourcelines
File "inspect.py", line 798, in findsource
OSError: could not get source code
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "transformers\file_utils.py", line 2704, in _get_module
File "importlib\__init__.py", line 127, in import_module
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
File "transformers\models\deberta\modeling_deberta.py", line 505, in <module>
File "torch\jit\_script.py", line 1307, in script
ast = get_jit_def(obj, obj.__name__)
File "torch\jit\frontend.py", line 233, in get_jit_def
parsed_def = parse_def(fn)
File "torch\_sources.py", line 95, in parse_def
sourcelines, file_lineno, filename = get_source_lines_and_file(fn, ErrorReport.call_stack())
File "torch\_sources.py", line 28, in get_source_lines_and_file
raise OSError(msg) from e
OSError: Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "deidentify.py", line 16, in <module>
File "transformers\pipelines\__init__.py", line 651, in pipeline
File "transformers\pipelines\token_classification.py", line 103, in __init__
File "transformers\pipelines\base.py", line 853, in check_model_type
File "transformers\models\auto\auto_factory.py", line 601, in items
File "transformers\models\auto\auto_factory.py", line 604, in <listcomp>
File "transformers\models\auto\auto_factory.py", line 573, in _load_attr_from_module
File "transformers\models\auto\auto_factory.py", line 535, in getattribute_from_module
File "transformers\file_utils.py", line 2694, in __getattr__
File "transformers\file_utils.py", line 2706, in _get_module
RuntimeError: Failed to import transformers.models.deberta.modeling_deberta because of the following error (look up to see its traceback):
Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.
I gave the following command after updating to Python 3.8:
pyinstaller --windowed --add-data ./model/;./model/ --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers deidentify.py
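One workaround often suggested for this TorchScript "could not get source code" error (an assumption here, not verified against this exact build) is to disable TorchScript JIT compilation via the PYTORCH_JIT environment variable, so that @torch.jit.script functions fall back to plain Python and torch never needs the original .py source files that the frozen bundle lacks:

```python
import os

# Must be set before torch (and therefore transformers) is imported:
# with PYTORCH_JIT=0, torch skips script compilation entirely.
os.environ["PYTORCH_JIT"] = "0"

# import torch
# from transformers import pipeline
# ... the rest of deidentify.py is unchanged
```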
First of all, you don't need to include every file in ./model/ individually; just include the entire model directory and everything in it will be included too:
datas=[('model/', 'model'), ...
I don't know why dataclasses isn't being included, but just try including it manually:
datas=[('[path-to-your-dataclasses.py]', '.'), ...
This will put dataclasses.py in the root directory, where it should be found by the exe.
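A hedged sketch of how that path could be found programmatically (note that on Python 3.8+ dataclasses is part of the standard library, so this is mainly relevant to the 3.6/3.7 backport package):

```python
import dataclasses

# __file__ points at the installed dataclasses.py, wherever it lives
src = dataclasses.__file__

# datas tuple for the .spec file: bundle dataclasses.py into the exe's root
datas_entry = (src, '.')

print(src.endswith('dataclasses.py'))  # True
```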
Obviously, there are many different ways to convert your script to an executable .exe file. One of the best choices I made recently is auto-py-to-exe. It's pretty easy to use; here are the steps I followed for conversion:
1. Activate your environment: conda activate <NAME_OF_ENV>
2. Install the auto-py-to-exe package using pip install auto-py-to-exe
3. Launch the auto-py-to-exe application by entering auto-py-to-exe in the activated environment.