I am trying to convert my script that uses transformers to an exe file. It's a small, single-file script that performs token classification:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
# download once to save locally
# tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER-uncased")
# model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER-uncased")
# save model locally
# tokenizer.save_pretrained("./model")
# model.save_pretrained("./model")
# now just load from local file
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForTokenClassification.from_pretrained('./model')
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
example = """00:00:02 Speaker 1: hi john, it's nice to see you again. how was your weekend? do anything special? 00:00:06 Speaker 2: yep, all good thanks. i was with my sister in derby. We saw, you know, that james bond film. what's it called? then got a couple of drinks at the pitcher and piano, back in nottingham. """
ner_results = nlp(example)
print(ner_results)
for i in range(0, len(ner_results)):
    start = ner_results[i]['start']
    end = ner_results[i]['end']
    example = example.replace(ner_results[i]['word'], ner_results[i]['entity_group'])
print(example)
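As a side note on the replacement loop above: str.replace() substitutes every occurrence of the matched word, which can corrupt unrelated text elsewhere in the transcript. A sketch of an offset-based alternative is below; the ner_results list here is a mocked pipeline output for illustration (real entries from the pipeline carry the same start/end/entity_group keys). Working backwards through the entities keeps earlier offsets valid as the string changes length:

```python
example = "hi john, i was with my sister in derby."

# Mocked aggregation_strategy="simple" output (illustrative, not real model output)
ner_results = [
    {"entity_group": "PER", "word": "john", "start": 3, "end": 7},
    {"entity_group": "LOC", "word": "derby", "start": 33, "end": 38},
]

# Replace by character span, from the end of the string backwards, so that
# substitutions never shift the offsets of entities yet to be processed.
for ent in sorted(ner_results, key=lambda e: e["start"], reverse=True):
    example = example[:ent["start"]] + ent["entity_group"] + example[ent["end"]:]

print(example)  # hi PER, i was with my sister in LOC.
```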
The online models are downloaded only once and then saved locally so that they can be packaged with PyInstaller. I am using the command below to build the exe file (which, from reading similar questions on SO, adds all the required libraries that PyInstaller misses):
pyinstaller --windowed --add-data ./model/config.json;./model/ --add-data ./model/pytorch_model.bin;./model/ --add-data ./model/special_tokens_map.json;./model/ --add-data ./model/tokenizer.json;./model/ --add-data ./model/tokenizer_config.json;./model/ --add-data ./model/vocab.txt;./model/ --collect-data tensorflow --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers --copy-metadata importlib_metadata --hidden-import="sklearn.utils._cython_blas" --hidden-import="sklearn.neighbors.typedefs" --hidden-import="sklearn.neighbors.quad_tree" --hidden-import="sklearn.tree" --hidden-import="sklearn.tree._utils" deidentify.py
This generates the following .spec file:
# -*- mode: python ; coding: utf-8 -*-
from PyInstaller.utils.hooks import collect_data_files
from PyInstaller.utils.hooks import copy_metadata
datas = [('./model/config.json', './model/'), ('./model/pytorch_model.bin', './model/'), ('./model/special_tokens_map.json', './model/'), ('./model/tokenizer.json', './model/'), ('./model/tokenizer_config.json', './model/'), ('./model/vocab.txt', './model/')]
datas += collect_data_files('tensorflow')
datas += collect_data_files('torch')
datas += copy_metadata('torch')
datas += copy_metadata('tqdm')
datas += copy_metadata('regex')
datas += copy_metadata('sacremoses')
datas += copy_metadata('requests')
datas += copy_metadata('packaging')
datas += copy_metadata('filelock')
datas += copy_metadata('numpy')
datas += copy_metadata('tokenizers')
datas += copy_metadata('importlib_metadata')
block_cipher = None
a = Analysis(['deidentify.py'],
pathex=[],
binaries=[],
datas=datas,
hiddenimports=['sklearn.utils._cython_blas', 'sklearn.neighbors.typedefs', 'sklearn.neighbors.quad_tree', 'sklearn.tree', 'sklearn.tree._utils'],
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
[],
exclude_binaries=True,
name='deidentify',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
console=False,
disable_windowed_traceback=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None )
coll = COLLECT(exe,
a.binaries,
a.zipfiles,
a.datas,
strip=False,
upx=True,
upx_exclude=[],
name='deidentify')
As can be seen, all model files and libraries are included.
The console output from generating the exe file has been omitted here due to the maximum character limit.
I don't know why so many modules are reported missing during the build, as I have them installed both on my system and in my local environment; they should be picked up. I even asked for them to be included in the .spec file.
After the process completes, the error I receive when I run the exe file is:
Traceback (most recent call last):
File "transformers\utils\versions.py", line 105, in require_version
File "importlib_metadata\__init__.py", line 631, in version
File "importlib_metadata\__init__.py", line 604, in distribution
File "importlib_metadata\__init__.py", line 229, in from_name
importlib_metadata.PackageNotFoundError: No package metadata was found for dataclasses
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "deidentify.py", line 1, in <module>
File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
File "transformers\__init__.py", line 43, in <module>
File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
File "transformers\dependency_versions_check.py", line 41, in <module>
File "transformers\utils\versions.py", line 120, in require_version_core
File "transformers\utils\versions.py", line 108, in require_version
importlib_metadata.PackageNotFoundError: No package metadata was found for The 'dataclasses' distribution was not found and is required by this application.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git master
importlib_metadata is installed via pip, so it should not be missing.
Update
After the comment by @0x26res and updating to python 3.8 I am presented with a new error:
Traceback (most recent call last):
File "torch\_sources.py", line 21, in get_source_lines_and_file
sourcelines, file_lineno = inspect.getsourcelines(obj)
File "inspect.py", line 979, in getsourcelines
File "inspect.py", line 798, in findsource
OSError: could not get source code
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "transformers\file_utils.py", line 2704, in _get_module
File "importlib\__init__.py", line 127, in import_module
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
File "transformers\models\deberta\modeling_deberta.py", line 505, in <module>
File "torch\jit\_script.py", line 1307, in script
ast = get_jit_def(obj, obj.__name__)
File "torch\jit\frontend.py", line 233, in get_jit_def
parsed_def = parse_def(fn)
File "torch\_sources.py", line 95, in parse_def
sourcelines, file_lineno, filename = get_source_lines_and_file(fn, ErrorReport.call_stack())
File "torch\_sources.py", line 28, in get_source_lines_and_file
raise OSError(msg) from e
OSError: Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "deidentify.py", line 16, in <module>
File "transformers\pipelines\__init__.py", line 651, in pipeline
File "transformers\pipelines\token_classification.py", line 103, in __init__
File "transformers\pipelines\base.py", line 853, in check_model_type
File "transformers\models\auto\auto_factory.py", line 601, in items
File "transformers\models\auto\auto_factory.py", line 604, in <listcomp>
File "transformers\models\auto\auto_factory.py", line 573, in _load_attr_from_module
File "transformers\models\auto\auto_factory.py", line 535, in getattribute_from_module
File "transformers\file_utils.py", line 2694, in __getattr__
File "transformers\file_utils.py", line 2706, in _get_module
RuntimeError: Failed to import transformers.models.deberta.modeling_deberta because of the following error (look up to see its traceback):
Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.
I gave the following command after updating to Python 3.8:
pyinstaller --windowed --add-data ./model/;./model/ --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers deidentify.py
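One workaround often suggested for this TorchScript "could not get source code" error (an assumption here, not verified against this exact build) is to disable TorchScript JIT compilation via the PYTORCH_JIT environment variable, so that @torch.jit.script functions fall back to plain Python and torch never needs the original .py source files that the frozen bundle lacks:

```python
import os

# Must be set before torch (and therefore transformers) is imported:
# with PYTORCH_JIT=0, torch skips script compilation entirely.
os.environ["PYTORCH_JIT"] = "0"

# import torch
# from transformers import pipeline
# ... the rest of deidentify.py is unchanged
```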
First of all, you don't need to include every file in ./model/ individually; just include the entire model directory and everything in it will be included too:
datas=[('model/', 'model'), ...
I don't know why dataclasses isn't being included, but just try including it manually:
datas=[('[path-to-your-dataclasses.py]', '.'), ...
This will put dataclasses.py in the root directory, where it should be found by the exe.
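A hedged sketch of how that path could be found programmatically (note that on Python 3.8+ dataclasses is part of the standard library, so this is mainly relevant to the 3.6/3.7 backport package):

```python
import dataclasses

# __file__ points at the installed dataclasses.py, wherever it lives
src = dataclasses.__file__

# datas tuple for the .spec file: bundle dataclasses.py into the exe's root
datas_entry = (src, '.')

print(src.endswith('dataclasses.py'))  # True
```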
Obviously, there are many different ways to convert your script to an executable .exe file. One of the best choices I made recently is auto-py-to-exe. It's pretty easy to use; here are the steps I followed for conversion:
1. Activate your environment: conda activate <NAME_OF_ENV>
2. Install the auto-py-to-exe package using pip install auto-py-to-exe
3. Launch the auto-py-to-exe application by entering auto-py-to-exe in the activated environment.