
Cannot convert python script to exe using pyinstaller due to package not found error

I am trying to convert my script that uses transformers into an exe file. It's a small, single-file script that performs token classification:

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# download once to save locally
# tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER-uncased")
# model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER-uncased")

# save model locally
# tokenizer.save_pretrained("./model")
# model.save_pretrained("./model")

# now just load from local file
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForTokenClassification.from_pretrained('./model')

nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

example = """00:00:02 Speaker 1: hi john, it's nice to see you again. how was your weekend? do anything special? 00:00:06 Speaker 2: yep, all good thanks. i was with my sister in derby. We saw, you know, that james bond film. what's it called? then got a couple of drinks at the pitcher and piano, back in nottingham. """
ner_results = nlp(example)
print(ner_results)


for i in range(0, len(ner_results)):
  start = ner_results[i]['start']
  end = ner_results[i]['end']
  example = example.replace(ner_results[i]['word'], ner_results[i]['entity_group'])
print(example)

The online models are downloaded only once and then saved locally so that they can be packaged with pyinstaller. I am using the command below to build the exe file (which, going by other similar questions on SO, adds all the required libraries that pyinstaller misses on its own):

pyinstaller --windowed --add-data ./model/config.json;./model/ --add-data ./model/pytorch_model.bin;./model/ --add-data ./model/special_tokens_map.json;./model/ --add-data ./model/tokenizer.json;./model/ --add-data ./model/tokenizer_config.json;./model/ --add-data ./model/vocab.txt;./model/ --collect-data tensorflow --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers --copy-metadata importlib_metadata --hidden-import="sklearn.utils._cython_blas" --hidden-import="sklearn.neighbors.typedefs" --hidden-import="sklearn.neighbors.quad_tree" --hidden-import="sklearn.tree" --hidden-import="sklearn.tree._utils" deidentify.py

This generates the following .spec file:

# -*- mode: python ; coding: utf-8 -*-
from PyInstaller.utils.hooks import collect_data_files
from PyInstaller.utils.hooks import copy_metadata

datas = [('./model/config.json', './model/'), ('./model/pytorch_model.bin', './model/'), ('./model/special_tokens_map.json', './model/'), ('./model/tokenizer.json', './model/'), ('./model/tokenizer_config.json', './model/'), ('./model/vocab.txt', './model/')]
datas += collect_data_files('tensorflow')
datas += collect_data_files('torch')
datas += copy_metadata('torch')
datas += copy_metadata('tqdm')
datas += copy_metadata('regex')
datas += copy_metadata('sacremoses')
datas += copy_metadata('requests')
datas += copy_metadata('packaging')
datas += copy_metadata('filelock')
datas += copy_metadata('numpy')
datas += copy_metadata('tokenizers')
datas += copy_metadata('importlib_metadata')


block_cipher = None


a = Analysis(['deidentify.py'],
             pathex=[],
             binaries=[],
             datas=datas,
             hiddenimports=['"sklearn.utils._cython_blas"', '"sklearn.neighbors.typedefs"', '"sklearn.neighbors.quad_tree"', '"sklearn.tree"', '"sklearn.tree._utils"'],
             hookspath=[],
             hooksconfig={},
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)

exe = EXE(pyz,
          a.scripts, 
          [],
          exclude_binaries=True,
          name='deidentify',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          console=False,
          disable_windowed_traceback=False,
          target_arch=None,
          codesign_identity=None,
          entitlements_file=None )
coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas, 
               strip=False,
               upx=True,
               upx_exclude=[],
               name='deidentify')

As can be seen, all model files and libraries are included.

The following is the console output when generating the exe file

console output removed due to maximum character limit reached

I don't know why so many modules are listed as missing above, as I have them installed on my system and in my local environment. They should be picked up. I even asked for them to be included in the .spec file.

After the process completes, the error I receive when I run the exe file is:

Traceback (most recent call last):
  File "transformers\utils\versions.py", line 105, in require_version
  File "importlib_metadata\__init__.py", line 631, in version
  File "importlib_metadata\__init__.py", line 604, in distribution
  File "importlib_metadata\__init__.py", line 229, in from_name
importlib_metadata.PackageNotFoundError: No package metadata was found for dataclasses

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "deidentify.py", line 1, in <module>
  File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
  File "transformers\__init__.py", line 43, in <module>
  File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
  File "transformers\dependency_versions_check.py", line 41, in <module>
  File "transformers\utils\versions.py", line 120, in require_version_core
  File "transformers\utils\versions.py", line 108, in require_version
importlib_metadata.PackageNotFoundError: No package metadata was found for The 'dataclasses' distribution was not found and is required by this application. 
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git master

importlib_metadata is installed via pip, so it should not be missing.

Update

After the comment by @0x26res and updating to Python 3.8, I am presented with a new error:

Traceback (most recent call last):
  File "torch\_sources.py", line 21, in get_source_lines_and_file
    sourcelines, file_lineno = inspect.getsourcelines(obj)
  File "inspect.py", line 979, in getsourcelines
  File "inspect.py", line 798, in findsource
OSError: could not get source code

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "transformers\file_utils.py", line 2704, in _get_module
  File "importlib\__init__.py", line 127, in import_module
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "transformers\models\deberta\modeling_deberta.py", line 505, in <module>
  File "torch\jit\_script.py", line 1307, in script
    ast = get_jit_def(obj, obj.__name__)
  File "torch\jit\frontend.py", line 233, in get_jit_def
    parsed_def = parse_def(fn)
  File "torch\_sources.py", line 95, in parse_def
    sourcelines, file_lineno, filename = get_source_lines_and_file(fn, ErrorReport.call_stack())
  File "torch\_sources.py", line 28, in get_source_lines_and_file
    raise OSError(msg) from e
OSError: Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "deidentify.py", line 16, in <module>
  File "transformers\pipelines\__init__.py", line 651, in pipeline
  File "transformers\pipelines\token_classification.py", line 103, in __init__
  File "transformers\pipelines\base.py", line 853, in check_model_type
  File "transformers\models\auto\auto_factory.py", line 601, in items
  File "transformers\models\auto\auto_factory.py", line 604, in <listcomp>
  File "transformers\models\auto\auto_factory.py", line 573, in _load_attr_from_module
  File "transformers\models\auto\auto_factory.py", line 535, in getattribute_from_module
  File "transformers\file_utils.py", line 2694, in __getattr__
  File "transformers\file_utils.py", line 2706, in _get_module
RuntimeError: Failed to import transformers.models.deberta.modeling_deberta because of the following error (look up to see its traceback):
Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.

I used the following command after updating to Python 3.8:

pyinstaller --windowed --add-data ./model/;./model/ --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers deidentify.py

First of all, you don't need to individually include all the files in ./model/; just include the entire model directory and everything in it will be included too:

datas=[('model/','model'),...

I don't know why dataclasses isn't being included, but try including it manually:

datas=[('[path-to-your-dataclasses.py]', '.'),...

This will put dataclasses.py in the root directory and it should be found by the exe.
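Putting both suggestions together, a minimal sketch of what the datas section of the .spec file could look like (using dataclasses.__file__ to locate dataclasses.py at build time is my assumption here, not something from the original spec):

from PyInstaller.utils.hooks import collect_data_files, copy_metadata
import dataclasses

datas = [
    ('model/', 'model'),          # one entry bundles the entire model directory
    (dataclasses.__file__, '.'),  # drop dataclasses.py into the exe's root
]
datas += collect_data_files('torch')
datas += copy_metadata('torch')
# ...remaining copy_metadata() lines as in the original .spec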

Obviously, there are many different ways to convert a script like yours into an executable .exe file. One of the best choices I made recently is auto-py-to-exe. It's pretty easy to use; here are the steps I followed for the conversion (condensed into a single sequence after the list):

  1. Activate the conda env using conda activate <NAME_OF_ENV>
  2. Install the auto-py-to-exe package using pip install auto-py-to-exe
  3. Run the auto-py-to-exe application by entering auto-py-to-exe in the activated environment.
  4. Follow the steps in the official GitHub repository.
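For reference, steps 1-3 condensed into a single sequence (<NAME_OF_ENV> is a placeholder for whatever your environment is called):

conda activate <NAME_OF_ENV>
pip install auto-py-to-exe
auto-py-to-exe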
