
Python imports and file embedding

I'm working on a project that imports several packages, and when the script runs it loads a neural-net model.

I want to know if the following is achievable:

  1. If I run the script in another Python environment, I need to install all the packages I'm importing. Is it possible to avoid this? That would remove the need to install all the packages the first time.
  2. Is it possible to embed the neural-net .pb into the code? Keep in mind that it weighs 80 MB, so a hex dump doesn't work (a text file with the dump weighs 700 MB).

The idea is to have one .py with everything necessary within. Is it possible?

Thank you!

If I run the script in another Python environment, I need to install all the packages I'm importing. Is it possible to avoid this?

Well, not really, but kinda (TL;DR: no, but it depends on exactly what you mean). It really just boils down to a limitation of the environment. Somewhere, someplace, you need the packages where you can grab them from disk; it's as simple as that. They have to be available and locatable.

By available, I mean accessible by means of the filesystem. By locatable, I mean there has to be somewhere you are looking. A system install places packages somewhere accessible that can reliably be used as a place to install, and look for, packages. That is part of the responsibility of your virtual environment; the only difference is that your virtual environment is there to separate you from your system Python's packages.
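As a quick illustration of "locatable": the standard library can tell you whether a package could be found on the current interpreter's search path, without importing it (`numpy` here is just an example name, not something from the question):

```python
import importlib.util

# find_spec searches sys.path the same way `import` would, but without
# executing the package. It returns None when nothing is locatable.
spec = importlib.util.find_spec("numpy")
if spec is None:
    print("numpy is not installed in this environment")
else:
    print("numpy would be loaded from:", spec.origin)
```

Run the same script under your system Python and under a virtual environment and `spec.origin` will point at different directories, which is exactly the separation described above.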

The advantage of this is straightforward: I can create a virtual environment that uses the package slamjam==1.2.3, where 1.2.3 is a specific version of the package slamjam, and also run a program that uses slamjam==1.7.9, without causing a conflict in my global environment.

So here's why I give the "kinda" vibe: if your user already has a package on their system, then they need to install nothing. They don't need a virtual environment for that package if it's already globally installed on their system. Likewise, they don't need a new one if it's in another virtual environment, although separating your project's dependencies with one is a great idea.

Is it possible to embed the neural-net .pb into the code? Keep in mind that it weighs 80 MB, so a hex dump doesn't work (a text file with the dump weighs 700 MB).

So, yeah, actually it's extremely doable. The thing is, it depends on what you mean.

Like you are aware, a hex dump of the file takes a lot of space. That's very true. But it seems you are talking about raw hex, which takes a minimum of 2 bytes for every byte of input, and you might be dumping extra information on top of that if you used a tool like hexdump.

Moral of the story: you're going to waste a lot of space doing that. So I'll give you a few options, of which you can choose one or more.

  1. Compress your data even more, if possible.

I haven't worked with TensorFlow data, but after a quick read it appears to use compression with ProtoBufs, so it's probably pretty compressed already. Well, whatever, go ahead and see if you can squeeze any more juice out of the fruit.
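A quick way to check how much juice is left, using only the standard library. The byte strings below are stand-ins: the repetitive one shows what compressible data looks like, and the random one simulates dense float weights like a trained model's, which barely shrink:

```python
import gzip
import os

def compressed_size(raw: bytes) -> int:
    """Return the gzip-compressed size of raw at maximum compression."""
    return len(gzip.compress(raw, compresslevel=9))

# Highly repetitive data compresses dramatically...
print(compressed_size(b"spam" * 10_000))
# ...but high-entropy bytes (simulating packed float32 weights) do not.
print(compressed_size(os.urandom(40_000)))
```

Feed your actual .pb bytes through `compressed_size` and compare against the file's size on disk; if the ratio is close to 1, there is nothing left to squeeze.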

  2. Take the binary data and dump it into a different encoding (hint, hint: base64!)

Watch what happens when we convert something to hex...

>>> binary_data=b'this is a readable string, but really it just boils down to binary information. i can be expressed in a more efficient way than a binary string or hex, however'
>>> hex_data = binary_data.hex()
>>> print(hex_data)
746869732069732061207265616461626c6520737472696e672c20627574207265616c6c79206974206a75737420626f696c7320646f776e20746f2062696e61727920696e666f726d6174696f6e2e20692063616e2062652065787072657373656420696e2061206d6f726520656666696369656e7420776179207468616e20612062696e61727920737472696e67206f72206865782c20686f7765766572
>>> print(len(hex_data))
318

318 characters? We can do better.

>>> import base64
>>> b64_data = base64.b64encode(binary_data)
>>> print(b64_data)
b'dGhpcyBpcyBhIHJlYWRhYmxlIHN0cmluZywgYnV0IHJlYWxseSBpdCBqdXN0IGJvaWxzIGRvd24gdG8gYmluYXJ5IGluZm9ybWF0aW9uLiBpIGNhbiBiZSBleHByZXNzZWQgaW4gYSBtb3JlIGVmZmljaWVudCB3YXkgdGhhbiBhIGJpbmFyeSBzdHJpbmcgb3IgaGV4LCBob3dldmVy'
>>> print(len(b64_data))
212

You've now made your data smaller, by 33% compared to hex!
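Embedding, then, just means pasting that base64 text into your script as a module-level string and decoding it when the script runs. A minimal sketch; the blob below is a few fake bytes standing in for the roughly 107 MB of base64 an 80 MB .pb would actually produce:

```python
import base64

# Generate the constant once, offline, from your real file, e.g.:
#   with open("model.pb", "rb") as f:
#       print("MODEL_B64 = " + repr(base64.b64encode(f.read()).decode("ascii")))

# In the single .py you ship, the blob is just a string constant:
MODEL_B64 = "dGlueSBmYWtlIG1vZGVsIGJ5dGVz"

# Decode back to the original bytes at runtime, ready to hand to the loader.
model_bytes = base64.b64decode(MODEL_B64)
print(model_bytes)  # → b'tiny fake model bytes'
```

Note the trade-off: base64 is compact compared to hex, but still 33% larger than the raw binary, so the single .py will be bigger than the .pb it replaces.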

  3. Package a non-Python file with your .whl distribution. Yeah, totally doable. Have I done it before? Nope, never needed to yet. Will I ever? Yep. Do I have great advice on how to do it? No, but I have a link for you; it's totally doable.

  4. You can download the file from within the application and only provide the URL. Something quick and easy, like:

import wget

# wget.download saves the file to disk and returns the local filename
downloaded_filename = wget.download('some.site.com/a_file')

Yeah, sure, there are other libraries like requests that do similar things, but for the example I chose wget because it has a simple interface too, and is always an option.
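If you'd rather not add any dependency at all, the same idea works with just the standard library. A sketch with a simple cache check so the 80 MB file is only fetched once; the URL is a placeholder, not a real host:

```python
import os
import urllib.request

def fetch_model(url: str, dest: str) -> str:
    """Download the model to dest unless a local copy already exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

# Hypothetical URL -- replace with wherever you actually host the .pb:
# path = fetch_model("https://some.site.com/model.pb", "model.pb")
```

This keeps the shipped .py tiny, at the cost of requiring network access on first run.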

The idea is to have one .py with everything necessary within. Is it possible?

Well, one file? Yeah. For what you're asking -- a single .py, with nothing else, that installs your packages? If you really want to copy and paste library after library and all the data into one massive file nobody will download, I'm sure there's a way.

Let's look at a more supported approach for what you're asking: a .whl file is one file, and it can carry an internal list of the packages you need, so installing the .whl handles everything for you (installing, unpacking, etc.). I'd look in that direction.
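As a sketch of that direction: a minimal setup.py can both bundle the .pb inside the package and declare the dependencies, so `pip install your-wheel.whl` pulls everything in at once. The package name, the data file, and the dependency pin here are all placeholders, not anything from the question:

```python
# setup.py -- a minimal sketch; "mynet" and model.pb are placeholder
# names, and the install_requires list is illustrative only.
from setuptools import setup

setup(
    name="mynet",
    version="0.1.0",
    packages=["mynet"],
    # Ship the non-Python .pb file inside the package directory...
    package_data={"mynet": ["model.pb"]},
    include_package_data=True,
    # ...and have pip install the imports your script needs:
    install_requires=["tensorflow>=2.0"],
)
```

At runtime the bundled file can then be located from inside the installed package (for example via `importlib.resources` on modern Pythons) instead of a hard-coded path.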

Anyway, that's a lot of information, I know, but there's some logic as to why you can or can't do something. Hope it helped, and best of luck to you.
