How can I save an array that took a very long time to create, so that I can reuse it without running that code again?
These lines of code extract all tables from pages 667-795 of a PDF and save them into an array of tables.
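```python
import camelot as cam  # assumption: `cam` is the camelot PDF table-extraction library
import numpy as np

tablesSys = cam.read_pdf(
    "840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
    pages="667-795",
    process_threads=100000,
    line_scale=100,
    strip_text=".\n",
)
tablesSys = np.array(tablesSys)
```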
The array looks like this.
I need to use this array multiple times later on.
Now I work in JupyterLab. Whenever the kernel goes offline, I restart it, or I pick up work again after a few hours, I have to rerun this code to get tablesSys back, and it takes more than 11 minutes to finish.
Since the PDF never changes, there should be a way to run the extraction only once and save the array somehow, so that in the future I can use the array without running the code again.
Hope to find a solution :)
Try using the pickle module to save the array to a file on disk: https://docs.python.org/3/library/pickle.html
Here is a high-level example. I did not run this code, but it should give you an idea.
```python
import pickle
import numpy as np

# calculate the huge data (stand-in for the slow extraction)
heavy_numpy_array = np.zeros((1000, 2))  # some data

# decide where to store the data in the file system
my_filename = 'path/to/my_file.pkl'

# save to file: "wb" = write, binary
with open(my_filename, 'wb') as my_file:
    pickle.dump(heavy_numpy_array, my_file)

# load the data from file: "rb" = read, binary ("wb" here would truncate the file)
with open(my_filename, 'rb') as my_file_v2:
    my_long_numpy_array = pickle.load(my_file_v2)
```
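The same idea can be wrapped in a small load-or-compute helper, so a notebook cell only runs the slow extraction when no saved copy exists yet. This is a minimal sketch using only the standard library; the helper name `load_or_compute`, the cache path, and the stand-in `slow_extraction` function (with dummy data) are hypothetical placeholders for the `cam.read_pdf(...)` call in the question.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path, or compute it once and cache it.

    Hypothetical helper: `compute` is any zero-argument callable, e.g. a
    lambda wrapping the slow cam.read_pdf(...) call.
    """
    if os.path.exists(cache_path):
        # Cache hit: read the saved object back ("rb" = read, binary).
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    # Cache miss: save the result ("wb" = write, binary; truncates any old file).
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_extraction():
    # Hypothetical stand-in for the 11-minute PDF extraction; records each real call.
    calls.append(1)
    return [[["a", "b"], ["c", "d"]]]  # dummy "tables"


cache_file = os.path.join(tempfile.gettempdir(), "tablesSys_demo.pkl")
if os.path.exists(cache_file):
    os.remove(cache_file)  # start the demo from an empty cache

first = load_or_compute(cache_file, slow_extraction)   # computes and saves
second = load_or_compute(cache_file, slow_extraction)  # loads from disk
print(len(calls), first == second)
```

Pickle only restores objects whose classes can be pickled and imported again; whether camelot's table objects round-trip this way is an assumption worth checking on a small page range first. Also, only unpickle files you created yourself, since loading a pickle can execute arbitrary code.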
Was playing around...
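```python
import numpy as np

# Stub standing in for the real `cam` module so the example runs quickly
class Cam:
    def read_pdf(self, *args, **kwargs):
        return np.random.rand(3, 2)

cam = Cam()

tablesSys = cam.read_pdf(
    "840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
    pages="667-795",
    process_threads=100000,
    line_scale=100,
    strip_text=".\n",
)

# Save once...
with open("data.npy", "wb") as f:
    np.save(f, tablesSys)

# ...and load in later sessions instead of re-running read_pdf.
# allow_pickle=True is only needed if tablesSys has dtype=object (e.g. an array of table objects).
with open("data.npy", "rb") as f:
    tablesSys = np.load(f, allow_pickle=True)

print(tablesSys)
```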
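If `tablesSys` ends up as a NumPy array of Python objects (dtype `object`, which is likely when the elements are table objects rather than numbers), `np.save` pickles the contents into the `.npy` file, but `np.load` refuses to unpickle them unless `allow_pickle=True` is passed (its default has been `False` since NumPy 1.16.3). A minimal sketch, with dummy dicts standing in for the extracted tables and a hypothetical temp-file path:

```python
import os
import tempfile

import numpy as np

# Dummy object array standing in for an array of extracted tables.
tables = np.empty(2, dtype=object)
tables[0] = {"rows": [["a", "b"]]}
tables[1] = {"rows": [["c", "d"], ["e", "f"]]}

path = os.path.join(tempfile.gettempdir(), "tables_demo.npy")
np.save(path, tables)  # object arrays are pickled inside the .npy file

try:
    np.load(path)  # default allow_pickle=False rejects object arrays
    loaded_without_flag = True
except ValueError:
    loaded_without_flag = False

restored = np.load(path, allow_pickle=True)
print(loaded_without_flag, restored.dtype, restored[1]["rows"])
```

For a plain numeric array, like the random stub above, the flag makes no difference, so passing it only matters for the object-array case.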