How can I save a very time-consuming array I created earlier, so that I can reuse it without running that line of code again?

Question

This line of code extracts all tables from pages 667-795 of a PDF and saves them into an array of tables.

tablesSys = cam.read_pdf("840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
                         pages = "667-795", 
                         process_threads = 100000, 
                         line_scale = 100, 
                         strip_text ='.\n'
                        ) 

tablesSys = np.array(tablesSys)

The array looks like this.

Later I have to use this array multiple times.

I work in JupyterLab, and whenever the kernel goes offline, I come back to work after a few hours, or I restart the kernel, I have to rerun this line of code to get tablesSys back, which takes more than 11 minutes.

Since the PDF doesn't change at all, I think there should be a way to run the code only once and save the array somehow, so that in the future I can use the array without rerunning the code.

Hope to find a solution :)))

Answer 1

Try using the pickle format to save the array to a file on the file system: https://docs.python.org/3/library/pickle.html

Here is a high-level example. I did not run this code, but it should give you an idea.

import pickle

import numpy as np

# calculate the huge data slice
heavy_numpy_array = np.zeros((1000,2)) # some data

# decide where to store the data in the file system
my_filename = 'path/to/my_file.xyz'

# save to file (binary write mode)
with open(my_filename, 'wb') as my_file:
    pickle.dump(heavy_numpy_array, my_file)

# load the data from file (binary read mode; 'wb' here would truncate the file)
with open(my_filename, 'rb') as my_file_v2:
    my_long_numpy_array = pickle.load(my_file_v2)
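
To avoid rerunning the slow cell after every kernel restart, the save and load steps can be wrapped in a small "load if cached, otherwise compute" helper. This is only a sketch: `load_or_compute`, `slow_extraction` and the cache file name are hypothetical names, and `slow_extraction` is a cheap stand-in for the real `cam.read_pdf(...)` call. Note that pickle files should only be loaded if you created them yourself or otherwise trust them.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path if it exists; otherwise
    call compute(), pickle its result to cache_path, and return it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Demo with a cheap stand-in for the slow PDF extraction (hypothetical).
calls = []


def slow_extraction():
    calls.append(1)  # record that the expensive step actually ran
    return [[1, 2], [3, 4]]


cache_file = Path(tempfile.mkdtemp()) / "tables.pkl"
first = load_or_compute(cache_file, slow_extraction)   # computes and writes the cache
second = load_or_compute(cache_file, slow_extraction)  # reads the cache, no recomputation
```

In a notebook, the stand-in would be replaced by a function (or lambda) that runs the original extraction, and the cache file would need to be deleted by hand if the PDF or the extraction parameters ever change.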

Answer 2

Was playing around...

import numpy as np


class Cam:
    def read_pdf(self, *args, **kwargs):
        return np.random.rand(3, 2)


cam = Cam()

tablesSys = cam.read_pdf(
    "840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
    pages="667-795",
    process_threads=100000,
    line_scale=100,
    strip_text=".\n",
)


with open("data.npy", "wb") as f:
    np.save(f, tablesSys)

with open("data.npy", "rb") as f:
    tablesSys = np.load(f)
print(tablesSys)
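
One caveat with the `np.save` / `np.load` approach: the stub above returns a plain float array, but if the real `tablesSys` ends up with `dtype=object` (which seems likely when the array holds table objects rather than numbers), `np.load` refuses to read it unless `allow_pickle=True` is passed. A minimal sketch, assuming an object array of dicts as a stand-in for the real tables:

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for an array of table objects (hypothetical data).
obj_array = np.array(
    [{"page": 667, "rows": 3}, {"page": 668, "rows": 5}], dtype=object
)

path = Path(tempfile.mkdtemp()) / "tables.npy"
np.save(path, obj_array)  # object arrays are pickled inside the .npy file

# By default np.load rejects pickled object arrays.
try:
    np.load(path)
    loaded_without_flag = True
except ValueError:
    loaded_without_flag = False

# Opting in explicitly works; only do this for files you trust.
restored = np.load(path, allow_pickle=True)
```

If the array turns out to be an object array, using pickle directly as in Answer 1 is roughly equivalent, since NumPy falls back to pickle for such arrays anyway.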

Answer 1
0 2023-01-23 13:38:38

Answer 2
0 2023-01-23 14:00:12
