
sp_execute_external_script Python In-Memory Variable for Faster Processing

Is there a way to keep a variable in memory (like a global variable) so that it does not have to be loaded with pickle.loads every time a script is executed through sp_execute_external_script?

I have a Python script that processes data using a preprocessed matrix. I compute the matrix once with Script A and save it in a table.

--Script A
DECLARE @matrix VARBINARY(MAX)
EXECUTE sp_execute_external_script @language = N'Python'
  , @script = N'
...
matrix = pickle.dumps(processed_matrix)
'
  , @input_data_1 = N'SOME SELECT QUERY'
  , @params = N'@matrix VARBINARY(MAX) OUTPUT'
  , @matrix = @matrix OUTPUT

DELETE FROM MatrixTable
INSERT INTO MatrixTable(matrix) VALUES(@matrix)

Then I pass the matrix in as a parameter every time I run Script B.

--Script B
DECLARE @matrix VARBINARY(MAX)
SELECT @matrix = matrix
FROM MatrixTable

EXECUTE sp_execute_external_script @language = N'Python'
  , @script = N'
import pickle
preprocessed_matrix = pickle.loads(matrix)
...
'
  , @input_data_1 = N'SOME SELECT QUERY'
  , @params = N'@matrix VARBINARY(MAX)'
  , @matrix = @matrix

Because the matrix is computed only once but loaded many times, it would be great if Script A could run when the server starts and keep the resulting matrix in SQL Server memory, where Script B could access it without saving it to and loading it from a table. Is there a way to store the matrix in memory so that I don't need to save it to a table and load it with pickle, to make this faster?
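Before changing the design, it may help to check how much of Script B's run time is spent in pickle.loads. The following is a minimal sketch that times the call in isolation, outside SQL Server. The matrix here is a randomly generated stand-in (the real one is not shown in the post), and `build_matrix` and `time_loads` are hypothetical helper names.

```python
import pickle
import random
import time


def build_matrix(rows, cols, seed=0):
    """Build a stand-in matrix as a list of lists of floats (assumption: the
    real matrix is not shown, so random values are used for illustration)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def time_loads(blob, repeats=5):
    """Return the best wall-clock time in seconds for pickle.loads(blob)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pickle.loads(blob)
        best = min(best, time.perf_counter() - start)
    return best


matrix = build_matrix(200, 200)
# Serialize once, as Script A does, then time only the deserialization step.
blob = pickle.dumps(matrix, protocol=pickle.HIGHEST_PROTOCOL)
seconds = time_loads(blob)
print(f"{len(blob)} bytes, best pickle.loads time: {seconds:.6f} s")
```

If the measured time is small compared with the total duration of Script B, the cost probably lies elsewhere, for example in passing the VARBINARY(MAX) value to the external script.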

Do you really need to pickle the matrix and save it this way?

I would just convert the matrix to a pandas DataFrame and store it in a SQL table. That way you can access it through SQL Server's cached memory. Treat it as a table that is reloaded whenever the matrix is recomputed.

Depending on how big your data is, this should be the best approach. Remember that SQL Server stores data in 8 KB pages, so storing a LOB such as VARBINARY(MAX) in a single column means SQL Server has to split the data across multiple pages.
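To get a feel for the page arithmetic, here is a rough back-of-the-envelope sketch. It assumes about 8,000 usable bytes per page, which is an approximation for illustration only. The exact usable space per LOB page differs, and `estimate_lob_pages` is a hypothetical helper, not a SQL Server function.

```python
# Assumption: roughly 8,000 usable bytes per 8 KB page. The true figure for
# LOB pages is somewhat different; this is only an order-of-magnitude estimate.
ASSUMED_USABLE_BYTES = 8000


def estimate_lob_pages(blob_size_bytes, usable_bytes=ASSUMED_USABLE_BYTES):
    """Estimate how many pages a blob of the given size would span."""
    if blob_size_bytes < 0:
        raise ValueError("blob_size_bytes must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    return (blob_size_bytes + usable_bytes - 1) // usable_bytes


for size in (4_000, 1_000_000, 50_000_000):
    print(f"{size:>12,} bytes -> about {estimate_lob_pages(size):,} pages")
```

Under this assumption, a pickled matrix of around 50 MB would span several thousand pages, all of which must be read and reassembled before pickle.loads can start.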

Storing the matrix row by row in a SQL table is the preferred way of doing this in SQL Server. It is built and optimized for this.
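One possible shape for such a table is a "long" layout with one row per matrix cell. The sketch below shows the conversion in both directions using plain Python lists. The `(row_index, col_index, value)` layout and the function names are assumptions for illustration, not something the original post specifies. Inside sp_execute_external_script, the resulting rows would typically be wrapped in a pandas DataFrame and assigned to OutputDataSet so they can be inserted into a table.

```python
def matrix_to_rows(matrix):
    """Flatten a matrix (list of lists) into (row_index, col_index, value) tuples."""
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]


def rows_to_matrix(rows):
    """Rebuild a dense matrix from (row_index, col_index, value) tuples.

    Cells that are missing from the input are filled with 0.0.
    """
    rows = list(rows)
    if not rows:
        return []
    n_rows = max(i for i, _, _ in rows) + 1
    n_cols = max(j for _, j, _ in rows) + 1
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, value in rows:
        matrix[i][j] = value
    return matrix


example = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
long_rows = matrix_to_rows(example)    # what would be written to the table
restored = rows_to_matrix(long_rows)   # what Script B would rebuild from it
print(long_rows)
print(restored == example)
```

With this layout, Script B could read the table through @input_data_1 instead of receiving a pickled blob as a parameter, at the cost of rebuilding the matrix from rows on each call.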
