SQL stored procedure temporary table memory problem

We have the following simple stored procedure that runs as an overnight SQL Server Agent job. Usually it runs in 20 minutes, but recently the MatchEvent and MatchResult tables have grown to over 9 million rows each. This has resulted in the stored procedure taking over 2 hours to run, with all 8GB of memory on our SQL box being used up. This renders the database unavailable to the regular queries that are trying to access it.

I assume the problem is that the temp table is too large and is causing the memory and database unavailability issues.

How can I rewrite the stored procedure to make it more efficient and less memory intensive?

Note: I have edited the SQL to indicate that there is some condition affecting the initial SELECT statement. I had previously left this out for simplicity. Also, when the query runs, CPU usage is at 1-2%, but memory, as previously stated, is maxed out.


CREATE TABLE #tempMatchResult
(
    matchId VARCHAR(50)
)

INSERT INTO #tempMatchResult SELECT MatchId FROM MatchResult WHERE SOME_CONDITION

DELETE FROM MatchEvent WHERE
MatchId IN (SELECT MatchId FROM #tempMatchResult)

DELETE FROM MatchResult WHERE MatchId In (SELECT MatchId FROM #tempMatchResult)

DROP TABLE #tempMatchResult

There's probably a lot of stuff going on here, and it's not all down to your query.

First, I agree with the other posters. Try to rewrite this without a temp table if at all possible.

But assuming that you need a temp table here, you have a BIG problem in that you have no PK defined on it. That will greatly increase the time your queries take to run. Create your table like so instead:

CREATE TABLE #tempMatchResult (
    matchId VARCHAR(50) NOT NULL PRIMARY KEY /* NOT NULL if at all possible */
);

INSERT INTO #tempMatchResult
SELECT DISTINCT MatchId FROM MatchResult;

Also, make sure that your TempDB is sized correctly. Your SQL Server may very well be expanding the database file dynamically on you, causing your query to eat CPU and disk time. Likewise, make sure your transaction log is sized correctly and is not auto-growing on you. Good luck.
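To check the sizing advice above, a query along these lines lists the current size and autogrowth setting of each tempdb file. It is only a sketch: it relies on the `sys.database_files` catalog view, and the pre-sizing statement at the end uses an illustrative size and assumes the first data file still has its default logical name `tempdev`.

```sql
-- Current size and autogrowth setting of each tempdb file.
-- size and growth are stored in 8 KB pages unless is_percent_growth = 1.
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS VARCHAR(10)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS VARCHAR(10)) + ' MB'
       END AS growth_setting
FROM tempdb.sys.database_files;

-- Pre-size the data file so it does not have to grow during the nightly job.
-- Assumptions: 'tempdev' is the logical file name; 4096MB is only an example value.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 4096MB);
```

Running the same SELECT against the user database (without the `tempdb.` prefix) shows the transaction log file and its growth setting as well.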

DELETE FROM MatchResult WHERE
MatchId In (SELECT MatchId FROM #tempMatchResult)

can be replaced with

DELETE FROM MatchResult WHERE SOME_CONDITION

Can you just turn cascading deletes on between MatchResult and MatchEvent? Then you need only worry about identifying one set of data to delete, and let SQL take care of the other.
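A minimal sketch of the cascading-delete idea, under assumptions the question does not confirm: MatchResult.MatchId is a primary key or unique column, every MatchEvent row has a matching MatchResult row, and the constraint name is made up for illustration.

```sql
-- Assumes MatchResult.MatchId is PRIMARY KEY or UNIQUE and that no orphaned
-- MatchEvent rows exist; otherwise the constraint cannot be created.
ALTER TABLE MatchEvent
ADD CONSTRAINT FK_MatchEvent_MatchResult   -- hypothetical constraint name
    FOREIGN KEY (MatchId)
    REFERENCES MatchResult (MatchId)
    ON DELETE CASCADE;

-- A single statement now removes the results and, via the cascade, their events.
DELETE FROM MatchResult
WHERE SOME_CONDITION;   -- placeholder predicate from the question
```

The cascade still deletes the same number of rows in one transaction, so it simplifies the code more than it reduces the load.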

The alternative would be to make use of the OUTPUT clause, but that's definitely more fiddly.

Both of these would let you delete from both tables, but only have to state (and execute) your filter predicate once. This may still not be as performant as the batching approach suggested by other posters, but it is worth considering. YMMV.
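The OUTPUT alternative mentioned above might look roughly like this. It is a sketch, assuming there is no foreign key from MatchEvent to MatchResult that would block deleting the results first; `@deleted` is a name introduced here for illustration.

```sql
-- Table variable that captures the keys removed from MatchResult.
DECLARE @deleted TABLE (MatchId VARCHAR(50) NOT NULL);

BEGIN TRANSACTION;

-- The filter predicate is stated and evaluated only once, here.
DELETE FROM MatchResult
OUTPUT deleted.MatchId INTO @deleted (MatchId)
WHERE SOME_CONDITION;   -- placeholder predicate from the question

-- Remove the events that belonged to the results just deleted.
DELETE e
FROM MatchEvent AS e
WHERE EXISTS (SELECT 1 FROM @deleted AS d WHERE d.MatchId = e.MatchId);

COMMIT TRANSACTION;
```

With millions of captured keys, a temp table with a primary key may serve better than a table variable as the OUTPUT target.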

Looking at the code above, why do you need a temp table?


DELETE FROM MatchEvent WHERE
MatchId IN (SELECT MatchId FROM MatchResult)


DELETE FROM MatchResult
-- OR Truncate can help here, if all the records are to be deleted anyways.

You probably want to process this piecewise in some way. (I assume the queries are a lot more complicated than you showed?) In that case, you'd want to try one of these:

  • Write your stored procedure to iterate over results. (Might still lock while processing.)
  • Repeatedly select the first N hits, e.g. with LIMIT 100 (TOP (100) in SQL Server), and process those.
  • Divide the work by scanning regions of the table separately, using something like WHERE M <= x AND x < N.
  • Run the "midnight job" more often. Seriously, running stuff like this every 5 minutes instead can work wonders, especially if the work increases non-linearly. (If not, you still get the work spread out over the hours of the day.)
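As a rough T-SQL sketch of the "first N hits" idea in the list above: delete in fixed-size batches until nothing is left. It assumes #tempMatchResult has already been populated as in the question; `@BatchSize` and `@Rows` are names introduced here, and 10000 is only a starting value to tune.

```sql
DECLARE @BatchSize INT = 10000;   -- illustrative value; tune for your server
DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    -- Each iteration is its own short transaction, so locks and log usage
    -- stay bounded by the batch size.
    DELETE TOP (@BatchSize) e
    FROM MatchEvent AS e
    WHERE EXISTS (SELECT 1
                  FROM #tempMatchResult AS t
                  WHERE t.MatchId = e.MatchId);

    SET @Rows = @@ROWCOUNT;   -- 0 once no matching rows remain
END;
```

The same loop can then be repeated for MatchResult.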

In Postgres, I've had some success using conditional (partial) indexes. They index only the rows that meet a given condition. This means that you can keep the many 'resolved' rows and the few unresolved rows in the same table, but still get a special index over just the unresolved ones. YMMV.
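For illustration, a conditional index in PostgreSQL could be declared as below. The table and column names (`match_result`, `match_id`, and a boolean `resolved`) are hypothetical and do not come from the question.

```sql
-- PostgreSQL syntax. Only rows where resolved is false are stored in the index,
-- so it stays small even when the table holds millions of resolved rows.
CREATE INDEX idx_match_result_unresolved
    ON match_result (match_id)
    WHERE NOT resolved;

-- A query whose predicate matches the index condition can use it.
SELECT match_id
FROM match_result
WHERE NOT resolved;
```

SQL Server 2008 and later offer a comparable feature, filtered indexes, which also take a WHERE clause in CREATE INDEX.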

It should be pointed out that this is where using databases gets interesting. You need to pay close attention to your indexes and use EXPLAIN on your queries a lot.

(Oh, and remember, interesting is a good thing in your hobbies, but not at work.)

First, indexes are a MUST here; see Dave M's answer.

Another approach that I sometimes use when deleting very large data sets is to create a shadow table containing only the data to keep, recreate the indexes, and then use sp_rename to switch it in. You have to be careful with transactions here, but depending on the amount of data being deleted this can be faster.

Note: if there is pressure on tempdb, consider using joins rather than copying all the data into the temp table.

So, for example:

CREATE TABLE #tempMatchResult (
    matchId VARCHAR(50) NOT NULL PRIMARY KEY /* NOT NULL if at all possible */
);

INSERT INTO #tempMatchResult
SELECT DISTINCT MatchId FROM MatchResult;

set transaction isolation level serializable
begin transaction 

create table MatchEventT(columns... here)

insert into MatchEventT
select m.* from MatchEvent m  -- m.* only, so the temp table's MatchId column is not inserted too
left join #tempMatchResult t on t.MatchId  = m.MatchId 
where t.MatchId is null 

-- create all the indexes for MatchEvent

drop table MatchEvent
exec sp_rename 'MatchEventT', 'MatchEvent'

-- similar code for MatchResult

commit transaction 


DROP TABLE #tempMatchResult

Avoid the temp table if possible

It's only using up memory.
You could try this:

DELETE e
FROM MatchEvent e
INNER JOIN MatchResult r
    ON e.MatchId = r.MatchId

If you can't avoid a temp table

I'm going to stick my neck out here and say: you don't need an index on your temporary table, because you want the temp table to be the smallest table in the equation and you want to table scan it (because all of its rows are relevant). An index won't help you here.

Do small bits of work

Work on a few rows at a time.
This will probably slow down the execution, but it should free up resources.

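- One row at a time

 DECLARE @MatchId VARCHAR(50)

 /* start with the lowest MatchId: */
 SELECT @MatchId = MIN(MatchId) FROM MatchResult

 WHILE @MatchId IS NOT NULL
 BEGIN
     DELETE MatchEvent WHERE MatchId = @MatchId

     /* move on to the next MatchId; MIN returns NULL when none is left: */
     SELECT @MatchId = MIN(MatchId) FROM MatchResult WHERE MatchId > @MatchId
 END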
- A few rows at a time

 DECLARE @MaxMatchId VARCHAR(50)

 CREATE TABLE #tmp ( MatchId VARCHAR(50) NOT NULL )

 /* get list of lowest 1000 MatchIds: */
 INSERT #tmp
 SELECT TOP (1000) MatchId FROM MatchResult ORDER BY MatchId

 /* loop until a batch comes back empty: */
 WHILE EXISTS (SELECT 1 FROM #tmp)
 BEGIN
     DELETE e
     FROM MatchEvent e
     INNER JOIN #tmp t ON e.MatchId = t.MatchId

     /* get highest MatchId we've processed: */
     SELECT @MaxMatchId = MAX(MatchId) FROM #tmp

     /* discard the processed batch and get the next 1000 MatchIds: */
     TRUNCATE TABLE #tmp

     INSERT #tmp
     SELECT TOP (1000) MatchId FROM MatchResult
     WHERE MatchId > @MaxMatchId
     ORDER BY MatchId
 END

 DROP TABLE #tmp

This one deletes the events for up to 1000 matches at a time.
The more rows you delete at a time, the more resources you will use, but the faster it will tend to run (until you run out of resources!). You can experiment to find a better value than 1000.
