SQL Server - Timing out because of table variables
Some time ago I wrote a post about the difference between temporary tables (#something) and table variables (@something) which concluded that before SQL Server 2019 table variables sucked. But I didn't give any real example. I made it very technical and clinical, perhaps creating the wrong impression that it's a matter of minute optimizations and not something somebody should know.
So gather 'round, children, and let me tell you the tale of the dreaded table variable! How a query that timed out after an hour was run in just 3 seconds with a tiny little fix. The server version was... 2017 (fade to dream sequence)
First, the setup: I needed to delete rows from a large table (30+ million rows) which did not have an associated account in a calculated list of about 20 thousand. Simple right?
DECLARE @Accounts TABLE(ACC_ID CHAR(5))
INSERT INTO @Accounts
SELECT ... whatever
SELECT TOP 1 *
FROM MyTable t
LEFT OUTER JOIN @Accounts a
ON t.ACC_ID=a.ACC_ID
WHERE a.ACC_ID IS NULL
Just get me that first orphaned record please... ... ... 40 minutes later... WTF?
As I was saying in the previous post, table variables do not have statistics and the engine assumes they have only one row. So what does the execution plan want? To go through all of the 30+ million rows and then search them into the accounts table. Only the accounts table has 20 thousand rows, too. And it takes forever!
Mind that I've simplified the case here. Table variables do allow indexes and primary keys. I tried that. No effect at all!
Let's change the query then:
CREATE TABLE #Accounts(ACC_ID CHAR(5))
INSERT INTO #Accounts
SELECT ... whatever
SELECT TOP 1 *
FROM MyTable t
LEFT OUTER JOIN #Accounts a
ON t.ACC_ID=a.ACC_ID
WHERE a.ACC_ID IS NULL
DROP TABLE #Accounts
Just get me... wait, what? The query ended in 3 seconds. Suddenly, knowing the number and structure of the rows in the table led to the correct execution plan.
But can't I use table variables somehow? You can, but you have to force the engine to do it your way. First of all, you use OPTION (FORCE ORDER) which will keep the tables in the order you declared them. And then you have to reverse the JOIN so that @Accounts is the first table, but the effect is the same.
DECLARE @Accounts TABLE(ACC_ID CHAR(5))
INSERT INTO @Accounts
SELECT ... whatever
SELECT TOP 1 *
FROM @Accounts a
RIGHT OUTER JOIN MyTable t
ON t.ACC_ID=a.ACC_ID
WHERE a.ACC_ID IS NULL
OPTION (FORCE ORDER)
Back to three seconds. Ugh!
Now, I will probably use a temporary table, because forcing the way the SQL engine interprets your queries is almost always a bad idea and it makes the code harder to read, too.
Hope it helps!
Comments
Be the first to post a comment