The story of the LEFT JOIN in T-SQL (and why I learned to love the NOT EXISTS)
- Posted in
- Left outer join vs NOT EXISTS
- SQL performance on LEFT OUTER JOIN vs NOT EXISTS
- NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
Let's say you want to insert in a table all records from another source that do not already exist in the table. You have several options, but the most commonly used are:
SELECT * FROM SourceTable LEFT JOIN DestinationTable ON SomeCondition WHERE DestinationTable.Id IS NULL
SELECT * FROM SourceTable WHERE NOT EXIST(SELECT 1 FROM DestinationTable WHERE SomeCondition)
Personally I prefer the first version, for readability reasons as well as having listened to the mantra "Never do selects in selects" for all my life. However, it becomes apparent that the second version is a lot more efficient. The simple reason is that for the first example Microsoft SQL Server will first join the two tables in memory, then filter. If you have multiple combinations of records that satisfy the condition this will result in some huge memory and CPU usage, especially if you have no indexes defined and, sometimes, because you have some indexes defined. The second option uses one of the few methods guaranteed to exit, NOT EXISTS, which immediately invalidates a record at the first match.
Other options involve using the EXCEPT or INTERSECT operations in SQL, but they are not really helping. Intersecting ids, for example, then inner joining with SourceTable works, but it is somewhere in between the two solutions above and it looks like crap as well. Join hints don't help either.
Be the first to post a comment