In the first part of this series, we saw how the Halloween Problem applies to UPDATE
queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT
and DELETE
statements.
Insert Statements
Now we know a bit about the conditions that require Halloween Protection, it is quite easy to create an INSERT
example that involves reading from and writing to the keys of the same index structure. The simplest example is duplicating rows in a table (where adding new rows inevitably modifies the keys of the clustered index):
CREATE TABLE dbo.Demo
(
SomeKey integer NOT NULL,
CONSTRAINT PK_Demo
PRIMARY KEY (SomeKey)
);
INSERT dbo.Demo
SELECT SomeKey FROM dbo.Demo;
The problem is that newly inserted rows might be encountered by the reading side of the execution plan, potentially resulting in a loop that adds rows forever (or at least until some resource limit is reached). The query optimizer recognizes this risk, and adds an Eager Table Spool to provide the necessary phase separation:
A more realistic example
You probably don’t often write queries to duplicate every row in a table, but you likely do write queries where the target table for an INSERT
also appears somewhere in the SELECT
clause. One example is adding rows from a staging table that do not already exist in the destination:
CREATE TABLE dbo.Staging
(
SomeKey integer NOT NULL
);
-- Sample data
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(1234);
-- Test query
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Demo AS d
WHERE d.SomeKey = s.SomeKey
);
The execution plan is:
The problem in this case is subtly different, though still an example of the same core issue. There is no value ‘1234’ in the target Demo
table, but the Staging
table contains two such entries. Without phase separation, the first ‘1234’ value encountered would be inserted successfully, but the second check would find that the value ‘1234’ now exists and would not attempt to insert it again. The statement as a whole would complete successfully.
This might produce a desirable outcome in this particular case (and might even seem intuitively correct) but it is not a correct implementation. The SQL standard requires that data modification queries execute as if the three phases of reading, writing and checking constraints occur completely separately (see part one).
Searching for all rows to insert as a single operation, we should select both ‘1234’ rows from the Staging
table, since this value does not exist in the target yet. The execution plan should therefore try to insert both ‘1234’ rows from the Staging
table, resulting in a primary key violation:
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_Demo'.
Cannot insert duplicate key in object 'dbo.Demo'.
The duplicate key value is (1234).
The statement has been terminated.
The phase separation provided by the Table Spool ensures that all checks for existence are completed before any changes are made to the target table. If you run the query in SQL Server with the sample data above, you will receive the (correct) error message.
Halloween Protection is required for INSERT
statements where the target table is also referenced in the SELECT
clause.
Delete Statements
We might expect the Halloween Problem not to apply to DELETE
statements, since it shouldn’t really matter if we try to delete a row multiple times. We can modify our staging table example to remove rows from the Demo
table that do not exist in Staging
:
TRUNCATE TABLE dbo.Demo;
TRUNCATE TABLE dbo.Staging;
INSERT dbo.Demo (SomeKey)
VALUES (1234);
DELETE dbo.Demo
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Staging AS s
WHERE s.SomeKey = dbo.Demo.SomeKey
);
This test seems to validate our intuition because there is no Table Spool in the execution plan:
This type of DELETE
does not require phase separation because each row has a unique identifier (an RID if the table is a heap, clustered index key(s) and possibly a uniquifier otherwise). This unique row locator is a stable key – there is no mechanism by which it can change during execution of this plan, so the Halloween Problem does not arise.
DELETE Halloween Protection
Nevertheless, there is at least one case where a DELETE
requires Halloween protection: when the plan references a row in the table other than the one which is being deleted. This requires a self-join, commonly found when hierarchical relationships are modelled. A simplified example is shown below:
CREATE TABLE dbo.Test
(
pk char(1) NOT NULL,
ref char(1) NULL,
CONSTRAINT PK_Test
PRIMARY KEY (pk)
);
INSERT dbo.Test
(pk, ref)
VALUES
('B', 'A'),
('C', 'B'),
('D', 'C');
There really ought to be a same-table foreign key reference defined here, but let’s ignore that design failing for a moment. The structure and data are nonetheless valid (and it is sadly quite common to find foreign keys omitted in the real world). Anyway, the task at hand is to delete any row where the ref column points to a non-existent pk value. The natural DELETE
query matching this requirement is:
DELETE dbo.Test
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Test AS t2
WHERE t2.pk = dbo.Test.ref
);
The query plan is:
Notice this plan now features a costly Eager Table Spool. Phase separation is required here because otherwise results could depend on the order in which rows are processed:
If the execution engine starts with the row where pk = B, it would find no matching row (ref = A and there is no row where pk = A). If execution then moves on to the row where pk = C, it would also be deleted because we just removed row B pointed to by its ref column. The end result would be that iterative processing in this order would delete all the rows from the table, which is clearly incorrect.
On the other hand, if the execution engine processed the row with pk =D first, it would find a matching row (ref = C). Assuming execution continued in reverse pk order, the only row deleted from the table would be the one where pk = B. This is the correct result (remember the query should execute as if the read, write, and validation phases had occurred sequentially and without overlaps).
Phase separation for constraint validation
As an aside, we can see another example of phase separation if we add a same-table foreign key constraint to the previous example:
DROP TABLE dbo.Test;
CREATE TABLE dbo.Test
(
pk char(1) NOT NULL,
ref char(1) NULL,
CONSTRAINT PK_Test
PRIMARY KEY (pk),
CONSTRAINT FK_ref_pk
FOREIGN KEY (ref)
REFERENCES dbo.Test (pk)
);
INSERT dbo.Test
(pk, ref)
VALUES
('B', NULL),
('C', 'B'),
('D', 'C');
The execution plan for the INSERT
is:
The insert itself does not require Halloween protection since the plan does not read from the same table (the data source is an in-memory virtual table represented by the Constant Scan operator).
The SQL standard does however require that phase 3 (constraint checking) occurs after the writing phase is complete. For this reason, a phase separation Eager Table Spool is added to the plan after the Clustered Index Insert, and just before each row is checked to make sure the foreign key constraint remains valid.
If you are starting to think that translating a set-based declarative SQL modification query to a robust iterative physical execution plan is a tricky business, you are beginning to see why update processing (of which Halloween Protection is but a very small part) is the most complex part of the Query Processor.
DELETE statements require Halloween Protection where a self-join of the target table is present.
Summary
Halloween Protection can be an expensive (but necessary) feature in execution plans that change data (where ‘change’ includes all SQL syntax that adds, changes or removes rows).
Halloween Protection is required for:
INSERT
plans where the target table is also read from.UPDATE
plans where either- the keys of an index are both read and modified; or
- the target table participates in a self-join.
DELETE
plans where the target table participates in a self-join.
The next part in this series will cover some special Halloween Problem optimizations that apply only to MERGE
statements.
No comments:
Post a Comment
All comments are reviewed before publication.