About This Blog

Including my content originally published on 𝕏, SQLperformance.com, and SQLblog.com

Friday 15 February 2013

The Halloween Problem – Part 2

The Halloween Problem – Part 2

In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT and DELETE statements.

Insert Statements

Now we know a bit about the conditions that require Halloween Protection, it is quite easy to create an INSERT example that involves reading from and writing to the keys of the same index structure. The simplest example is duplicating rows in a table (where adding new rows inevitably modifies the keys of the clustered index):

CREATE TABLE dbo.Demo
(
    SomeKey integer NOT NULL,

    CONSTRAINT PK_Demo
        PRIMARY KEY (SomeKey)
);

INSERT dbo.Demo
SELECT SomeKey FROM dbo.Demo;

The problem is that newly inserted rows might be encountered by the reading side of the execution plan, potentially resulting in a loop that adds rows forever (or at least until some resource limit is reached). The query optimizer recognizes this risk, and adds an Eager Table Spool to provide the necessary phase separation:

Copy Rows Execution Plan

A more realistic example

You probably don’t often write queries to duplicate every row in a table, but you likely do write queries where the target table for an INSERT also appears somewhere in the SELECT clause. One example is adding rows from a staging table that do not already exist in the destination:

CREATE TABLE dbo.Staging
(
    SomeKey integer NOT NULL
);

-- Sample data
INSERT dbo.Staging
    (SomeKey)
VALUES
    (1234),
    (1234);

-- Test query
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Demo AS d
    WHERE d.SomeKey = s.SomeKey
);

The execution plan is:

Staging Table Demo 1

The problem in this case is subtly different, though still an example of the same core issue. There is no value ‘1234’ in the target Demo table, but the Staging table contains two such entries. Without phase separation, the first ‘1234’ value encountered would be inserted successfully, but the second check would find that the value ‘1234’ now exists and would not attempt to insert it again. The statement as a whole would complete successfully.

This might produce a desirable outcome in this particular case (and might even seem intuitively correct) but it is not a correct implementation. The SQL standard requires that data modification queries execute as if the three phases of reading, writing and checking constraints occur completely separately (see part one).

Searching for all rows to insert as a single operation, we should select both ‘1234’ rows from the Staging table, since this value does not exist in the target yet. The execution plan should therefore try to insert both ‘1234’ rows from the Staging table, resulting in a primary key violation:

Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_Demo'.
Cannot insert duplicate key in object 'dbo.Demo'.
The duplicate key value is (1234).
The statement has been terminated.

The phase separation provided by the Table Spool ensures that all checks for existence are completed before any changes are made to the target table. If you run the query in SQL Server with the sample data above, you will receive the (correct) error message.

Halloween Protection is required for INSERT statements where the target table is also referenced in the SELECT clause.

Delete Statements

We might expect the Halloween Problem not to apply to DELETE statements, since it shouldn’t really matter if we try to delete a row multiple times. We can modify our staging table example to remove rows from the Demo table that do not exist in Staging:

TRUNCATE TABLE dbo.Demo;
TRUNCATE TABLE dbo.Staging;

INSERT dbo.Demo (SomeKey)
VALUES (1234);

DELETE dbo.Demo
WHERE NOT EXISTS 
(
    SELECT 1 
    FROM dbo.Staging AS s 
    WHERE s.SomeKey = dbo.Demo.SomeKey
);

This test seems to validate our intuition because there is no Table Spool in the execution plan:

Delete with no Halloween Protection

This type of DELETE does not require phase separation because each row has a unique identifier (an RID if the table is a heap, clustered index key(s) and possibly a uniquifier otherwise). This unique row locator is a stable key – there is no mechanism by which it can change during execution of this plan, so the Halloween Problem does not arise.

DELETE Halloween Protection

Nevertheless, there is at least one case where a DELETE requires Halloween protection: when the plan references a row in the table other than the one which is being deleted. This requires a self-join, commonly found when hierarchical relationships are modelled. A simplified example is shown below:

CREATE TABLE dbo.Test
(
    pk char(1) NOT NULL,
    ref char(1) NULL,

    CONSTRAINT PK_Test
        PRIMARY KEY (pk)
);

INSERT dbo.Test
    (pk, ref)
VALUES
    ('B', 'A'),
    ('C', 'B'),
    ('D', 'C');

There really ought to be a same-table foreign key reference defined here, but let’s ignore that design failing for a moment. The structure and data are nonetheless valid (and it is sadly quite common to find foreign keys omitted in the real world). Anyway, the task at hand is to delete any row where the ref column points to a non-existent pk value. The natural DELETE query matching this requirement is:

DELETE dbo.Test
WHERE NOT EXISTS 
(
    SELECT 1 
    FROM dbo.Test AS t2 
    WHERE t2.pk = dbo.Test.ref
);

The query plan is:

DELETE Halloween Protection

Notice this plan now features a costly Eager Table Spool. Phase separation is required here because otherwise results could depend on the order in which rows are processed:

If the execution engine starts with the row where pk = B, it would find no matching row (ref = A and there is no row where pk = A). If execution then moves on to the row where pk = C, it would also be deleted because we just removed row B pointed to by its ref column. The end result would be that iterative processing in this order would delete all the rows from the table, which is clearly incorrect.

On the other hand, if the execution engine processed the row with pk =D first, it would find a matching row (ref = C). Assuming execution continued in reverse pk order, the only row deleted from the table would be the one where pk = B. This is the correct result (remember the query should execute as if the read, write, and validation phases had occurred sequentially and without overlaps).

Phase separation for constraint validation

As an aside, we can see another example of phase separation if we add a same-table foreign key constraint to the previous example:

DROP TABLE dbo.Test;

CREATE TABLE dbo.Test
(
    pk char(1) NOT NULL,
    ref char(1) NULL,

    CONSTRAINT PK_Test
        PRIMARY KEY (pk),

    CONSTRAINT FK_ref_pk
        FOREIGN KEY (ref)
        REFERENCES dbo.Test (pk)
);

INSERT dbo.Test
    (pk, ref)
VALUES
    ('B', NULL),
    ('C', 'B'),
    ('D', 'C');

The execution plan for the INSERT is:

Foreign key constraint plan

The insert itself does not require Halloween protection since the plan does not read from the same table (the data source is an in-memory virtual table represented by the Constant Scan operator).

The SQL standard does however require that phase 3 (constraint checking) occurs after the writing phase is complete. For this reason, a phase separation Eager Table Spool is added to the plan after the Clustered Index Insert, and just before each row is checked to make sure the foreign key constraint remains valid.

If you are starting to think that translating a set-based declarative SQL modification query to a robust iterative physical execution plan is a tricky business, you are beginning to see why update processing (of which Halloween Protection is but a very small part) is the most complex part of the Query Processor.

DELETE statements require Halloween Protection where a self-join of the target table is present.

Summary

Halloween Protection can be an expensive (but necessary) feature in execution plans that change data (where ‘change’ includes all SQL syntax that adds, changes or removes rows).

Halloween Protection is required for:

  • INSERT plans where the target table is also read from.
  • UPDATE plans where either
    • the keys of an index are both read and modified; or
    • the target table participates in a self-join.
  • DELETE plans where the target table participates in a self-join.

The next part in this series will cover some special Halloween Problem optimizations that apply only to MERGE statements.

[ Part 1 | Part 2 | Part 3 | Part 4 ]

No comments:

Post a Comment

All comments are reviewed before publication.