A MERGE
statement can fail, and incorrectly report a unique key violation when:
- The target table uses a unique filtered index; and
- No key column of the filtered index is updated; and
- A column from the filtering condition is updated; and
- Transient key violations are possible
Reproduction script
Say we have two tables, one that is the target of a MERGE
statement, and another that contains updates to be applied to the target.
The target table contains three columns, an integer primary key, a single character alternate key, and a status code column.
A filtered unique index exists on the alternate key, which is only enforced when the status code is ‘a’:
CREATE TABLE #Target
(
pk integer NOT NULL,
ak character(1) NOT NULL,
status_code character(1) NOT NULL,
PRIMARY KEY (pk)
);
CREATE UNIQUE INDEX uq1
ON #Target
(ak)
INCLUDE
(status_code)
WHERE
status_code = 'a';
The “changes” table contains an integer primary key (to identify the target row to change) and the new status code:
CREATE TABLE #Changes
(
pk integer NOT NULL,
status_code character(1) NOT NULL,
PRIMARY KEY (pk)
);
Sample data for the example is:
INSERT #Target
(pk, ak, status_code)
VALUES
(1, 'A', 'a'),
(2, 'B', 'a'),
(3, 'C', 'a'),
(4, 'A', 'd');
INSERT #Changes
(pk, status_code)
VALUES
(1, 'd'),
(4, 'a');
Visually:
#Target table #Changes table
╔════╦════╦═════════════╗ ╔════╦═════════════╗
║ pk ║ ak ║ status_code ║ ║ pk ║ status_code ║
╠════╬════╬═════════════╣ ╠════╬═════════════╣
║ 1 ║ A ║ a ║ ║ 1 ║ d ║
║ 2 ║ B ║ a ║ ║ 4 ║ a ║
║ 3 ║ C ║ a ║ ╚════╩═════════════╝
║ 4 ║ A ║ d ║
╚════╩════╩═════════════╝
Remember, the alternate key (ak
) column in target table is unique, for rows where status_code
= ‘a’.
Applying the changes to the target will change:
- row 1 from status ‘a’ to status ‘d’
- row 4 from status ‘d’ to status ‘a’.
The result of applying all the changes will still satisfy the filtered unique index, because:
- The ‘A’ in row 1 will be deleted from the filtered index (
status_code
changes to ‘d’) - The ‘A’ in row 4 will be added to the filtered index (
status_code
changes to ‘a’).
Note that there is deliberately a risk of a transient key violation here.
MERGE
Test One
We now execute a MERGE
statement to apply these changes:
MERGE #Target AS T
USING #Changes AS C
ON C.pk = T.pk
WHEN MATCHED AND C.status_code <> T.status_code
THEN UPDATE SET status_code = C.status_code;
The MERGE
changes the two target rows as expected.
The updated target table now contains:
╔════╦════╦═════════════╗
║ pk ║ ak ║ status_code ║
╠════╬════╬═════════════╣
║ 1 ║ A ║ d ║ <— changed from ‘a’
║ 2 ║ B ║ a ║
║ 3 ║ C ║ a ║
║ 4 ║ A ║ a ║ <— changed from ‘d’
╚════╩════╩═════════════╝
MERGE
Test Two
Now let’s repopulate the #Changes
table to reverse the updates we just performed:
TRUNCATE TABLE #Changes;
INSERT #Changes
(pk, status_code)
VALUES
(1, 'a'),
(4, 'd');
This will change row 1 back to status ‘a’ and row 4 back to status ‘d’.
Before these changes, the tables contain:
#Target table #Changes table
╔════╦════╦═════════════╗ ╔════╦═════════════╗
║ pk ║ ak ║ status_code ║ ║ pk ║ status_code ║
╠════╬════╬═════════════╣ ╠════╬═════════════╣
║ 1 ║ A ║ d ║ ║ 1 ║ a ║
║ 2 ║ B ║ a ║ ║ 4 ║ d ║
║ 3 ║ C ║ a ║ ╚════╩═════════════╝
║ 4 ║ A ║ a ║
╚════╩════╩═════════════╝
We now execute the same MERGE
statement as before:
MERGE #Target AS T
USING #Changes AS C
ON C.pk = T.pk
WHEN MATCHED AND C.status_code <> T.status_code
THEN UPDATE SET status_code = C.status_code;
This time we receive the error message:
Msg 2601, Level 14, State 1
Cannot insert duplicate key row in object ‘dbo.#Target’ with unique index ‘uq1’. The duplicate key value is (A).
The statement has been terminated.
Applying the changes using UPDATE
We will now rewrite the MERGE
as an UPDATE
:
UPDATE T
SET status_code = C.status_code
FROM #Target AS T
JOIN #Changes AS C ON
T.pk = C.pk
WHERE
C.status_code <> T.status_code;
This statement succeeds where the MERGE
failed. The two rows are updated as expected:
╔════╦════╦═════════════╗
║ pk ║ ak ║ status_code ║
╠════╬════╬═════════════╣
║ 1 ║ A ║ a ║ <— changed back to ‘a’
║ 2 ║ B ║ a ║
║ 3 ║ C ║ a ║
║ 4 ║ A ║ d ║ <— changed back to ‘d’
╚════╩════╩═════════════╝
What went wrong with MERGE
?
Execution of the MERGE
statement happens to apply the changes in the order of the pk
column.
In test one, this was not a problem Row 1 was removed from the unique filtered index by changing status_code
from ‘a’ to ‘d’ before row 4 was added to it. At no point does the table contain two rows where ak = ‘A’
and status_code = ‘a’
.
In test two, the first operation changed row 1 from status ‘d’ to status ‘a’. This meant there were transiently two rows in the filtered unique index where ak = ‘A’
(both row 1 and row 4 meet the index filtering criteria ‘status_code = a’).
The storage engine does not allow even transient unique key violations (unless IGNORE_DUP_KEY
is on, but that is a different story, and is ignored by MERGE
in any case).
This strict rule applies regardless of the fact that after all changes were applied, there would be no unique key violation (row 4 would eventually be changed from ‘a’ to ‘d’, removing it from the filtered unique index, and resolving the key violation).
Why it went wrong
The query optimizer usually detects when a transient uniqueness violation could occur, and builds a plan that avoids the issue.
I wrote about this previously in Beware Sneaky Reads with Unique Indexes (you can read more about the details on pages 495-497 of the Microsoft SQL Server 2008 Internals book, and in Craig Freedman’s post Maintaining Unique Indexes.
To summarize: The optimizer introduces Split, Filter, Sort, and Collapse operators into the query plan to:
- Split each update into a delete followed by an insert
- Filter out rows that would not change the index (due to the filter on the index, or a non-updating update)
- Sort the resulting stream by index key, with deletes before inserts
- Collapse surviving delete/insert pairs on the same index key back into an update
The effect of all this is that only net changes are applied to an index (as one or more insert, update, and/or delete operations).
In this case, the net effect is a single update of the filtered unique index: Changing the row for ak = ‘A’
from pk = 4
to pk = 1
. In case that is less than 100% clear, let’s look at the operation in test two again:
#Target table #Changes table Result
╔════╦════╦═════════════╗ ╔════╦═════════════╗ ╔════╦════╦═════════════╗
║ pk ║ ak ║ status_code ║ ║ pk ║ status_code ║ ║ pk ║ ak ║ status_code ║
╠════╬════╬═════════════╣ ╠════╬═════════════╣ ╠════╬════╬═════════════╣
║ 1 ║ A ║ d ║ ║ 1 ║ d ║ ║ 1 ║ A ║ a ║
║ 2 ║ B ║ a ║ ║ 4 ║ a ║ ║ 2 ║ B ║ a ║
║ 3 ║ C ║ a ║ ╚════╩═════════════╝ ║ 3 ║ C ║ a ║
║ 4 ║ A ║ a ║ ║ 4 ║ A ║ d ║
╚════╩════╩═════════════╝ ╚════╩════╩═════════════╝
From the filtered index’s point of view (filtered for status_code = ‘a’
and shown in nonclustered index key order) the overall effect of the query is:
Before After
╔════╦════╗ ╔════╦════╗
║ pk ║ ak ║ ║ pk ║ ak ║
╠════╬════╣ ╠════╬════╣
║ 4 ║ A ║ ║ 1 ║ A ║ <- the only change
║ 2 ║ B ║ ║ 2 ║ B ║
║ 3 ║ C ║ ║ 3 ║ C ║
╚════╩════╝ ╚════╩════╝
The single net change there is a change of pk
from 4 to 1 for the nonclustered index entry ak = ‘A’
.
This is the magic performed by the Split, Sort, and Collapse. Notice in particular how the original changes to the index key (on the ak
column) have been transformed into an update of a non-key column (pk
is included in the nonclustered index).
By not updating any nonclustered index keys, we are guaranteed to avoid transient key violations.
The Execution Plans
The MERGE
execution plan that produces the incorrect key-violation error looks like this:
The successful UPDATE
execution plan is:
The MERGE
execution plan is a narrow (per-row) update. The Clustered Index Merge operator maintains both the clustered index and the filtered nonclustered index.
The UPDATE
plan is a wide (per-index) update. The clustered index is maintained first, then the Split, Filter, Sort, and Collapse sequence is applied, before the nonclustered index is updated as necessary.
There is always a wide update plan for any (non-Hekaton) statement that modifies the database. The narrow form is a performance optimization where the number of rows is expected to be relatively small, and is not available for all operations.
One of the operations that should disallow a narrow plan is maintaining a unique index where transient key violations could occur.
Workarounds
The MERGE
can be made to work (producing a wide update plan with Split, Sort, and Collapse) by:
- Adding all columns referenced in the filtered index’s
WHERE
clause to the index key (INCLUDE
is not sufficient); or - Executing the query with trace flag 8790 set e.g.
OPTION (QUERYTRACEON 8790)
.
Undocumented trace flag 8790 forces a wide update plan for any data-changing statement (remember that a wide update plan is always possible). The flag is ignored for Hekaton updates.
Either change will produce a successfully-executing wide update plan for the MERGE
that failed.
Conclusion
The optimizer fails to spot the possibility of transient unique key violations with MERGE
under the conditions listed at the start of this post.
It incorrectly chooses a narrow update plan for the MERGE
, which cannot provide the protection of a Split/Sort/Collapse sequence for the nonclustered index maintenance.
The MERGE
plan may fail at execution time depending on the order in which rows are processed, and the distribution of data in the database. Worse, a previously solid MERGE
query may suddenly start to fail unpredictably if a filtered unique index is added to the merge target table at any point.
Web archive of the Connect bug I filed: MERGE Incorrectly Reports Unique Key Violations
This was closed as Won’t Fix and not copied across to the new Azure Feedback site. Even so, it was fixed at some point:
Bug reproduces on:
- SQL Server 2014 (SP3-CU2) build 12.0.6214.1
- SQL Server 2012 (SP4-GDR) build 11.0.7462.6
- SQL Server 2008 R2 (SP3-GDR) build 10.50.6560.0
- SQL Server 2008 (SP3 CU8) build 10.0.5828
Fixed on:
- SQL Server 2016 (SP2-CU5) build 13.0.5264.1
- SQL Server 2017 (RTM-CU13-OD) build 14.0.3049.1
- SQL Server 2019 (CTP2.2) build 15.0.1200.24
© Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi
No comments:
Post a Comment
All comments are reviewed before publication.