About This Blog

Including my content originally published on 𝕏, SQLperformance.com, and SQLblog.com

Tuesday 17 September 2024

Why a Self-Join Requires Halloween Protection

Title image

This article was originally published on 𝕏.

I was asked recently why Halloween Protection was needed for data modification statements that include a self-join of the target table. This gives me a chance to explain, while also covering some interesting product bug history from the SQL Server 7 and 2000 days.

If you already know all there is to know about the Halloween Problem as it applies to SQL Server, you can skip the background section.

Sunday 15 September 2024

Current State of the ANY Aggregate Transformation

Title image
This article was originally published on 𝕏.

SQL Server provides a way to select any one row from a group of rows, provided you write the statement using a specific syntax. This method returns any one row from each group, not the minimum, maximum or anything else. In principle, the one row chosen from each group is unpredictable.

The general idea of the required syntax is to logically number rows starting with 1 in each group in no particular order, then return only the rows numbered 1. The outer statement must not select the numbering column for this query optimizer transformation (SelSeqPrjToAnyAgg) to work.

Friday 13 September 2024

A Small Sample of SQL Server Chaos

Title image
This article was originally published on 𝕏.

Background

Since SQL Server indexed views don’t allow MIN or MAX aggregates, I recently found myself writing a trigger instead. The trigger’s job was to keep a summary table in sync with a source query (which featured a MAX aggregate).

There’s a cost to running a trigger after every insert, update, or delete (with up to three trigger invocations per merge statement) but fast access to the summary data was worth it in this case. Though a trigger is a bit more expensive than the inline materialised view maintenance automatically added to the source statement’s execution plan by SQL Server, efficient trigger code and good indexing can help with the performance aspect (as always).

Friday 30 August 2024

A Nonclustered Index Update Disaster

Title image
This article was originally published on 𝕏.

Introduction

Update execution plans are not something the T-SQL statement writer has much control over. You can affect the data reading side of the plan with query rewrites and hints, but there’s not nearly as much tooling available to affect the writing side of the plan.

Update processing can be extremely complex and reading data-changing execution plans correctly can also be difficult. Many important details are hidden away in obscure and poorly documented properties, or simply not present at all.

In this article, I want to you show a particularly bad update plan example. It has value in and of itself, but it will also give me a chance to describe some less well-known SQL Server details and behaviours.

Monday 12 August 2024

Don't Mix with Datetime

Title image
This article was originally published on 𝕏.

Introduction

Microsoft encourages us not to use the datetime data type:

Avoid using datetime for new work. Instead, use the time, date, datetime2, and datetimeoffset data types. These types align with the SQL Standard, and are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.

Well, ok. Sensible and well-informed people might still choose to use datetime for performance reasons. Common date and time functions have optimised implementations in the SQL Server expression service for the datetime and smalldatetime data types.

Thursday 20 June 2024

SQL Server Parallel Index Builds

Parallel Index Building Execution Plan

SQL Server doesn't support parallel modifications to a b-tree index.
That might sound surprising. After all, you can certainly write to the same b-tree index from multiple sessions concurrently. For example, two sessions can happily write alternating odd and even numbers to the same integer b-tree index. So long as both sessions execute on different schedulers and take row locks, there will be no blocking and you'll get true concurrency.
No, what I mean is: A single session can't write to a b-tree index using more than one thread. No parallel plan modifications of a b-tree index, in other words. It's a bit like the lack of parallel backward ordered scans. There's no reason it couldn't be implemented, but it hasn't been so far.
You may have thought SQL Server would use a regular parallel scan to read the index source data, optionally sort it into index key order, then add those rows to the index in parallel. This would indeed work, even without sorting, but SQL Server just can't do it.
In case you're wondering, sorting into destination key order is an optimization. The resulting index would still be correct without it, but you'd be inserting rows essentially at random into a b-tree, with all the random I/O and page splitting that would entail.
Ok, you say, but what about parallel index builds? They've been around for a long time in premium editions and certainly seem to modify a single b-tree in parallel. Yes, they do seem to, but SQL Server cheats.

Read the full article on 𝕏. 

Friday 31 May 2024

Impossible Execution Plan Timings

Erik Darling (@erikdarlingdata) shared an interesting SQL Server execution plan with me recently. The demo script is at the end of this article.

The important section is shown below: 

Impossible timings?






The Gather Streams operator appears to execute for less time (2.16s) than the Sort operator below it (5.431s). This seems impossible on the face of it. 

The Parallelism (Gather Streams) operator runs in row mode (as always), while the Sort and Hash Match (Inner Join) operators both run in batch mode. This mixed mode plan adds a little complexity to interpreting plan timings because: 
  • A batch mode operator reports CPU and elapsed times for that operator alone 
  • A row mode operator reports times for itself and all its children 
I've written about those aspects before in Understanding Execution Plan Operator Timings, which also covers a confusing situation that can arise in exclusively row mode parallel plans.

I showed a hidden option to make all operators report only their individual times in More Consistent Execution Plan Timings in SQL Server 2022. That feature isn't complete yet, so the results aren't perfect, and it's not documented or supported.

I mention all that in case you are interested in the background. None of the foregoing explains what we see in this mixed mode plan. The row mode Gather Streams elapsed time ought to include its children. The batch mode Sort should just be reporting its own elapsed time. With that understanding in mind, there's no way the Sort could run for longer than the Gather Streams. What's going on here?