Title image
This article was originally published on 𝕏.

SQL Server provides a way to select any one row from a group of rows, provided you write the statement using a specific syntax. This method returns any one row from each group, not the minimum, maximum or anything else. In principle, the one row chosen from each group is unpredictable.

The general idea of the required syntax is to logically number rows starting with 1 in each group in no particular order, then return only the rows numbered 1. The outer statement must not select the numbering column for this query optimizer transformation (SelSeqPrjToAnyAgg) to work.

The general syntax pattern is illustrated below:

SELECT
    /*
    any columns except RowNum
    */
FROM 
(
    SELECT 
        RowNum =
            ROW_NUMBER() OVER 
            (
                PARTITION BY /* grouping column(s) */
                ORDER BY     /* grouping column(s) */
            ),
        E.*
    FROM #ExampleTable AS E
) AS ERN
WHERE 
    -- Any one row per group will do
    ERN.RowNum = 1;

Without the transformation, you’ll see several operators in the execution plan associated with the task.

First, a Segment operator to identify groups, then a Sequence Project to number rows within each group, and finally a Filter to pass on only those rows numbered 1.

Basic row mode implementation

(In batch mode, the Segment and Sequence Project operators are replaced with a single Window Aggregate that does the same job).

ANY Aggregates

With the transformation applied, SQL Server replaces the fairly literal implementation above with an equivalent one that uses a single plan operator. The replacement operator will be one of the following:

Stream Aggregate
Eager Hash Aggregate
Distinct Sort
Hash Match Flow Distinct

The Stream Aggregate and Eager Hash Aggregate group rows according to the PARTITION BY clause and apply the internal-only ANY aggregate to the remaining columns. The single aggregated result row per group is made up of the grouping column(s) plus the result of the ANY aggregate(s).

Just as the MIN aggregate selects the lowest value and the MAX aggregate chooses the highest value per group, the ANY aggregate picks any value.

The Distinct Sort and Hash Match Flow Distinct options also group in line with the PARTITION BY clause, but they don’t use the ANY column aggregate:

The Distinct Sort returns one complete row per group from its working “sort table” without any need for aggregation.
The Hash Match Flow Distinct only ever adds one row per group to its hash table. It immediately passes that row onto its parent operator. Any further input rows that match an already-emitted group are ignored. This is different from an eager hash operation, which consumes its entire input (blocking) before producing any output. Distinct rows ‘flow’ on to the parent operator as soon as a new one is encountered.

Aggregates and NULL

All SQL aggregates reject nulls. In accordance with the SQL Standard, the first time an aggregate operator encounters a NULL, SQL Server emits a warning.

This seemingly minor detail causes confusion among even experienced SQL Server practitioners. Let’s look at an example:

IF OBJECT_ID(N'tempdb..#Example', 'U') IS NOT NULL
    DROP TABLE #Example;
GO
CREATE TABLE #Example
(
    g integer NULL,
    c1_null integer NULL,
    c2_null integer NULL,
    c3_not_null integer NOT NULL DEFAULT 3,
    c4_not_null integer NOT NULL DEFAULT 4,
    c5_not_null integer NOT NULL DEFAULT 5
);
GO
CREATE CLUSTERED INDEX [CX #Example g]
    ON #Example (g);
GO
INSERT #Example
    (g, c1_null, c2_null)
VALUES
    (NULL, 0, 0),
    (1, 100, NULL),
    (1, NULL, 111),
    (1, 101, NULL),
    (2, NULL, NULL),
    (2, NULL, NULL),
    (3, 300, NULL),
    (3, 301, NULL);
GO
-- Maximum per group
SELECT 
    E.g, 
    Maximum = MAX(E.c1_null)
FROM #Example AS E
GROUP BY 
    E.g
ORDER BY 
    E.g;

The table’s contents and results of the grouping query are shown below:

Table contents and query result

A NULL is encountered during the aggregation of column c1_null, so SQL Server emits:

Warning: Null value is eliminated by an aggregate or other SET operation.

Avoiding the warning

To remove this annoyance, one might be tempted to filter out the nulls from column c1_null in the WHERE clause:

-- Maximum per group
SELECT 
    E.g, 
    Maximum = MAX(E.c1_null)
FROM #Example AS E
WHERE
    E.c1_null IS NOT NULL
GROUP BY 
    E.g
ORDER BY 
    E.g;

The sample data and query result now:

Nulls removed using the WHERE clause

The good news is that the annoying message has gone because the NULLs were filtered out before aggregation. The bad news is that the result group for g = 2 has disappeared.

NULL Confusion

This raises the quite reasonable question of why the g = 2 row was in the original result with a NULL in the aggregated column. The warning told us nulls were eliminated by the aggregate, so why did we end up with one?

This is the confusing aspect.

Part of the answer is that a SQL GROUP BY forms a grouped table with one row per distinct grouping column value. There are four distinct ‘values’ in the grouping column g (NULL, 1, 2, and 3), so the result of the GROUP BY must be a grouped table with four rows.

Now, we know the MAX aggregate rejects all nulls in column c1_null' but the g = 2 group has no non-null rows for column c1_null. This means the MAX aggregate receives no rows. What should it return in that case?

Well, the aggregate does receive rows. They just happen to all be null and the aggregate chooses to disregard them. The input rows still form a group in the grouped table and that grouped row must contain something for the aggregated column. That something turns out to be NULL.

On the other hand, when we filter nulls out in the WHERE clause, both g = 2 rows are filtered out before the grouped table is formed. There’s simply no group for g = 2 in this case, so there’s no row in the result set for it.

Finally, the MAX aggregate encounters no nulls in this scenario, so there is no warning message.

The ANY Aggregate Bug

The foregoing behaviour applies in a subtle way to the optimizer transformation using the specific ROW_NUMBER syntax I showed at the start. The null issue is so confusing here, even the SQL Server developers missed it for years…

An issue can arise when the optimizer’s transformation results in a physical plan using ANY aggregates. This means either the Stream Aggregate or Eager Hash Aggregate operator must be present. Remember, the Distinct Sort and Hash Match Flow Distinct implementations don’t use ANY aggregates.

When aggregated columns contain nulls, these are ignored by the ANY aggregate as expected. This works perfectly well if there is only one aggregated column. If there are multiple aggregated columns, the ANY aggregates might individually choose a non-null value from different input rows.

This would be an acceptable result if we had written a GROUP BY query with multiple ANY aggregates ourselves (if that feature existed). But no, this was the optimizer’s work. The query specification clearly says rows should logically be numbered per group and only rows assigned the number 1 should be returned.

Returning a row manufactured from values in different rows represents an error if there is no single source row with those values.

Bug demo

Let’s see an example where the transformation produces an invalid result:

-- Bug
SELECT
    ERN.g,
    ERN.c1_null,
    ERN.c2_null
FROM 
(
    SELECT 
        RowNum =
            ROW_NUMBER() OVER (
                PARTITION BY E.g
                ORDER BY g ASC),
        E.*
    FROM #Example AS E
) AS ERN
WHERE 
    ERN.RowNum = 1;

Buggy plan with a Stream Aggregate containing ANY aggregates

Wrong results

The highlighted result for the group where g = 1 is incorrect.

There is no single row in the source data where c1_null = 100 and c2_null = 111. These values came from different rows because the ANY aggregate ignored the nulls. Otherwise, it could have correctly returned (100, NULL), (NULL, 111), or (101, NULL).

By the way, the ANY aggregate doesn’t emit the warning about ignoring null values. There are several potential explanations for this:

Microsoft doesn’t consider ANY to be a SQL aggregate, so no warning is necessary.
The user didn’t write an aggregate in their query, so a warning would be confusing and lead to unnecessary support calls.
Microsoft just weren’t thinking about SQL Standard behaviour and the null issues when writing the implementation.

I have no idea which, if any, of those is correct. Still, the bug remains.

Demo variations

To see an Eager Hash Aggregate in the demo instead of a Stream Aggregate, add a HASH GROUP query hint.
For a Hash Match Flow Distinct, add HASH GROUP and FAST 3 query hints.
For a Distinct Sort, use no hints and drop the clustered index.

Remember, you won’t see the bug with a Hash Match Flow Distinct or Distinct Sort because they don’t use the ANY aggregate.

Product Changes and Fixes

This issue was first addressed in SQL Server 2016 by disabling the transformation completely. Before SQL Server 2016, the bug simply exists uncorrected.

I tested versions from SQL Server 2008 to 2014 inclusive to verify the buggy behaviour still exists on the latest available versions.

On SQL Server 2016 or 2017, you will always see the Segment/Sequence Project/Filter operators (or Window Aggregate/Filter in batch mode). With the transformation disabled, the bug cannot occur, but your execution plans may be slower than before.

A subsequent fix was released in SQL Server 2022 Cumulative Update 4 and SQL Server 2019 Cumulative Update 21. The entirety of the documentation in both cases is:

Fixes incorrect results for queries that filter on ROW_NUMBER and involve nullable columns.

Aside from being comically brief and non-specific, it doesn’t mention that query optimizer hotfixes must be enabled.

This can be done using documented trace flag 4199 (not available in Azure SQL Database), the ENABLE_QUERY_OPTIMIZER_HOTFIXES query hint, or by setting database scoped configuration QUERY_OPTIMIZER_HOTFIXES ON.

The fix is not enabled by default despite being a wrong-results bug. This contradicts the documentation for trace flag 4199, which states:

Important: Query Optimizer fixes that address wrong results or access violation errors aren’t enabled by Trace Flag 4199. Those fixes aren’t considered optional and become enabled by default once the update package is installed.

Fixed behaviour

From SQL Server 2019 CU21 and SQL Server 2022 CU4 with optimizer hotfixes enabled, the transformation is allowed where the ANY aggregate might be applied to at most one nullable column. This prevents wrong results under the mechanism described above.

In practice, this means at most one nullable non-grouped column projected by the statement containing the filtered ROW_NUMBER pattern. SQL Server cannot know ahead of time if a nullable column actually contains any nulls, so it errs on the side of caution. Note that a column defined as NOT NULL might become nullable in a SQL statement due to, for example, an outer join or scalar aggregation of an empty input.

You can experiment with which combinations of columns enable the transformation on fixed versions by commenting or uncommenting columns from the full demo query shown below:

SELECT
    -- Comment in/out these aggregated columns
    ERN.c1_null,
    ERN.c2_null,
    ERN.c3_not_null,
    ERN.c4_not_null,
    ERN.c5_not_null,
    -- Grouping column not subject to aggregation
    ERN.g
FROM 
(
    SELECT 
        RowNum =
            ROW_NUMBER() OVER (
                PARTITION BY E.g
                ORDER BY g ASC),
        E.*
    FROM #Example AS E
) AS ERN
WHERE 
    ERN.RowNum = 1
--OPTION 
--(
--    FAST 3,
--    HASH GROUP
--)
;

If you see the Segment/Sequence Project/Filter operator sequence in the execution plan, you know the transformation was not applied.

Summary

The following table summarises the situation as of September 2024:

ANY aggregate transformation summary table

Thanks for reading.

Page Free Space

About This Blog

Sunday, 15 September 2024

Current State of the ANY Aggregate Transformation