When you write a query to return the first few rows from a potential result set, you’ll often use the TOP clause.
To give a precise meaning to the TOP operation, it will normally be accompanied by an ORDER BY clause. Together, the TOP…ORDER BY construction can be used to precisely identify which top ‘n’ rows should be returned.
You might recall from Inside the Optimizer: Row Goals In Depth that query plans containing a row goal tend to favour nested loops or sort-free merge join over hashing.
This is because a hash join has to fully process its build input (to populate its hash table) before it can start probing for matches on its other input. Hash join therefore has a high start-up cost, balanced by a lower per-row cost once probing begins.
In this post, we will take a look at how row goals affect grouping operations.
One of the core assumptions made by the SQL Server query optimizer cost model is that clients will eventually consume all the rows produced by a query.
This results in plans that favour the overall execution cost, though it may take longer to begin producing rows.
From time to time, I encounter a system design that always issues an UPDATE against the database after a user has finished working with a record — without checking to see if any of the data was in fact altered.
The prevailing wisdom seems to be “the database will sort it out”. This raises an interesting question: How smart is SQL Server in these circumstances?
In this post, I’ll look at a generalisation of this problem: What is the impact of updating a column to the value it already contains?
The specific questions I want to answer are:
Does this kind of UPDATE generate any log activity?
Do data pages get marked as dirty (and so eventually get written out to disk)?
Iterators, Query Plans, and Why They Run Backwards
Iterators
SQL Server uses an extensible architecture for query optimization and execution, using iterators as the basic building blocks.
Iterators are probably most familiar in their graphical showplan representation, where each icon represents a single iterator. They also show up in XML query plan output as RelOp nodes:
Each iterator performs a single simple function, such as applying a filtering condition, or performing an aggregation. It can represent a logical operation, a physical operation, or (most often) both.
The optimizer has pushed the predicate ProductNumber LIKE 'T%' down from a Filter to the Index Scan on the Product table, but it remains as a residual predicate.
Inside the Optimizer: Constructing a Plan - Part 1
For today’s entry, I thought we might take a look at how the optimizer builds an executable plan using rules. To illustrate the process performed by the optimizer, we will configure it to produce incrementally better plans by progressively applying the necessary rules.
In my last post I showed how SQL Server 2005 and later can use a Segment Spool to implement aggregate window functions and the NTILE ranking function.
The query optimizer is also smart enough to recognise that some queries are logically equivalent to a window function, even if they are written using different syntax.
SQL Server 2005 introduced the OVER clause to enable partitioning of rowsets before applying a window function. This post looks at how this feature may require a query plan containing a ‘common subexpression spool’. This query plan construction is required whenever an aggregate window function or the NTILE ranking window function is used.
In my last post I promised to cover the Segment iterator in more detail, so here we go.
Segment
The Segment iterator partitions rows into groups as they flow through a query plan, checking whether the current row belongs in the same group as the previous row. For this to work, the incoming rows must be presented in an order which guarantees that all members of a group are received sequentially.
Segment has a “Group By” argument to specify how it should partition its input. The iterator adds an additional column to the rows that flow through it. This new column is used to communicate with its parent iterator, and is named something like [Segment1003].
The new column is visible in the graphical query plan properties window, or by hovering your mouse cursor over the Segment operator. The attribute name in both cases is Segment Column).
A question that often comes up on the forums is how to get the first or last row from each group of records in a table. This post describes a clever query plan optimisation that SQL Server can use for these types of query.