A LIKE
predicate with only a trailing wildcard can usually use an index seek, as the following AdventureWorks sample database query shows:
SELECT
P.[Name]
FROM Production.Product AS P
WHERE
P.[Name] LIKE N'D%';
SQL Server determines a covering range, which depends on the collation. It seeks the string index using the range as the start and end points of a partial scan, and applies the original LIKE
condition as a residual predicate.
More specifically, the Storage Engine seeks the index to locate rows in the covering range, and the Query Processor applies the residual predicate (the LIKE
) to the rows it receives back.
Dynamic Seeks
What if the LIKE
search term is in a variable?
DECLARE
@Like nvarchar(50) = N'D%';
SELECT
P.[Name]
FROM Production.Product AS P
WHERE
p.[Name] LIKE @Like;
SQL Server can still perform a seek here, but it needs to determine the covering seek range for the search term at execution time, not at compilation time:
The plan now contains an extra Constant Scan, a Compute Scalar and a Nested Loops Join. These operators are interesting because they have zero cost estimates: No CPU cost, no I/O cost, nothing.
These new operators are purely architectural in nature. They are a workaround for the fact that SQL Server cannot currently perform a dynamic seek entirely within the Index Seek operator itself.
The optimizer’s output is:
*** Output Tree: (trivial plan) ***
PhyOp_Filter
PhyOp_Range TBL: Production.Product
ScaOp_Intrinsic like
ScaOp_Identifier QCOL: [P].Name
ScaOp_Identifier COL: @Like
ScaOp_Const TI(nvarchar ML=2)
ScaOp_Identifier COL: ConstExpr1002
ScaOp_Identifier COL: ConstExpr1003
ScaOp_Identifier COL: ConstExpr1004
The additional operators needed for dynamic seek execution are added after query optimization is complete, so they cannot affect plan choices. Since there is no more cost-based analysis to do, the extra machinery is given zero costs.
Looking at the finished execution plan:
- The Constant Scan produces a single in-memory row with no columns.
- The Compute Scalar defines expressions to describe the covering seek range (using the runtime value of the
@Like
variable). - The Nested Loops Join drives the seek using the computed range information (provided as correlated values).
The upper tooltip shows that the Compute Scalar uses three internal functions, LikeRangeStart
, LikeRangeEnd
, and LikeRangeInfo
.
The first two functions describe the range as an open interval. The third function returns a set of flags encoded in an integer. These are used internally to define certain seek properties for the Storage Engine.
The lower tooltip shows the seek on the open interval described by the result of LikeRangeStart
and LikeRangeEnd
, together with the application of the residual predicate LIKE @Like
.
More Dynamic Seeks
Something very similar occurs in plans that use IN
or OR
with variables:
DECLARE
@1 integer = 320,
@2 integer = 325,
@3 integer = 330;
SELECT
P.[Name]
FROM Production.Product AS P
WHERE
P.ProductID IN (@1,@2,@3);
Now we have three ranges: One for each of the variables in the original query.
The Compute Scalar operators again define three columns containing the start and end of the range, and the associated informational flags (previously seen as a result of the LikeRangeInfo
function).
This time, we see the decimal representation of these flags, which happens to be 62 for an equality comparison.
The IN
expands to ProductID = @1 OR ProductID = @2 OR ProductID = @3
, so each of the ‘ranges’ here is in fact a single value. For this example, the start and end range values are therefore the same in each Compute Scalar.
The three dynamic ranges are concatenated, and sorted so any overlapping ranges appear next to each other. The Merge Interval collapses these intervals into one or more disjoint (non-overlapping) ranges.
This is important, because the three variables might, for example, all contain the same value, and it would be incorrect to return that value three times.
For each disjoint range produced, the Nested Loops Join drives a new seek of the Clustered Index.
The overall effect is that an arbitrary number of possibly overlapping ranges are computed, merged, and then used to drive one or more seek operations.
The final result of the query will be the combination of all the seek results.
Hidden Conversions
The following example contains a table with datetime2
values, and a query with a expression that at first sight seems unlikely to be able to seek on an index. The variable is typed as date
, and there is a CONVERT
function applied to the datetime2
column:
DECLARE @Example table
(date_time datetime2 PRIMARY KEY);
INSERT @Example
(date_time)
VALUES
('20110101 12:34:56');
DECLARE
@date date = '2011-01-01';
SELECT *
FROM @Example AS E
WHERE
@date = CONVERT(date, E.date_time);
Nevertheless, an execution plan that uses a seek can be produced:
In this case, neither SSMS or Plan Explorer will show the contents of the Compute Scalar (this is probably just an oversight, rather than deliberate concealment!)
We have to open the XML form of the execution plan to see the three familiar expressions, wrapped in a ValueVector
(just a fancy container for multiple expressions).
Another internal function, GetRangeThroughConvert
, is responsible for determining the range of datetime2
values covered by the date
variable @date
, and the informational flags needed.
In the same way the engine works out covering ranges for some LIKE
predicates, this function determines ranges where certain problematic type conversions are required.
Otherwise, the machinery is the same: A range description is defined by the Compute Scalar, and the Nested Loops Join driving a seek using those values.
More Hidden Conversions
There is another related internal function used when the Query Processor needs to determine a range for a comparison between different data types.
This example returns rows based on a greater-than-or-equal comparison between date
column values and the datetime
return value of the GETDATE()
intrinsic function:
DECLARE @Example table
(col1 date PRIMARY KEY);
SELECT *
FROM @Example AS E
WHERE
E.col1 >= DATEADD(DAY, -7, GETDATE());
Again, the SSMS graphical plan and Plan Explorer cannot display the contents of the ValueVector
, so we have to dig into the XML again.
The function:
- Evaluates the
DATEADD(GETDATE())
expression - Computes the open-interval start point of a
date
range accounting for the conversion fromdatetime
todate
- Specifies
NULL
as the end of the range (since this is a>=
comparison, there is no end value).
The flags value in this case is 22 (the flags for a >=
seek operation).
Everything All At Once
This last example features all sorts of data type sloppiness, resulting in an execution plan that uses:
GetRangeThroughConvert
on the string expressionGetRangeThroughConvert
on the result ofGetRangeWithMismatchedTypes
applied to the result of theGETDATE
function.
The whole thing is then wrapped in a dynamic seek with the Merge Interval enforcing the (annoying) BETWEEN
requirement that the first parameter must be less than or equal to the second.
See if you can work out all the conversions necessary for this query, using the rules of data type precedence.
It is really quite impressive that this example of lazy T-SQL coding still results in an index seek, don’t you think?
DECLARE @Example table
(col1 datetime PRIMARY KEY);
SELECT *
FROM @Example AS E
WHERE
CONVERT(date, E.col1) BETWEEN '20000101' AND GETDATE();
Conclusion
SQL Server works quite hard sometimes to produce index seeks where they might seem unlikely. This is a good thing, and it would be great to see this capability extended further in future.
The downside is that this extra effort means you are less likely to see an Index Scan when you have done something daft with data types.
Why is this a bad thing if you get a seek anyway?
The problem is that these hidden implicit conversions can result in inaccurate cardinality and distribution estimates at any stage of the plan. So, even if you get a seek, the plan might be way off overall.
If that isn’t persuasive enough, consider this: Will having hidden nested range calculations improve your chances of getting a good query plan? Probably not, no.
Be very aware of data types in general, and in particular of the data types returned by system functions and T-SQL expressions.
If in doubt, use SELECT INTO
to materialize the result of an expression or query, and check the types of the columns produced.
- Note
- If you have any scripts that trawl the plan cache looking for implicit conversions (
CONVERT_IMPLICIT
), you might want to look into updating them to check for these conversions too. Remember that not all conversions are bad ones though.
Related Reading:
Implicit Conversions by Craig Freedman.
More on Implicit Conversions also by Craig Freedman.
Join Performance, Implicit Conversions, and Residuals by me.
Trigram Wildcard Search in SQL Server also by me.
© Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi
A case where the application of the residual predicate here seems to go awry! https://stackoverflow.com/a/74682250/73226
ReplyDeleteHi Martin,
DeleteOh dear 🤣 that's a little unfortunate isn't it.
It does seem like GetRangeThroughConvert doesn't handle the empty string edge case correctly. There's no boundary value just beyond, so it returns an invalid NULL result. It's not limited to an equality comparison either - any comparator exposes the bug (plus LIKE without a wildcard).
I was surprised this didn't throw an assert somewhere, since the non-NULL interval generators should never emit a range that includes NULL.
It doesn't seem like there would be an easy general fix for this. If Microsoft take note of this issue, they might simply disable SeekableCompareThroughConvert for the general string case, perhaps limited to the times when a literal isn't available at compilation time.
Cheers,
Paul