About This Blog

Including my content originally published on 𝕏, SQLperformance.com, and SQLblog.com

Wednesday 28 August 2013

Parameter Sniffing, Embedding, and the RECOMPILE Options

Parameter Sniffing, Embedding, and the RECOMPILE Options

Parameter Sniffing

Query parameterization promotes the reuse of cached execution plans, thereby avoiding unnecessary compilations, and reducing the number of ad-hoc queries in the plan cache.

These are all good things, provided the query being parameterized really ought to use the same cached execution plan for different parameter values. An execution plan that is efficient for one parameter value may not be a good choice for other possible parameter values.

When parameter sniffing is enabled (the default), SQL Server chooses an execution plan based on the particular parameter values that exist at compilation time. The implicit assumption is that parameterized statements are most commonly executed with the most common parameter values. This sounds reasonable enough (even obvious) and indeed it often works well.

A problem can occur when an automatic recompilation of the cached plan occurs. A recompilation may be triggered for all sorts of reasons, for example because an index used by the cached plan has been dropped (a correctness recompilation) or because statistical information has changed (an optimality recompile).

Whatever the exact cause of the plan recompilation, there is a chance that an atypical value is being passed as a parameter at the time the new plan is generated. This can result in a new cached plan (based on the sniffed atypical parameter value) that is not good for the majority of executions for which it will be reused.

It is not easy to predict when a particular execution plan will be recompiled (for example, because statistics have changed sufficiently) resulting in a situation where a good-quality reusable plan can be suddenly replaced by a quite different plan optimized for atypical parameter values.

One such scenario occurs when the atypical value is highly selective, resulting in a plan optimized for a small number of rows. Such plans will often use single-threaded execution, nested loops joins, and lookups. Serious performance issues can arise when this plan is reused for different parameter values that generate a much larger number of rows.

Disabling Parameter Sniffing

Parameter sniffing can be disabled using documented trace flag 4136. The trace flag is also supported for per-query use via the QUERYTRACEON query hint. Both apply from SQL Server 2005 Service Pack 4 onward (and slightly earlier if you apply cumulative updates to Service Pack 3).

Starting with SQL Server 2016, parameter sniffing can also be disabled at the database level, using the PARAMETER_SNIFFING argument to ALTER DATABASE SCOPED CONFIGURATION.

When parameter sniffing is disabled, SQL Server uses average distribution statistics to choose an execution plan.

This also sounds like a reasonable approach (and can help avoid the situation where the plan is optimized for an unusually selective parameter value), but it is not a perfect strategy either: A plan optimized for an ‘average’ value might well end up being seriously sub-optimal for the commonly-seen parameter values.

Consider an execution plan that contains memory-consuming operators like sorts and hashes. Because memory is reserved before query execution starts, a parameterized plan based on average distribution values can spill to tempdb for common parameter values that produce more data than the optimizer expected.

Memory reservations cannot usually grow during query execution, regardless of how much free memory the server may have. Certain applications do benefit from turning parameter sniffing off (see this archive post by the Dynamics AX Performance Team for an example).

For most workloads, disabling parameter sniffing entirely is the wrong solution, and may even be a disaster. Parameter sniffing is a heuristic optimization: It works better than using average values on most systems, most of the time.

Query Hints

SQL Server provides a range of query hints and other options to tune the behaviour of parameter sniffing:

  • The OPTIMIZE FOR (@parameter = value) query hint builds a reusable plan based on a specific value.
  • OPTIMIZE FOR (@parameter UNKNOWN) uses average distribution statistics for a particular parameter.
  • OPTIMIZE FOR UNKNOWN uses average distribution for all parameters (same effect as trace flag 4136).
  • The WITH RECOMPILE stored procedure option compiles a fresh procedure plan for every execution.
  • The OPTION (RECOMPILE) query hint compiles a fresh plan for an individual statement.

The old technique of “parameter hiding” (assigning procedure parameters to local variables, and referencing the variables instead) has the same effect as specifying OPTIMIZE FOR UNKNOWN. It can be useful on instances earlier than SQL Server 2008 (the OPTIMIZE FOR hint was new for 2008).

It could be argued that every parameterized statement should be checked for sensitivity to parameter values, and either left alone (if the default behaviour works well) or explicitly hinted using one of the options above.

This is rarely done in practice, partly because performing a comprehensive analysis for all possible parameter values can be time-consuming and requires quite advanced skills. Most often, no such analysis is performed and parameter-sensitivity problems are addressed as and when they occur in production.

This lack of prior analysis is probably one main reason parameter sniffing has a poor reputation. It pays to be aware of the potential for problems to arise, and to perform at least a quick analysis on statements that are likely to cause performance problems when recompiled with an atypical parameter value.

What is a parameter?

Some would say that a SELECT statement referencing a local variable is a “parameterized statement” of sorts, but that is not the definition SQL Server uses.

A reasonable indication that a statement uses parameters can be found by looking at the plan properties (see the Parameters tab in Sentry One Plan Explorer. Or click the query plan root node in SSMS, open the Properties window, and expand the Parameter List node):

Parameter properties

The ‘compiled value’ shows the sniffed value of the parameter used to compile the cached plan. The ‘runtime value’ shows the value of the parameter on the particular execution captured in the plan.

Either of these properties may be blank or missing in different circumstances. If a query is not parameterized, the properties will simply all be missing.

Just because nothing is ever simple in SQL Server, there are situations where the parameter list can be populated, but the statement is still not parameterized. This can occur when SQL Server attempts simple parameterization (discussed later) but decides the attempt is “unsafe”. In that case, parameter markers will be present, but the execution plan is not in fact parameterized.

Sniffing is not just for Stored Procedures

Parameter sniffing also occurs when a batch is explicitly parameterized for reuse using sp_executesql.

For example:

EXECUTE sys.sp_executesql
    N'
    SELECT
        P.ProductID,
        P.Name,
        TotalQty = SUM(TH.Quantity)
    FROM Production.Product AS P
    JOIN Production.TransactionHistory AS TH
        ON TH.ProductID = P.ProductID
    WHERE
        P.Name LIKE @NameLike
    GROUP BY
        P.ProductID,
        P.Name;
    ',
    N'@NameLike nvarchar(50)',
    @NameLike = N'K%';

The optimizer chooses an execution plan based on the sniffed value of the @NameLike parameter. The parameter value “K%” is estimated to match very few rows in the Product table, so the optimizer chooses a nested loop join and key lookup strategy:

Nested Loops and Lookups

Parameter Compiled and Runtime Values

Executing the statement again with a parameter value of “[H-R]%” (which will match many more rows) reuses the cached parameterized plan:

EXECUTE sys.sp_executesql
    N'
    SELECT
        P.ProductID,
        P.Name,
        TotalQty = SUM(TH.Quantity)
    FROM Production.Product AS P
    JOIN Production.TransactionHistory AS TH
        ON TH.ProductID = P.ProductID
    WHERE
        P.Name LIKE @NameLike
    GROUP BY
        P.ProductID,
        P.Name;
    ',
    N'@NameLike nvarchar(50)',
    @NameLike = N'[H-R]%';

Reused parameterized plan

Parameter Information

The AdventureWorks sample database is too small to make this a performance disaster, but this plan is certainly not optimal for the second parameter value.

We can see the plan the optimizer would have chosen by clearing the plan cache and executing the second query again:

Recompiled plan

Parameter Information

With a larger number of matches expected, the optimizer determines that a hash join and hash aggregation are better strategies.

T-SQL Functions

Parameter sniffing occurs with T-SQL functions as well, though the way execution plans are generated can make this more difficult to see.

There are good reasons to avoid T-SQL scalar and multi-statement functions in general, so for educational purposes only, here is a T-SQL multi-statement table-valued function version of our test query:

CREATE FUNCTION dbo.F
    (@NameLike nvarchar(50))
RETURNS @Result TABLE
(
    ProductID   integer NOT NULL PRIMARY KEY,
    Name        nvarchar(50) NOT NULL,
    TotalQty    integer NOT NULL
)
WITH SCHEMABINDING
AS
BEGIN
    INSERT @Result
    SELECT
        P.ProductID,
        P.Name,
        TotalQty = SUM(TH.Quantity)
    FROM Production.Product AS P
    JOIN Production.TransactionHistory AS TH
        ON TH.ProductID = P.ProductID
    WHERE
        P.Name LIKE @NameLike
    GROUP BY
        P.ProductID,
        P.Name;

    RETURN;
END;

The following query uses the function to display information for product names starting with ‘K’:

SELECT
    Result.ProductID,
    Result.Name,
    Result.TotalQty
FROM dbo.F(N'K%') AS Result;

Seeing parameter sniffing with an embedded function is more difficult because SQL Server does not return a separate post-execution (actual) query plan for each function invocation. The function could be called many times within a single statement, and users would not be impressed if SSMS tried to display a million function call plans for a single query.

As a result of this design decision, the actual plan returned by SQL Server for our test query is not very helpful:

Function plan

Nevertheless, there are ways to see parameter sniffing in action with embedded functions. The method I have chosen to use here is to inspect the plan cache:

SELECT
    DEQS.plan_generation_num,
    DEQS.execution_count,
    DEQS.last_logical_reads,
    DEQS.last_elapsed_time,
    DEQS.last_rows,
    DEQP.query_plan
FROM sys.dm_exec_query_stats AS DEQS
CROSS APPLY sys.dm_exec_sql_text(DEQS.plan_handle) AS DEST
CROSS APPLY sys.dm_exec_query_plan(DEQS.plan_handle) AS DEQP
WHERE
    DEST.objectid = OBJECT_ID(N'dbo.F', N'TF');

Plan cache query output

This result shows that the function plan has been executed once, at a cost of 201 logical reads with 2891 microseconds elapsed time, and the most recent execution returned one row. The XML plan representation returned shows that the parameter value was sniffed:

Cached function plan

Parameter information

Now run the statement again, with a different parameter:

SELECT
    Result.ProductID,
    Result.Name,
    Result.TotalQty
FROM dbo.F(N'[H-R]%') AS Result;

The post-execution plan shows that 306 rows were returned by the function:

Function plan

The plan cache query shows the cached execution plan for the function has been reused (execution_count = 2):

Plan cache entry

It also shows a much higher number of logical reads, and a longer elapsed time compared with the previous run. This is consistent with reusing a nested loops and lookup plan, but to be completely sure, the post-execution function plan can be captured using Extended Events or the SQL Server Profiler tool:

Profiler Actual Plan

Parameter information

Because parameter sniffing applies to functions, these modules can suffer from the same unexpected changes in performance commonly associated with stored procedures.

For example, the first time a function is referenced, a plan might be cached that does not use parallelism. Subsequent executions with parameter values that would benefit from parallelism (but reuse the cached serial plan) will show unexpectedly poor performance.

This issue can be tricky to identify because SQL Server does not return separate post-execution plans for function calls as we have seen. Using Extended Events or Profiler to routinely capture post-execution plans can be extremely resource-intensive, so it often makes sense to use that technique in a very targeted fashion. The difficulties around debugging function parameter-sensitivity issues mean it is even more worthwhile doing an analysis (and coding defensively) before the function hits production.

Parameter-sniffing works exactly the same way with T-SQL scalar user-defined functions (unless in-lined, on SQL Server 2019 onward). In-line table-valued functions do not generate a separate execution plan for each invocation, because (as the name says) these are in-lined into the calling query before compilation.

Beware Sniffed NULLs

Clear the plan cache and request an estimated (pre-execution) plan for the test query:

SELECT
    Result.ProductID,
    Result.Name,
    Result.TotalQty
FROM dbo.F(N'K%') AS Result;

You will see two execution plans, the second of which is for the function call:

Estimated plan

A limitation of parameter sniffing with embedded functions in estimated plans means the parameter value is sniffed as NULL (not “K%”):

Parameter information

In versions of SQL Server before 2012, this plan (optimized for a NULL parameter) is cached for reuse. This is unfortunate, because NULL is unlikely to be a representative parameter value, and it was certainly not the value specified in the query.

SQL Server 2012 (and later) does not cache plans resulting from an “estimated plan” request, though it will still display a function plan optimized for a NULL parameter value at compilation time.

Simple and Forced Parameterization

An ad-hoc T-SQL statement containing constant literal values can be parameterized by SQL Server, either because the query qualifies for simple parameterization or because the database option for forced parameterization is enabled (or a plan guide is used to the same effect).

A statement parameterized in this way is also subject to parameter sniffing. The following query qualifies for simple parameterization:

SELECT 
    A.AddressLine1, 
    A.City, 
    A.PostalCode 
FROM Person.Address AS A 
WHERE 
    A.AddressLine1 = N'Heidestieg Straße 8664';

The estimated execution plan shows an estimate of 2.5 rows based on the sniffed parameter value:

Cardinality estimate

In fact, the query returns 7 rows (cardinality estimation is not perfect, even where values are sniffed):

Query result

At this point, you may be wondering where the evidence is that this query was parameterized, and the resulting parameter value sniffed. Run the query a second time with a different value:

SELECT 
    A.AddressLine1, 
    A.City, 
    A.PostalCode 
FROM Person.Address AS A 
WHERE 
    A.AddressLine1 = N'Winter der Böck 8550';

The query returns one row:

Query result

The execution plan shows the second execution reused the parameterized plan that was compiled using a sniffed value:

Trivial plan

Parameterization and sniffing are separate activities

An ad-hoc statement can be parameterized by SQL Server without parameter values being sniffed.

To demonstrate, we can use trace flag 4136 to disable parameter sniffing for a batch that will be parameterized by the server:

DBCC FREEPROCCACHE;
DBCC TRACEON (4136);
GO
SELECT
    A.AddressLine1, 
    A.City, 
    A.PostalCode 
FROM Person.Address AS A 
WHERE
    A.AddressLine1 = N'Heidestieg Straße 8664';
GO
SELECT 
    A.AddressLine1, 
    A.City, 
    A.PostalCode 
FROM Person.Address AS A 
WHERE 
    A.AddressLine1 = N'Winter der Böck 8550';
GO
DBCC TRACEOFF (4136);

The script results in statements that are parameterized, but the parameter value is not sniffed for cardinality estimation purposes. To see this, we can inspect the plan cache:

WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    DECP.cacheobjtype,
    DECP.objtype,
    DECP.usecounts,
    DECP.plan_handle,
    parameterized_plan_handle =
        DEQP.query_plan.value
        (
            '(//StmtSimple)[1]/@ParameterizedPlanHandle',
            'NVARCHAR(100)'
        )
FROM sys.dm_exec_cached_plans AS DECP
CROSS APPLY sys.dm_exec_sql_text(DECP.plan_handle) AS DEST
CROSS APPLY sys.dm_exec_query_plan(DECP.plan_handle) AS DEQP
WHERE 
    DEST.[text] LIKE N'%AddressLine1%'
    AND DEST.[text] NOT LIKE N'%XMLNAMESPACES%';

The results show two cache entries for the ad-hoc queries, linked to the parameterized (prepared) query plan by the parameterized plan handle.

The parameterized plan is used twice:

Plan cache entries

The execution plan shows a different cardinality estimate now that parameter sniffing is disabled:

Cardinality estimate

Compare the estimate of 1.44571 rows with the 2.5 row estimate used when parameter sniffing was enabled.

With sniffing disabled, the estimate comes from average frequency information about the AddressLine1 column. An extract of the DBCC SHOW_STATISTICS output for the index in question shows how this number was calculated: Multiplying the number of rows in the table (19,614) by the density (7.370826e-5) gives the 1.44571 row estimate.

Statistics

Side note: It is commonly believed that only integer comparisons using a unique index can qualify for simple parameterization. I deliberately chose this example (a string comparison using a non-unique index) to refute that.

WITH RECOMPILE and OPTION (RECOMPILE)

When a parameter-sensitivity problem is encountered, a common piece of advice on forums and Q&A sites is to “use recompile” (assuming the other tuning options presented earlier are unsuitable). Unfortunately, that advice is often misinterpreted to mean adding WITH RECOMPILE option to the stored procedure.

Using WITH RECOMPILE effectively returns us to SQL Server 2000 behaviour, where the entire stored procedure is recompiled on every execution.

A better alternative, on SQL Server 2005 and later, is to use the OPTION (RECOMPILE) query hint on just the statement that suffers from the parameter-sniffing problem. This query hint results in a recompilation of the problematic statement only. Execution plans for other statements within the stored procedure are cached and reused as normal.

Using WITH RECOMPILE also means the compiled plan for the stored procedure is not cached. As a result, no performance information is maintained in DMVs such as sys.dm_exec_query_stats.

Using the query hint instead means that a compiled plan can be cached, and performance information is available in the DMVs (though it is limited to the most recent execution, for the affected statement only).

For instances running at least SQL Server 2008 build 2746 (Service Pack 1 with Cumulative Update 5), using OPTION (RECOMPILE) has another significant advantage over WITH RECOMPILE: Only OPTION (RECOMPILE) enables the Parameter Embedding Optimization.

The Parameter Embedding Optimization

Sniffing parameter values allows the optimizer to use the parameter value to derive cardinality estimates. Both WITH RECOMPILE and OPTION (RECOMPILE) result in query plans with estimates calculated from the actual parameter values on each execution.

The Parameter Embedding Optimization takes this process a step further. Query parameters are replaced with literal constant values during query parsing.

The parser is capable of surprisingly complex simplifications, and subsequent query optimization may refine things even further. Consider the following stored procedure, which features the WITH RECOMPILE option:

CREATE PROCEDURE dbo.P
    @NameLike nvarchar(50),
    @Sort tinyint
WITH RECOMPILE
AS
BEGIN
    SELECT TOP (5)
        ProductID,
        Name
    FROM Production.Product
    WHERE
        @NameLike IS NULL
        OR Name LIKE @NameLike
    ORDER BY
        CASE WHEN @Sort = 1 THEN ProductID ELSE NULL END ASC,
        CASE WHEN @Sort = 2 THEN ProductID ELSE NULL END DESC,
        CASE WHEN @Sort = 3 THEN Name ELSE NULL END ASC,
        CASE WHEN @Sort = 4 THEN Name ELSE NULL END DESC;
END;

The procedure is executed twice, with the following parameter values:

EXECUTE dbo.P
 @NameLike = N'K%',
 @Sort = 1;
GO
EXECUTE dbo.P
 @NameLike = N'[H-R]%',
 @Sort = 4;

Because WITH RECOMPILE is used, the procedure is fully recompiled on every execution. The parameter values are sniffed each time, and used by the optimizer to calculate cardinality estimates.

The plan for the first procedure execution is exactly correct, estimating 1 row:

K% sort 1

The second execution estimates 360 rows, very close to the 366 seen at run time:

H-R% sort 4

Both plans use the same general execution strategy: Scan all rows in an index, applying the WHERE clause predicate as a residual; compute the CASE expression used in the ORDER BY clause; and perform a Top N Sort on the result of the CASE expression.

OPTION (RECOMPILE)

Now re-create the stored procedure using an OPTION (RECOMPILE) query hint instead of WITH RECOMPILE:

CREATE PROCEDURE dbo.P
    @NameLike nvarchar(50),
    @Sort tinyint
AS
BEGIN
    SELECT TOP (5)
        ProductID,
        Name
    FROM Production.Product
    WHERE
        @NameLike IS NULL
        OR Name LIKE @NameLike
    ORDER BY
        CASE WHEN @Sort = 1 THEN ProductID ELSE NULL END ASC,
        CASE WHEN @Sort = 2 THEN ProductID ELSE NULL END DESC,
        CASE WHEN @Sort = 3 THEN Name ELSE NULL END ASC,
        CASE WHEN @Sort = 4 THEN Name ELSE NULL END DESC
    OPTION (RECOMPILE);
END;

Executing the stored procedure twice with the same parameter values as before produces dramatically different execution plans.

This is the first execution plan (with parameters requesting names starting with “K”, ordered by ProductID ascending):

Embedded parameter query 1

The parser embeds the parameter values in the query text, resulting in the following intermediate form:

SELECT TOP (5)
    ProductID,
    Name
FROM Production.Product
WHERE
    'K%' IS NULL
    OR Name LIKE 'K%'
ORDER BY
    CASE WHEN 1 = 1 THEN ProductID ELSE NULL END ASC,
    CASE WHEN 1 = 2 THEN ProductID ELSE NULL END DESC,
    CASE WHEN 1 = 3 THEN Name ELSE NULL END ASC,
    CASE WHEN 1 = 4 THEN Name ELSE NULL END DESC;

The parser then goes further, removing contradictions and fully evaluating the CASE expressions. This results in:

SELECT TOP (5)
    ProductID,
    Name
FROM Production.Product
WHERE
    Name LIKE 'K%'
ORDER BY
    ProductID ASC,
    NULL DESC,
    NULL ASC,
    NULL DESC;

You will get an error message if you try to submit that query directly to SQL Server, because ordering by a constant value is not allowed. Nevertheless, this is the form produced by the parser. It is allowed internally because it arose as a result of applying the parameter embedding optimization. The simplified query makes life a lot easier for the query optimizer:

Embedded parameter query 1

The Clustered Index Scan applies the LIKE predicate as a residual. The Compute Scalar provides the constant NULL values. The Top returns the first 5 rows in the order provided by the Clustered Index (avoiding a sort). In a perfect world, the query optimizer would also remove the Compute Scalar that defines the NULLs, as they are not used during query execution.

The second execution follows exactly the same process, resulting in a query plan (for names beginning with letters “H” to “R”, ordered by Name descending) like this:

Embedded parameter query 2

This plan features a Nonclustered Index Seek that covers the LIKE range, a residual LIKE predicate, the constant NULLs as before, and a Top (5). The query optimizer chooses to perform a BACKWARD range scan in the Index Seek to once again avoid sorting.

Compare the plan above with the one produced using WITH RECOMPILE, which cannot use the parameter embedding optimization:

WITH RECOMPILE

This demo example might have been better implemented as a series of IF statements in the procedure (one for each combination of parameter values). This could provide similar query plan benefits, without incurring a statement compilation each time. In more complex scenarios, the statement-level recompile with parameter embedding provided by OPTION (RECOMPILE) can be an extremely useful optimization technique.

An Embedding Restriction

There is one scenario where using OPTION (RECOMPILE) will not result in the parameter embedding optimization being applied. If the statement assigns to a variable, parameter values are not embedded:

CREATE PROCEDURE dbo.P
    @NameLike nvarchar(50),
    @Sort tinyint
AS
BEGIN
    DECLARE
        @ProductID integer,
        @Name nvarchar(50);

    SELECT TOP (1)
        @ProductID = ProductID,
        @Name = Name
    FROM Production.Product
    WHERE
        @NameLike IS NULL
        OR Name LIKE @NameLike
    ORDER BY
        CASE WHEN @Sort = 1 THEN ProductID ELSE NULL END ASC,
        CASE WHEN @Sort = 2 THEN ProductID ELSE NULL END DESC,
        CASE WHEN @Sort = 3 THEN Name ELSE NULL END ASC,
        CASE WHEN @Sort = 4 THEN Name ELSE NULL END DESC
    OPTION (RECOMPILE);
END;

Because the SELECT statement now assigns to a variable, the query plans produced are the same as when WITH RECOMPILE was used. Parameter values are still sniffed and used by the query optimizer for cardinality estimation, and OPTION (RECOMPILE) still only compiles the single statement, only the benefit of parameter embedding is lost.

No comments:

Post a Comment

All comments are reviewed before publication.