About This Blog

Including my content originally published on 𝕏, SQLperformance.com, and SQLblog.com

Thursday, 18 November 2021

Be Careful with LOBs and OPTION (RECOMPILE)

Be Careful with LOBs and OPTION (RECOMPILE)

It sometimes makes sense to add OPTION (RECOMPILE) to a query. Typically this will be when:

  • A good enough plan for the query is very sensitive to one or more parameters
  • No good single value exists for the parameter to use in a hint
  • Optimize for unknown doesn’t give a good result
  • The plan might be expected to change over time
  • The cost of recompiling the statement is much less than the expected execution time
  • Recompiling every time is very likely to save more time and resources than it costs overall

All that is fairly well-known. The point of this short post is to draw your attention to another side-effect of adding OPTION (RECOMPILE) — the parameter embedding optimization (PEO).

When PEO is used, SQL Server takes the value of any variables and parameters and embeds the runtime values in the query text, pretty much as if you had entered them by hand before compiling. This is often very useful for plan quality, but there is a potential drawback when large object types (LOBs) are in play.

Example

Consider the following toy query, which creates a 512MB LOB string then returns the first character:

DECLARE 
    @x varchar(max),
    @y varchar(max),
    @lob varchar(max);

SET @x = 'x';
SET @y = 'y';
SET @lob = @y + REPLICATE(@x, 512 * 1024 * 1024);

SELECT LEFT(@lob, 1);

That executes in around 3 seconds, with the vast majority of the time spent constructing the large string.

And it is a large string, but not outlandishly so. It is not unheard of for people to read complete files or XML/JSON into a variable using OPENROWSET...BULK for example.

I will note in passing that LOB variables and parameters are not limited to 2GB. That is the storage limit for LOBs. You are free to construct a LOB of any size if you really want to hurt your server that way.

With a recompile hint

Add OPTION (RECOMPILE) to the previous query:

DECLARE 
    @x varchar(max),
    @y varchar(max),
    @lob varchar(max);

SET @x = 'x';
SET @y = 'y';
SET @lob = @y + REPLICATE(@x, 512 * 1024 * 1024);

SELECT
    LEFT(@lob, 1)
OPTION (RECOMPILE);

The SELECT now takes around 30 seconds.

This is a factor of ten slower. Why?

Explanation

Well there are a number of reasons. For one, embedding the value of @lob at runtime requires making at least one copy of the entire string, which is resource-intensive. There are other internal details like the time taken to compute a hash of the constant value when storing it in a memo group.

The broader point is that you probably wouldn’t embed a 512MB string literal in a query, submit it in a query, and expect good performance. Adding OPTION (RECOMPILE) is the equivalent of doing that.

Another relevant factor is the optimizer makes no general guarantees about the number of executions or exact timing of scalar expression evaluations. This means literals or expressions can be duplicated in the query tree, meaning an embedded LOB value can appear more than once and each instance will require a fresh copy of the large value. It is not always possible to anticipate when the optimizer will decide to do this.

The final thing I want to mention is constant folding. This means evaluating an expression at compile time, for example DATALENGTH(@lob) in a variation of the example above. Embedding the 512MB string is followed by evaluating DATALENGTH during compilation, which is itself an expensive operation.

Conclusion

I would argue that embedding large objects via PEO will rarely, if ever, pay dividends. SQL Server ought not do it.

In the meantime, ensure you only use OPTION (RECOMPILE) when it truly makes sense, and you have given it full consideration.


If you have a few minutes to kill, try running this variation with and without the recompile hint:

DECLARE 
    @x varchar(max),
    @y varchar(max),
    @lob varchar(max);

SET @x = 'x';
SET @y = 'y';
SET @lob = @y + REPLICATE(@x, 512 * 1024 * 1024);

SELECT
    LEFT(@lob, 1),
    LEFT(@lob, 1),
    LEFT(@lob, 1),
    LEFT(@lob, 1),
    LEFT(@lob, 1),
    LEFT(@lob, 1),
    LEFT(@lob, 1),
    LEFT(@lob, 1)
OPTION (RECOMPILE);

1 comment:

  1. This might even be a bug.

    Documentation says "An exception is made for large object types. If the output type of the folding process is a large object type (text,ntext, image, nvarchar(max), varchar(max), varbinary(max), or XML), then SQL Server does not fold the expression."

    https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver16#foldable-expressions

    Then again, it might not be folding as such that's causing the issue, maybe it's just the fact that it's in the parse tree altogether (hashing etc).

    ReplyDelete

All comments are reviewed before publication.