About This Blog

Including my content originally published on 𝕏, SQLperformance.com, and SQLblog.com

Tuesday 2 May 2017

SQL Server Temporary Object Caching

SQL Server Temporary Object Caching

Creating a table is a relatively resource-intensive and time-consuming operation. The server must locate and allocate storage space for the new data and index structures and make the corresponding entries in multiple system metadata tables. All this work has to be done in ways that will always work correctly under high concurrency, and which meet all of the ACID guarantees expected of a relational database.

In SQL Server, this means taking the right kinds of locks and latches, in the correct sequence, while also ensuring that detailed transaction log entries are safely committed to persistent storage in advance of any physical changes to the database. These log entries ensure the system can bring the database back to a consistent state in the event of a transaction rollback or system crash.

Dropping a table is a similarly expensive operation. Luckily, most databases do not create or drop tables with any great frequency. The obvious exception to this is the system database tempdb. This single database contains the physical storage, allocation structures, system metadata, and transaction log entries for all temporary tables and table variables across the entire SQL Server instance.

It is in the nature of temporary tables and table variables to be created and dropped much more frequently than other database object types. When this naturally high frequency of creation and destruction is combined with the concentrating effect of all temporary tables and table variables being associated with a single database, it is hardly surprising that contention can arise in the allocation and metadata structures of the tempdb database.

Temporary Object Caching

To reduce the impact on tempdb structures, SQL Server can cache temporary objects for reuse. Instead of dropping a temporary object, SQL Server retains the system metadata, and truncates the table data. If the table is 8MB or smaller, the truncation is performed synchronously, otherwise deferred drop is used. In either case, truncation reduces the storage requirement to a single (empty) data page, and the allocation information to a single IAM page.

Caching avoids almost all of the allocation and metadata costs of creating the temporary object next time around. As a side-effect of making fewer changes to the tempdb database than a full drop and recreate cycle, temporary object caching also reduces the amount of transaction logging required.

Achieving Caching

Table variables and local temporary tables are both capable of being cached. To qualify for caching, a local temporary table or table variable must be created in a module:

  • Stored procedure (including a temporary stored procedure)
  • Trigger
  • Multi-statement table-valued function
  • Scalar user-defined function

The return value of a multi-statement table-valued function is a table variable, which may itself be cached. Table-valued parameters (which are also table variables) can be cached when the parameter is sent from a client application, for example in .NET code using SqlDbType.Structured. When the statement is parameterized, table-valued parameter structures can only be cached on SQL Server 2012 or later.

The following cannot be cached:

  • Global temporary tables
  • Objects created using ad-hoc SQL
  • Objects created using dynamic SQL (e.g. using EXECUTE or sys.sp_executesql)

To be cached, a temporary object additionally must not:

  • Have named constraints (constraints without explicit names are perfectly fine)
  • Perform “DDL” after object creation
  • Be in a module defined using the WITH RECOMPILE option
  • Be called using the WITH RECOMPILE option of the EXECUTE statement

To address some common misconceptions explicitly:

  • TRUNCATE TABLE does not prevent caching
  • DROP TABLE does not prevent caching
  • UPDATE STATISTICS does not prevent caching
  • Automatic creation of statistics does not prevent caching
  • Manual CREATE STATISTICS will prevent caching

All temporary objects in a module are assessed for caching suitability separately. A module that contains one or more temporary objects that cannot be cached may still qualify for caching of other temporary objects within the same module.

A common pattern that disables caching for temporary tables is the creation of indexes after the initial table creation statement. In most cases this can be worked around using primary key and unique constraints. In SQL Server 2014 and later, we have the option of adding non-unique nonclustered indexes directly in the table creation statement using the INDEX clause.

Monitoring and Maintenance

We can see how many temporary objects are currently cached using the cache counters DMV:

SELECT
    DOMCC.[type],
    DOMCC.pages_kb,
    DOMCC.pages_in_use_kb,
    DOMCC.entries_count,
    DOMCC.entries_in_use_count
FROM sys.dm_os_memory_cache_counters AS DOMCC 
WHERE 
    DOMCC.[name] = N'Temporary Tables & Table Variables';

An example result is:

Temporary Tables & Table Variables cache counter

A cache entry is considered to be in use for as long as any part of the containing module is executing. Concurrent executions of the same module will result in multiple cached temporary objects being created. Multiple execution plans for the same module (perhaps due to differing session SET options) will also lead to multiple cache entries for the same module.

Cache entries can be aged out over time in response to competing needs for memory. Cached temporary objects can also be removed (asynchronously, by a background system thread) when the parent module’s execution plan is removed from the plan cache.

While not supported (or in any way recommended) for production systems, the temporary object cache store can be manually completely cleared for testing purposes with:

DBCC FREESYSTEMCACHE('Temporary Tables & Table Variables')
    WITH MARK_IN_USE_FOR_REMOVAL;
WAITFOR DELAY '00:00:05';

The five second delay allows time for the background clean up task to run. Note that this command is actually dangerous. You should only employ it (at your own risk) on a test instance you have exclusive access to. Once you have finished testing, restart the SQL Server instance.

Caching Implementation Details

Table variables are implemented using a ‘real’ user table in the tempdb database (though not a table we can query directly). The name of the associated table is “#” followed by the eight-digit hexadecimal representation of the object id. The following query shows the relationship:

-- A table variable
DECLARE @Z AS table (z integer NULL);

-- Corresponding sys.tables entry
SELECT
    T.[name],
    ObjIDFromName = 
        CONVERT(integer, 
            CONVERT(binary(4), RIGHT(T.[name], 8), 2)),
    T.[object_id],
    T.[type_desc],
    T.create_date,
    T.modify_date
FROM tempdb.sys.tables AS T 
WHERE
    T.[name] LIKE N'#[0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]';

An example result is shown below. Notice how the object id calculated from the object name matches the actual object id:

Table variable backing store details

Running that script as ad-hoc SQL will produce a different tempdb object ID (and object name) on each execution (no caching). Placing the same script inside a module (e.g. a stored procedure) will allow the table variable to be cached (as long as dynamic SQL is not used), so that the object ID and name will be the same on each execution.

When the table variable is not cached, the underlying table is created and dropped each time. When temporary object caching is enabled, the table is truncated at the end of the module instead of being dropped. There are no changes to system metadata when a table variable is cached. The impact on allocation structures and transaction logging is limited to deleting the rows in the table and removing any excess data and allocation pages when the module ends.

Temporary Tables

When a temporary table is used instead of a table variable, the basic mechanism is essentially the same, with just a couple of extra renaming steps: When a temporary table is not cached, it is visible in tempdb with the familiar user-supplied name, followed by a bunch of underscores and the hexadecimal representation of the object id as a final suffix. The local temporary table remains until it is explicitly dropped, or until the scope in which it was created ends. For ad-hoc SQL, this means when the session disconnects from the server.

For a cached temporary table, the first time the module is run, the temporary table is created just as for the non-cached case. At the end of the module, instead of being dropped automatically (as the scope in which it was created ends), the temporary table is truncated and then renamed to the hexadecimal representation of the object ID (exactly as seen for the table variable). The next time the module runs, the cached table is renamed from the hexadecimal format to the user-supplied name (plus underscores plus hex object id).

The extra renaming operations at the start and end of the module involve a small number of system metadata changes. Cached temporary tables can therefore still experience at least some metadata contention under very high rates of reuse. Nevertheless, the metadata impact of a cached temporary table is much lower than for the non-cached case (creating and dropping the table each time).

More details and examples of how temporary object caching works can be found in my previous article.

Statistics on cached temporary tables

As mentioned earlier, statistics may be automatically created on temporary tables without losing the advantages of temporary object caching (as a reminder, manually creating statistics will disable caching).

An important caveat is that the statistics associated with a cached temporary table are not reset when the object is cached at the end of the module, or when the cached object is retrieved from cache at the start of the module. As a consequence, statistics on a cached temporary table may be left over from an unrelated prior execution. In other words, the statistics may bear absolutely no relation to the current contents of the temporary table.

This is obviously undesirable, given that the main reason to prefer a local temporary table over a table variable is the availability of accurate distribution statistics. In mitigation, the statistics will be automatically updated when (if) the accumulated number of changes to the underlying cached object reaches the internal Recompilation Threshold. This is difficult to assess in advance, because the details are complex and somewhat counterintuitive.

The most comprehensive workaround, while retaining the benefits of temporary object caching, is to:

  • Manually UPDATE STATISTICS on the temporary table within the module; and
  • Add an OPTION (RECOMPILE) hint to statements that reference the temporary table

Naturally, there is a cost involved in doing this, but this is most often acceptable. Indeed, by choosing to use a local temporary table in the first place, the module author is implicitly saying that plan selection is likely to be sensitive to the contents of the temporary table, so recompilation can make sense. Updating statistics manually ensures that the statistics used during the recompilation reflect the current contents of the table (as we would surely expect).

For more details on exactly how this works please refer to my previous article.

Summary and Recommendations

Temporary object caching within a module can greatly reduce the pressure on shared allocation and metadata structures in the tempdb database. The greatest reduction will occur when using table variables because caching and reusing these temporary objects does not involve modifying metadata at all (no renaming operations). Contention on allocation structures may still be seen if the single cached data page is insufficient to hold all the table variable’s data at runtime.

The impact on plan quality due to the lack of cardinality information for table variables may be mitigated by using OPTION(RECOMPILE) or trace flag 2453 (available from SQL Server 2012 onwards). Note that these mitigations only give the optimizer information about the total number of rows in the table.

To generalize, table variables are best used when the data is small (ideally fitting within a single data page for maximum contention benefits) and when plan selection does not depend on the values present in the table variable.

If information about the data distribution (density and histograms) is important for plan selection, use a local temporary table instead. Be sure to meet the conditions for temporary table caching, which most often means not creating indexes or statistics after the initial table creation statement. This is made more convenient from SQL Server 2014 onward due to the introduction of the INDEX clause of the CREATE TABLE statement.

An explicit UPDATE STATISTICS after data is loaded into the temporary table, and OPTION (RECOMPILE) hints on statements that reference the table may be needed to produce all the expected benefits of cached temporary tables within a module.

It is important to only use temporary objects when they produce a clear benefit, most often in terms of plan quality. Excessive, inefficient, or unnecessary use of temporary objects can lead to tempdb contention, even when temporary object caching is achieved.

Optimal temporary object caching may not be enough to reduce tempdb contention to acceptable levels in all cases, even where temporary objects are only used when fully justified. Using in-memory table variables or non-durable in-memory tables can provide targeted solutions in such cases, though there are always trade offs to be made, and no single solution currently represents the best option in all cases.

No comments:

Post a Comment

All comments are reviewed before publication.