Since Query Store stores statistics long term, we can use it to look for trends or major changes in performance. The Query Variation component of the QDS Toolbox helps us review changes and has a lot of options to allow us to select the kinds of changes we are interested in.

So, let’s review our options and go through some examples.

Options

Let’s discuss the options and how they interact.

  • @ServerIdentifier: Defaults to the current instance. Set this to gather data from another instance of SQL Server.
  • @DatabaseName: Defaults to the current database. I’m querying data from WideWorldImporters for my examples.
  • @ReportIndex: Default NULL. When used, stores information about the parameters used for the report.
  • @ReportTable: Default NULL. Allows you to store the report data in a table like dbo.QueryVariationStore, created by the installer.
  • @Measurement: What statistic are we basing our report on? CPU by default. [CLR, CPU, DOP, Duration, Log, LogicalIOReads, LogicalIOWrites, MaxMemory, PhysicalIOReads, Rowcount, TempDB]
  • @Metric: How are we comparing our measurements? Avg by default. [Avg, Max, Min, StdDev, Total]
  • @VariationType: Are we looking for queries that have improved [I] or regressed [R]. ‘R’ by default.
  • @ResultsRowCount: Number of rows to include in the report. Default is 25.
  • @RecentStartTime and @RecentEndTime: Defines what the ‘recent’ period is, which will be compared against the historical. The defaults are 1 hour ago for @RecentStartTime, and now for @RecentEndTime.
  • @HistoryStartTime and @HistoryEndTime: Defines what the ‘history’ period is, which will be compared against the recent. The defaults are 30 days ago for @HistoryStartTime, and 1 hour ago for @HistoryEndTime.
  • @MinExecCount: Only considers queries with at least this many recent executions. Default is 1.
  • @MinPlanCount and @MaxPlanCount: Only includes queries with a number of plans in this range. Defaults are 1 and 99999.
  • @IncludeQueryText: Includes the text for any identified queries in the output. Default is 0.
  • @ExcludeAdhoc: Ignores ad hoc queries, anything that isn’t part of a procedure or other defined object. Default is 0.
  • @ExcludeInternal: Excludes internal queries\operations run by SQL Server itself. Default is 0.
  • @VerboseMode: Default 0. Provides the queries being used in the messages tab.
  • @TestMode: Default 0. When enabled, does everything except actually run the generated queries to create the report.

So let’s look at a few examples Pablo Lozano put in the header for the procedure.

Worst Regressions

The first example using the Query Variation proc creates a list of the 25 queries with the largest regression in performance for the last hour, compared to the history over the last 30 days.

	EXECUTE [dbo].[QueryVariation]
		@DatabaseName	= 'WideWorldImporters',
		@Measurement	= 'Duration',
		@Metric			= 'Avg';

We’re relying a lot on default values here, but it shows how simple it can be to run this procedure and get usable results.

Our results include queries that are in procedures and those that aren’t. We have the QueryID values for each if we want to run another query to look them up, but we didn’t include the option to get the text for the statements.

Since the default is to look for regressed queries, our list has 25 queries in descending order based on how large the regression was. We can see the average duration recently and historically, the number of executions in each window, and the overall number of plans.

The first two queries on this list, from Integration.GetCustomerUpdates, have both regressed by almost 300%. Both also have two execution plans. That could be the explanation for their regression; we could have a new plan that is worse for a variety of reasons. We are also comparing 8 recent executions to 100 in our history, so it could just be that the small sample size (or a cold cache) is making a big difference.

We have the information we need here to find the query and follow up from there. I’m fond of using the “Tracked Queries” report in SSMS under Query Store, and that would show us the query and its recent plans.
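If you'd rather stay in T-SQL, a sketch like the one below pulls the text and plans for one of the identified queries directly from Query Store; the query_id value is just a placeholder for one from the report.

SELECT 
	qsq.query_id,
	qt.query_sql_text,
	qp.plan_id,
	CAST(qp.query_plan AS XML) AS query_plan
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qt
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp
	ON qp.query_id = qsq.query_id
WHERE 
	qsq.query_id = 42;	-- replace with a QueryID from the report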

Biggest CPU Improvements

Here’s the second example from the procedure’s comments.

		EXECUTE [dbo].[QueryVariation]
			@DatabaseName	= 'WideWorldImporters',
			--@ReportIndex		= '[dbo].[QueryVariationIndex]',
			--@ReportTable		= '[dbo].[QueryVariationReport]',
			@Measurement		= 'CPU',
			@Metric				= 'Total',
			@VariationType		= 'I',
			@ResultsRowCount	= 10,
			@MaxPlanCount		= 1,
			@IncludeQueryText	= 1;

I’ve added my database name and left the report tables in comments, and I did change the option to include query text.

This report should show us the queries that have improved the most, based on total CPU usage. It will only give us 10 results, and only includes queries with a single plan.

Results here show us the procedures with the biggest reduction in overall CPU usage, and several are around 90%. But that’s not surprising, since we measured the total numbers instead of the average, and we’ve run the queries far less in the recent window. If we run the query 92% less in the recent window, this will show up as a reduction.

If our executions were more even, it could draw attention to statements that have improved despite the plan not changing, since we set our @MaxPlanCount to 1.

The QuerySQLText has information on variables at the beginning, but we see which statement is being referred to by each line of the report.

Query Variation Usage

The Query Variation report is good for a general comparison of performance. Run this daily, check your average duration\CPU\logical IO reads against the historical to look for regressions, and you get a good list of queries and procs you may want to look into.

Do keep in mind that you can’t use this to check the results of a change to a query. If you are changing the text of a statement in any way, the modified statement gets a different query_id in Query Store, so this report wouldn’t compare it to the original statement.

This can identify queries that have a change in performance because of a new plan, depending on how you set the @MaxPlanCount. So if you are changing an index to improve a query, a query variation report could show the statement’s improvement (or regression).
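As a sketch of the daily run mentioned above, the call below stores its output using the report tables from the procedure's header example and compares a 24-hour recent window against the default 30-day history. The table names and parameter data types here are assumptions, so adjust them to match your installation.

DECLARE 
	@RecentEnd DATETIME2 = GETUTCDATE(),
	@RecentStart DATETIME2 = DATEADD(HOUR, -24, GETUTCDATE());

EXECUTE [dbo].[QueryVariation]
	@DatabaseName		= 'WideWorldImporters',
	@ReportIndex		= '[dbo].[QueryVariationIndex]',
	@ReportTable		= '[dbo].[QueryVariationReport]',
	@Measurement		= 'Duration',
	@Metric				= 'Avg',
	@VariationType		= 'R',
	@RecentStartTime	= @RecentStart,
	@RecentEndTime		= @RecentEnd,
	@IncludeQueryText	= 1;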

Speaking at SQL Friday this week

If you are not aware of SQL Friday, there’s a talk done online live every week at noon CET (UTC+1). You can also view previous sessions through links here.

I’ll be speaking this week with a case study on optimizing a very frequently run procedure. It is the same subject I presented at Virtual PASS 2021.

I have a few more blogs coming on the QDS Dashboard, and a few other topics I’m likely to write on soon. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

In contrast to the previous QDS Toolbox components, this one is relatively simple. Instead of a tool that’s ready to be scheduled to give you a report, this just adds the dbo.query_store_wait_stats_pivoted view.

So, what’s the value here?

The typical case

The way I use Query Store most often is to investigate the performance of a specific procedure or statement. I tend to do this when I know there is an issue, or when I want to evaluate the improvement of a new change.

Here’s a query I’ll often run (or some variation thereof):

--Statement numbers for the last two hours, no aggregation
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	qsq.query_hash
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(hour, -2, GETUTCDATE())
	--AND qt.query_sql_text LIKE '%something%'
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time

Now, Query Store’s performance statistics are separated into intervals. We can set the size for an interval when we turn on Query Store (60 minutes by default, I have it at 15 on my test environment). So, this query will show me the performance of each statement for each interval over the last two hours.
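For reference, the interval length is part of the database's Query Store settings; a change like the one below (sized for a test environment) is how you would set it to 15 minutes.

ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (INTERVAL_LENGTH_MINUTES = 15);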

Sometimes I’ll aggregate the data so I can look at the averages over several days, but sometimes I like seeing each interval to see if there is a lot of variation.

So, here’s that result set:

Runtime Stats from Query Store
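When I do want the aggregate view over several days, a rough sketch of that version looks like the following, weighting each interval's average by its execution count:

SELECT 
	SUM(qrs.avg_duration * qrs.count_executions) AS total_duration,
	SUM(qrs.count_executions) AS count_executions, 
	SUM(qrs.avg_duration * qrs.count_executions) 
		/ SUM(qrs.count_executions) AS avg_duration, 
	qsq.query_id,
	qt.query_sql_text
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(DAY, -5, GETUTCDATE())
GROUP BY 
	qsq.query_id,
	qt.query_sql_text
ORDER BY 
	qsq.query_id;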

sys.query_store_wait_stats

But let’s say we look at this data and are concerned one of these statements is taking longer than expected, and we want to see if it is waiting on anything. So, we take that query and add a join to sys.query_store_wait_stats. Seems simple enough.

--Statement numbers for the last two days, no aggregation, with wait stats for one statement
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	ws.wait_category_desc,
	ws.execution_type_desc,
	ws.total_query_wait_time_ms,
	ws.avg_query_wait_time_ms
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
LEFT JOIN sys.query_store_wait_stats ws
	ON ws.plan_id = qrs.plan_id
	AND ws.runtime_stats_interval_id = qsi.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(day, -2, GETUTCDATE())
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time,
	ws.wait_category_desc

And let’s see our results:

Runtime Stats with Wait Stats

Now we have 9 rows instead of 5, and the numbers under avg_duration look redundant.

What’s happened is that sys.query_store_wait_stats has multiplied our result set. Since it stores a record for each wait type observed in each interval, we now have multiple rows for several of these intervals. And one of our rows would have been removed from this set if we hadn’t used a LEFT JOIN to sys.query_store_wait_stats, since it had no waits.

This is probably not how you would imagine it working at first glance. You might want to pivot those results instead, and the Pivoted Wait Stats view has done that for you.
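If you did want to pivot the raw wait data yourself, conditional aggregation is essentially what the view does for you. Here's a minimal sketch, showing only a few wait categories and mimicking the view's column names; the category descriptions come from sys.query_store_wait_stats.

SELECT 
	ws.plan_id,
	ws.runtime_stats_interval_id,
	SUM(CASE WHEN ws.wait_category_desc = 'CPU' 
		THEN ws.avg_query_wait_time_ms ELSE 0 END) AS Average_CPU,
	SUM(CASE WHEN ws.wait_category_desc = 'Buffer IO' 
		THEN ws.avg_query_wait_time_ms ELSE 0 END) AS Average_BufferIO,
	SUM(CASE WHEN ws.wait_category_desc = 'Network IO' 
		THEN ws.avg_query_wait_time_ms ELSE 0 END) AS Average_NetworkIO
FROM sys.query_store_wait_stats ws
GROUP BY 
	ws.plan_id,
	ws.runtime_stats_interval_id;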

Using Pivoted Wait Stats

If I take my original query and join to dbo.query_store_wait_stats_pivoted, I will get one row of output for each plan_id and runtime_stats_interval_id.

And the result set will be rather wide. There are total and average columns for each wait type in sys.query_store_wait_stats, and you may or may not want to include them all in your query.

So, here’s an example with the average for several interesting wait types:

--Statement numbers for the last five days, no aggregation, with pivoted wait stats
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	wsp.Average_BufferIO,
	wsp.Average_BufferLatch,
	wsp.Average_CPU,
	wsp.Average_Lock,
	wsp.Average_Memory,
	wsp.Average_NetworkIO,
	wsp.Average_Parallelism,
	wsp.Average_Tracing
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
LEFT JOIN dbo.query_store_wait_stats_pivoted wsp 
	ON wsp.plan_id = qp.plan_id
	AND wsp.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetSaleUpdates')
	AND qsi.end_time > DATEADD(day, -5, GETUTCDATE())
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time

Runtime Stats with Pivoted Wait Stats

We have one record for this query for each of three intervals. In each row, I can see the average wait time for CPU, disk (BufferIO), buffer latch, and other waits. In this case, the largest waits were with NetworkIO. Likely, SSMS was having trouble keeping up with displaying multiple result sets, because I ran this batch 50 times in a row.

One last thing, keep in mind the times from the main tables in Query Store like sys.query_store_runtime_stats are reported in microseconds, while the wait tables are in milliseconds whether you use sys.query_store_wait_stats or dbo.query_store_wait_stats_pivoted. A query that takes 7,000,000 microseconds to run but waited for CPU time for 6,900 milliseconds spent nearly its entire duration waiting.
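A small sketch of that conversion, dividing the runtime stats by 1,000 so both sides are in milliseconds:

SELECT 
	qrs.plan_id,
	qrs.runtime_stats_interval_id,
	qrs.avg_duration / 1000.0 AS avg_duration_ms,	-- Query Store stores this in microseconds
	wsp.Average_CPU,								-- the wait columns are already in milliseconds
	wsp.Average_NetworkIO
FROM sys.query_store_runtime_stats qrs
LEFT JOIN dbo.query_store_wait_stats_pivoted wsp
	ON wsp.plan_id = qrs.plan_id
	AND wsp.runtime_stats_interval_id = qrs.runtime_stats_interval_id;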

Don’t forget about PASS Data Community Summit 2021!

PASS Data Community Summit 2021 is next week, and the event is virtual and free, so please go and register if you haven’t already. I’m happy to see PASS and the Summit are in Red Gate’s hands, and I expect it to be a great event.

Many of the sessions, including my session, are prerecorded and can be watched at any time. This should make it easier to get around conflicts between sessions you want to hear. There are also Q&A sessions for the prerecorded sessions, and those are scheduled and live. So get registered and make sure you don’t miss out.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

When configuring Query Store, we have a few options for deciding how it retains data but little control over how it cleans up that data. We can set the max size of our query store, the max number of plans to keep per query, and how long to keep query statistics.

The QDS Cache Cleanup component of the QDS Toolbox gives us a number of other options for what data to remove:

  • Internal queries
  • Adhoc queries
  • Orphaned queries (from deleted stored procedures and other dropped objects)
  • Queries that have not run recently
  • Target queries with fewer than X executions
  • Remove only stats associated with targeted queries

Performance

There was a performance issue where I work related to the QDS size-based cleanup, which Mark Wilkinson discusses here. There were a number of symptoms and issues seen at the time, and hats off to our DBAs for determining the root cause. This is something I wanted to highlight:

These issues didn’t start until QDS started hitting max size, so that was over a month in some cases, and it didn’t hit all instances and databases at the same time. This means the issue seemed “random” when it was happening.

Mark Wilkinson

Using the QDS Cache Cleanup, you can prevent the typical size-based cleanup from running while having more control over what gets cleaned up. The procedure provided identifies which queries or stats in Query Store to remove, then calls system procedures to remove them (sketched briefly after this list):

  • sp_query_store_unforce_plan
  • sp_query_store_remove_query
  • sp_query_store_reset_exec_stats
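As a rough sketch, the calls the cleanup makes for a single targeted query look like this; the ID values are placeholders.

EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 420;	-- only needed if a plan was forced
EXEC sys.sp_query_store_remove_query @query_id = 42;					-- removes the query, its plans, and its stats
-- or, when @CleanStatsOnly is enabled, just:
EXEC sys.sp_query_store_reset_exec_stats @plan_id = 420;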

Options

Let’s discuss the options and how they interact.

  • @InstanceIdentifier: You can use this to gather data from another instance of SQL Server.
  • @DatabaseName: Defaults to the current database. I keep my QDS data in a separate database, so I will use it in my examples.
  • @CleanAdhocStale: Binary option; default is 0. Setting this to 1 will clean up data related to any stale queries that are ad-hoc queries, i.e. not associated with an object like a stored procedure. Only stale ad-hoc queries are affected, as defined by the @Retention and @MinExecutionCount.
  • @CleanStale: Binary option; default is 1. Cleans up all stale queries. Also depends on the settings for @Retention and @MinExecutionCount.
  • @Retention: This setting helps define which queries are considered “stale”. Takes in a number of hours since the query was last run. The default is 168 hours (a week), meaning any query that had run in the last week would not count as stale. If you set this to 0, it will include all queries that match the @CleanStale/@CleanAdhocStale options.
  • @MinExecutionCount: Queries with fewer than this many executions are stale. Default is 2. So, if we take both default options, this will clean up queries with fewer than 2 executions and anything that has not run in the last week. Setting this to 0, again, would include all queries or all ad-hoc queries.
  • @CleanInternal: Binary option; default is 1. Cleans up any internal queries. This includes queries being run by SQL Server itself to do things like update statistics, and is based on a filter on sys.query_store_query.is_internal_query. This isn’t a “stale” option, so @Retention and @MinExecutionCount don’t affect this.
  • @CleanOrphan: Binary option; default is 1. Cleans up any queries that are associated with an object that no longer exists.
  • @CleanStatsOnly: Binary option; default is 0. When enabled, this option doesn’t remove the query and plan, but uses sp_query_store_reset_exec_stats to remove the statistics for any targeted queries. When not enabled, the default behavior unforces any forced plans with sp_query_store_unforce_plan, and then removes the query, its plans, and its execution statistics with sp_query_store_remove_query. Applies to any query included by the above options.
  • @ReportAsText and @ReportAsTable: Both default to 0. You can enable either or both to get details on the amount of space recovered from the cleanup, in whichever format you prefer.
  • @ReportIndexOutputTable: Default NULL. Setting this allows you to store the report data in a table like dbo.QDSCacheCleanupIndex, created during setup.
  • @ReportDetailsAsTable: Default 0. When enabled, returns details about each query being deleted from Query Store.
  • @ReportDetailsOutputTable: Default NULL. Allows you to store the report details in a table like dbo.QDSCleanSummary, created during setup.
  • @TestMode: Default 0. Doesn’t actually delete data, but provides output as though it does. You could use this to see what the effect of a given set of parameters would be before actually taking a destructive action.
  • @VerboseMode: Default 0. Provides the queries being used in the messages tab.

One point for clarity: if @Retention or @MinExecutionCount is set to 0, every query (or every ad-hoc query, in the case of @CleanAdhocStale) counts as stale and will be included in the cleanup.

QDS Cache Cleanup examples:

EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@CleanAdhocStale = 0
	,@CleanStale = 1
	,@Retention = 24
	,@MinExecutionCount = 2
	,@CleanOrphan = 1
	,@CleanInternal = 1
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1;
GO

This is an example execution given in the comments of the procedure. Since this is run in test mode, nothing is actually deleted; the reports provided give information on what would be deleted if we ran this process normally.

@CleanAdhocStale is not used, but @CleanStale is a superset of it, so stale ad-hoc queries are included with all other stale queries. Anything that has been executed at least twice in the last day is not considered stale.

This process does include any queries from dropped objects and any internal queries, regardless of when they were executed.
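If you want to see which queries would count as stale under a given @Retention and @MinExecutionCount before running the cleanup, a rough equivalent (run in the target database) looks like the sketch below; the toolbox's exact filters may differ slightly.

SELECT 
	qsq.query_id,
	qsq.last_execution_time,
	SUM(qrs.count_executions) AS execution_count
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qp
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs
	ON qrs.plan_id = qp.plan_id
GROUP BY 
	qsq.query_id,
	qsq.last_execution_time
HAVING 
	qsq.last_execution_time < DATEADD(HOUR, -24, GETUTCDATE())	-- @Retention = 24
	OR SUM(qrs.count_executions) < 2;							-- @MinExecutionCount = 2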

Report Table

The report from the QDS Cache Cleanup gives one line per type of query affected. The output includes when this was generated and where, along with the count of queries and plans cleaned up and the space that would be recovered.

Report Details Table

The report details show the object name (where possible), Query ID, LastExecutionTime, ExecutionCount, and QueryText for all queries included in the cleanup.

In my case, it found queries from a procedure I was testing recently (though I had to up the execution count so there would be some stale queries). Most of what the QDS Cache Cleanup flagged for deletion were internal queries involved in updating stats. But nothing was removed, since this was still in TestMode.

There were also some stale ad-hoc queries that look like statistics activity (you can see StatMan in the QueryText). These are internal queries but had executed few times and not recently, so they were also flagged as stale.

This raises an important point. Based on the @Retention and @MinExecutionCount options, orphaned or internal queries can count as stale and be included in your cleanup. If either setting is 0, all queries will be included by the QDS Cache Cleanup; essentially a full wipe of Query Store.

Here are a few more examples with comments:

USE QDSToolBox
GO
	-- Test Mode; no deletion
	-- Includes Stale queries (24 hours or < 20 executions)
	-- Also includes orphaned and internal queries
	-- Provides report, details, and verbose output (Messages tab)
EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@Retention = 24
	,@MinExecutionCount = 20
	,@CleanStale = 1
	,@CleanAdhocStale = 0
	,@CleanOrphan = 1
	,@CleanInternal = 1
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1
	,@VerboseMode = 1;
GO

	-- Test Mode; no deletion
	-- Only includes orphaned and internal queries
	-- Provides report, details, and verbose output (Messages tab)
EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@CleanStale = 0
	,@CleanOrphan = 1
	,@CleanInternal = 1
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1
	,@VerboseMode = 1;
GO

	-- Test Mode; no deletion
	-- Disables most default options
	-- Only includes ad-hoc queries only executed once
	--		and not run in the last 24 hours.
	-- Provides report, details, and verbose output (Messages tab)
EXECUTE [dbo].[QDSCacheCleanup]
	@DatabaseName = 'WideWorldImporters'
	,@Retention = 24
	,@MinExecutionCount = 2
	,@CleanStale = 0
	,@CleanAdhocStale = 1
	,@CleanOrphan = 0
	,@CleanInternal = 0
	,@ReportAsTable = 1
	,@ReportDetailsAsTable = 1
	,@TestMode = 1
	,@VerboseMode = 1;
GO

	-- Test Mode; no deletion
	-- @CleanStale = 1, other cleanup options disabled
	-- Setting @Retention or @MinExecutionCount to 0 means all queries are stale
	-- Output table options included but commented.
	-- Provides report, details, and verbose output (Messages tab)
DECLARE
	@ReportID BIGINT;

	EXECUTE [dbo].[QDSCacheCleanup]
		@DatabaseName = 'WideWorldImporters',
		@CleanAdhocStale = 0,
		@CleanStale = 1,
		@Retention = 0,			--	All Queries Stale
		@MinExecutionCount = 0,	--	All Queries Stale
		@CleanOrphan = 0,
		@CleanInternal = 0,
		@CleanStatsOnly	= 0,
		@ReportAsText = 1,
		@ReportAsTable = 1,
		@ReportDetailsAsTable = 1,
		--@ReportIndexOutputTable	= '[dbo].[QDSCacheCleanupIndex]',
		--@ReportDetailsOutputTable	= '[dbo].[QDSCacheCleanupDetails]',
		@TestMode = 1,
		@VerboseMode = 1,
		@ReportID = @ReportID OUTPUT;

SELECT @ReportID;
GO

More to come

Some of the options in the QDS Cache Cleanup didn’t function like I expected at first, so I think this post should be helpful. I’ll continue to post on the QDS Dashboard, but I’ll likely include a few on other topics in the weeks to come.

I will be speaking at PASS Summit, which is free and virtual this year, so please sign up if you haven’t already.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

The QDS Toolbox

The QDS Toolbox is a set of tools that can help you review and store the performance-related data in Query Store. It was released by ChannelAdvisor last September thanks to the hard work of a number of my coworkers.

If you aren’t experienced with Query Store, this can provide a good starting point for getting familiar with data that is available and what you can do with it. If you are experienced with Query Store, this may give you an easy way to set up customizable reports that help you find issues and see trends.

The QDS Toolbox has several components, and I intend to post about each in turn. Two new components were added to this recently by @sqlozano (https://www.sqlozano.com/), bringing the current total to eight.

  1. Server Top Queries
  2. Query Waits
  3. QDS Cache Cleanup
  4. Pivoted Waits Stats
  5. Query Variation
  6. Waits Variation
  7. Statistics Used
  8. Plan Miner

Getting Started

First things first, let’s get it downloaded and installed. I’ve linked the github page above. You can download a ZIP of the current package, or you can clone it with GitHub Desktop or a similar tool.

Once you have the package local, there’s an Installer folder. You could install the components a la carte, or the QDSToolBox_Installer script will install all of them. It will prompt you for the instance of SQL Server you want to use as well as the database you want to install it in. I’m putting mine in its own database, as I frequently restore other databases (WideWorldImporters, AdventureWorks2014) when I’m testing things.

Moving to Query Store

In my own experience with performance tuning in SQL Server, I started off using Profiler and PSSDiag constantly when I worked for Microsoft. After a number of years, I moved a lot of my focus to using queries against DMVs (Dynamic Management Views). DMVs allowed me to get most of the same data I reviewed from PSSDiag, but with a query that typically takes a few seconds, instead of having to gather a trace for 30 minutes to feel like I have enough data.

By focusing on a DMV like sys.dm_exec_query_stats, you can easily find which queries on your server have the highest CPU usage, duration, or logical reads. This can make it very easy to identify a problem query, and you can find the query even if it isn’t running currently.

The caveat is that the DMVs only track what is in the cache, and once a query’s plan is no longer cached, it’s gone. No historical data is kept in the DMVs, and that’s why I use Query Store almost exclusively these days.

Server Top Queries

This is a great place to start exploring what is available in the QDS Toolbox, because looking for the top resource consuming queries is a common task.

Once you have the tool installed, you can run the dbo.ServerTopQueries procedure to generate reports based on the metric you choose. Each report remains in the database where the QDS Toolbox was installed, so you can review it whenever you like. Here’s an example execution of the procedure:

USE QDSToolBox
GO
DECLARE 
	@StartTime DATETIME2,
	@EndTime DATETIME2;

SELECT
	@StartTime = DATEADD(DAY,-1,GETUTCDATE()),
	@EndTime = GETUTCDATE();

EXEC dbo.ServerTopQueries
	@ServerIdentifier	= @@SERVERNAME,
	@DatabaseName	= 'WideWorldImporters',
	@ReportIndex	= 'dbo.ServerTopQueriesIndex',	--provide both Report options to store results
	@ReportTable	= 'dbo.ServerTopQueriesStore',	--provide both Report options to store results
	@StartTime		= @StartTime,
	@EndTime		= @EndTime,
	@Top			= 25,
	@Measurement	= 'cpu_time',	--duration, cpu_time, logical_io_reads, logical_io_writes, 
									--physical_io_reads, clr_time, query_used_memory, log_bytes_used, tempdb_space_used
	@IncludeQueryText	= 1, --default: 0
	@ExcludeAdhoc		= 0,
	@ExcludeInternal	= 0,
	@VerboseMode		= 1,
	@TestMode			= 0;

GO

DECLARE
	@LatestReport INT;

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT *
FROM dbo.vServerTopQueriesStore tqs
WHERE
	tqs.ReportID = @LatestReport
ORDER BY 
	tqs.CPU DESC;
GO

This script will run in the QDSToolbox database and store the report there, but we will be gathering the Query Store data on activity in the WideWorldImporters database. I’ve just run some scripts there to generate activity.

The @StartTime and @EndTime parameters will restrict us to looking at activity in the last day.

The @ReportIndex and @ReportTable parameters define where we will store the data. I’ve left these at the default names for the tables, as defined by the tool.

The report is going to focus on high-CPU queries, defined by the @Measurement parameter being set to ‘cpu_time’. If I were using this regularly, I’d be very likely to use ‘duration’ and ‘logical_io_reads’ often as a matter of course as well.

The @Top parameter is set to 25, and that’s probably reasonable. If I’m troubleshooting an active problem, I’d be unlikely to look at more than the top 5 queries in whatever metric, but a larger view makes sense for historical purposes.

I’m taking the default option for @ExcludeAdhoc and @ExcludeInternal; if it’s at the top of my CPU usage, I want to see it.

I am setting @IncludeQueryText to 1 so I get the query_store_query_text.query_sql_text details in my report. That may help you identify a specific query. Even without this option the query_store_query.query_text_id will be available, so you could look up the exact query text directly in Query Store.
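For example, a quick lookup like this (run in the monitored database, with a placeholder ID) returns the full text:

SELECT qt.query_sql_text
FROM sys.query_store_query_text qt
WHERE qt.query_text_id = 42;	-- replace with a QueryTextID from the report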

I’ve also enabled @VerboseMode to see the exact statement used to generate the report. If you are less familiar with Query Store, you might want to review this to see where all the data being used here is found.

With all this, generating the report took less than a second. Certainly you could set up a scheduled task to create a report on CPU activity every day, which you could review\aggregate\splice later. You could create tasks to do the same for ‘duration’ and ‘logical_io_reads’ and have a good set of data to review for potential issues.

Reviewing the report

Here’s a short script to view my latest report and all the data in the dbo.vServerTopQueriesStore view:

DECLARE
	@LatestReport INT;

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT *
FROM dbo.vServerTopQueriesStore tqs
WHERE
	tqs.ReportID = @LatestReport
ORDER BY 
	tqs.CPU DESC;
GO

Report Output

Report Details

The result set includes the capture date for the report, which is the same for each row. We also have the server name, database name, and metric used for the report.

The PlanID, QueryID, and QueryTextID are values you can use to get more information directly from Query Store. I frequently use the QueryID in particular with the Tracked Queries interface in SQL Server Management Studio to look at the plan for a query I’ve already identified.

Object Details

We will see the ObjectName, ObjectID, and SchemaName for any queries that are part of a procedure, function, or other object.

The ExecutionTypeDesc here indicates these queries ran successfully. This value could also be ‘Aborted’ (for a timeout) or ‘Exception’ (for an error).

Performance Statistics

And here we have all of our performance statistics along with the query text. I most often look at three of these, but this includes memory usage (in pages), tempdb usage (also in pages), and log bytes.

The execution count is included, so you can use this to calculate averages for any of these measurements. I tend to focus on the queries that have the highest numbers overall, but you may find a query farther down on your list with a longer average duration is more important.
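As a sketch, those averages can be computed straight from the stored report. The CPU, ReportID, and CaptureDate columns appear in the earlier script, but the other column names here (QueryID, ExecutionCount, Duration) are assumptions, so check them against the view's definition.

DECLARE @LatestReport INT;

SELECT TOP 1
	@LatestReport = tqi.ReportID
FROM dbo.vServerTopQueriesIndex tqi
ORDER BY 
	tqi.CaptureDate DESC;

SELECT 
	tqs.QueryID,
	tqs.ExecutionCount,
	tqs.CPU / NULLIF(tqs.ExecutionCount, 0) AS avg_cpu,
	tqs.Duration / NULLIF(tqs.ExecutionCount, 0) AS avg_duration
FROM dbo.vServerTopQueriesStore tqs
WHERE 
	tqs.ReportID = @LatestReport
ORDER BY 
	avg_cpu DESC;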

Coming up

These reports will also be used by other components of the QDS Toolbox, so understanding how to create the reports themselves will be necessary for those.

As I’ve said, there are 7 more parts to this tool. I’m planning to do a post on each in the weeks (and months) ahead. I’ll also be doing more “foundation” posts like my last post on key lookups.

I hope you find this post helpful. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion.

You can follow me on twitter (@sqljared) and contact me if you have questions.

USE WideWorldImporters
GO
SELECT TOP 100 
	sol.OrderID, 
	sol.UnitPrice,
	sol.Quantity,
	sol.PickedQuantity,
	sol.LastEditedWhen
FROM Sales.OrderLines sol 
WHERE 
	sol.StockItemID = 20
GO

Key Lookups

In working on my presentation for Data Saturday #8 – Southwest US, I hadn’t realized how many topics come up at least briefly in the talk. I wanted to write a few posts going into detail on each of these topics and why they are important.

My thanks again to Deborah Melkin for her review and feedback of the presentation.

A key lookup is an operation that occurs when a query has used a nonclustered index on a given table, but needs to access more columns to complete the query. It may need to check columns not in that index for additional filters, or it may just need to return that column as part of its result set.

In the simple query above, we’re retrieving 100 rows from the seek against a nonclustered index, then performing a key lookup against the clustered index. There is a nested loops operator between the two and understanding how that operates is important; for each row we receive from the first table, we perform the second operation once. So, in this query we are seeking 100 rows from the nonclustered index, then performing the key lookup 100 times. We go through the index once for each row we return, and you can see the cost of the key lookup operator is 99% of the query.

Operator Details

Details for the Key Lookup operator

Mouseover

If we mouseover the key lookup, we can see the details of this operation. We actually read 100 rows. The “Estimated Operator Cost” (0.324977) is nearly 100 times that of the index seek (0.0035899).

The “Number of Executions” is 100, so for each row received from the index seek, we traverse the clustered index (its index and leaf pages) once to get that row. And we do 100 separate seeks of that index to get 100 rows. This is a lot more work than we did to get 100 rows with 1 index seek from the nonclustered index.

The estimates match our actuals, but the TOP clause is a very good hint for how many rows we should receive.

If you have a table scan somewhere in your plan that is reading millions of rows, you should probably address that first. But removing the key lookup by returning fewer columns drops this query from 12.5 milliseconds to 73 microseconds. That’s a 94.16% duration reduction (thank you, Query Store).

Resolution

There are two ways to handle a query like this with a key lookup.

  1. Do we need these columns in our query?
  2. Create a covering index.

Addition by Subtraction

We are doing the key lookup because we want to return columns, or filter\otherwise use columns, that are not in the nonclustered index. Let’s first ask this: do we need these columns in our query?

If we check the code or application that’s retrieving the results, does it actually consume those columns from the result set and use them? If we are filtering on that column, does that filter still make sense? If not, let’s just take it out of the query to simplify matters.

And it is very clear which columns are the issue. If you look at the details of the key lookup in the image above, the Output List for that operator shows which columns we are using the clustered index to retrieve. If you don’t need any of them, you can remove them from the query. Your new execution plan will be missing a key lookup.
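As an illustration of the "subtraction" approach, a version of the query that only returns what the nonclustered index already covers (here just the index key, StockItemID) produces a plan with no key lookup; which columns you can safely return depends on your index's definition.

SELECT TOP 100 
	sol.StockItemID
FROM Sales.OrderLines sol 
WHERE 
	sol.StockItemID = 20;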

CREATE COVERING INDEX

The heading is a joke; there’s no such command, of course. A covering index is a nonclustered index that supplies all the information you need from a given table to complete a given query. So far, we’re doing key lookups for this query because no such index exists. We could get all these columns from the clustered index, but we would have to scan the whole index because our WHERE clause doesn’t match the sorting of the clustered index.

Normally when we create an index, we want our index to include any columns we are filtering on. So it would include columns in our WHERE clause, or the columns in our JOIN clause if we are joining from another table. In some cases, you might want the index to match an ORDER BY. For this query, that’s just the StockItemID filter in our WHERE clause.

For a covering index on this query, we also need to include the SELECT list in our index. In general, every column for this table referenced in the query needs to be in our index.

The INCLUDE clause is a great way to add in the columns from our SELECT list.

We could add those 5 columns to our index normally as key values, but that would unnecessarily bloat all the pages of the index. We aren’t filtering on any of those columns, so we don’t need them in the intermediate index pages for us to filter properly. If we use the INCLUDE clause, these columns will be present only in the leaf pages of our index. This is similar to how the columns from the clustered index are added to all nonclustered indexes.

So a script for the new index would look like this:

CREATE NONCLUSTERED INDEX [IX_Sales_OrderLines_AllocatedStockItems] ON [Sales].[OrderLines](	
	[StockItemID] ASC
) INCLUDE(
	[PickedQuantity],
	[OrderID],
	[UnitPrice],
	[Quantity],
	[LastEditedWhen]	
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = ON, 
	ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [USERDATA]
GO

With the index in place, our original query took 95 microseconds. Slightly longer than the query with the reduced result set, but we did increase the size of the index some.

Conclusion

A key lookup might be an operation you don’t notice often, but I’ve been impressed with the result of removing them when I can.

I’ll be posting other blogs with foundational topics in the near future and more posts in general than I’ve had recently. Maybe this isn’t foundational; it might be on the first floor.

I hope you’ve learned something from this post. Please follow me on twitter (@sqljared) or contact me if you have questions.

So let’s talk about the best metric no one thinks about.

A Good Question

I was working with a client troubleshooting an issue several years back now, and they asked a question that was difficult to answer.

We were troubleshooting a slow query, and the reason it was taking so long was oddly opaque.

  • It wasn’t blocked
  • It wasn’t waiting on CPU or driving the CPU on the server
  • It wasn’t waiting on memory
  • It wasn’t waiting on the disks

So, the usual suspects were out. We didn’t have the plan for this query yet and were operating without much information. So, the client asked the simple question, “If it isn’t waiting on anything, why isn’t it done?”

I thought it a simple question at first, then I really thought about it. With the usual suspects out, we really couldn’t see what SQL Server was doing. What could be taking up that time?

Invisible culprit

It finally occurred to me that just because we weren’t seeing the disks spike didn’t mean we weren’t doing a lot of reads. Logical reads don’t involve the disks; we’re only reading pages that are already in memory. They are faster than physical reads for sure, but that doesn’t mean they are instant.

The logical reads themselves won’t cause waits. They will use some CPU cycles, but in this case we saw no CPU related wait type or a high overall CPU %.

Take the plan above. It took 84 seconds to complete, and only returned 16 rows. But it read millions of rows at the lower levels. If the tables involved are largely in memory, we won’t be waiting on physical IO. It’s a SELECT statement, so it won’t be blocked or blocking if you are using read-committed snapshot isolation level (RCSI, I should blog about that).

With no waits or blocking, how would you see this query is less than optimal unless you are looking for it?

Detecting logical reads

We eventually got the execution plan, and it confirmed my suspicion. The plan was reading many, many rows and joining across a number of tables. Once we had the plan, we were able to come up with a way to filter the data down faster.

But the lesson here is that the effects of logical reads are hard to detect. This makes logical reads a good metric to check if you want to look at your activity within SQL Server. Here’s an example script for returning the queries with the highest logical reads in the execution cache, centering around sys.dm_exec_query_stats:

SELECT TOP 50 
	qs.creation_time, 
	qs.execution_count, 
	qs.total_logical_reads, 
	qs.total_logical_reads/qs.execution_count AS ave_logical_reads, 
	qs.total_elapsed_time, 
	qs.max_elapsed_time, 
	qs.total_elapsed_time/qs.execution_count as ave_duration, 
	substring(t.text, qs.statement_start_offset/2+1, 
		(CASE WHEN qs.statement_end_offset=-1 THEN (len(t.text)- qs.statement_start_offset)/2
		ELSE (qs.statement_end_offset- qs.statement_start_offset)/2 END )+1) AS statement_text,
	qp.query_plan,
	t.[text], 
	db_name(t.dbid) as db_name, 
	OBJECT_NAME( t.objectid, t.dbid) AS object_name, 
	qs.total_worker_time AS total_cpu_time, 
	qs.total_worker_time/qs.execution_count AS ave_cpu_time, 
	qs.total_physical_reads, 
	qs.total_physical_reads/qs.execution_count AS ave_physical_reads
	--,qs.plan_handle, qs.plan_generation_num
FROM sys.dm_exec_query_stats qs 
OUTER APPLY sys.dm_exec_sql_text(plan_handle) AS t
OUTER APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_logical_reads DESC;
GO

This will let you look through the query stats for anything currently in the cache, but of course a restart of SQL Server or another action that clears the cache will mean there isn’t much to see.

If you use Query Store, you could see the queries with the most logical reads in the last two hours with the following:

SELECT TOP 10 
	SUM(rs.avg_logical_io_reads) AS sum_logical_io_reads,
	q.query_id,
	p.plan_id, 
	qt.query_sql_text, 
	p.query_plan,
	rsi.start_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval rsi
	ON rsi.runtime_stats_interval_id = 
		rs.runtime_stats_interval_id
WHERE 
	rsi.start_time > DATEADD(HOUR,-2,GETUTCDATE())
GROUP BY 
	q.query_id,
	p.plan_id, 
	qt.query_sql_text, 
	p.query_plan,
    rsi.start_time
ORDER BY SUM(rs.avg_logical_io_reads) DESC

Conclusion

Another point to consider: while physical reads take more time, the difference between the two is somewhat random. Whether given pages are in memory or not will affect how long a query takes, but we can’t really control what’s in memory without taking unusual measures.

Logical reads will always occur, and should be a more consistent measure of how much work a query requires.

Based on the plan, we may also end up reading the same page in memory many times, for example if we have a cross product in our query.

I hope you find this post helpful.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to @ me and make a suggestion.

Please follow me on twitter (@sqljared) or contact me if you have questions.

Happy Thanksgiving.

Hints in SQL Server

I used to be really suspicious of using hints in SQL Server, and now I can’t imagine working without them.

My opinion on this topic changed over the last few years due to a number of the performance issues I’ve worked on. I spoke at SQLSaturday 1000 (Oregon 2020) last weekend, and my talk was primarily about things I learned optimizing garbage collection and similar incremental processes. During that work I ran into a number of issues with queries like this example from the WideWorldImporters database:

	DELETE inv
	FROM @OrdersGC gc
	JOIN Sales.Invoices inv
		ON inv.OrderID = gc.OrderID;

Order Matters

The logic here is simple enough. Earlier in the process, we found orders we wanted to delete per retention policy, and put the OrderID values in a memory optimized table variable. We then use the motv to delete from all related tables, and finally the Orders table.

This query doesn’t have a WHERE clause. It’s plain to see how we want this to function though. We have 100 rows in our motv, and we want to delete the related rows in Invoices. But I’ve seen issues caused by execution plans that flip the order:

Table variables have no statistics, so the optimizer doesn’t know how many rows to expect from that operation (though table variable deferred compilation in SQL Server 2019 can resolve this). Occasionally, I would see a plan with a join order that is the opposite of my expectation. The lack of a WHERE clause hurts here, but there’s no clause I could apply that would filter better than the items I already have in my table variable.
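If you are on SQL Server 2019, enabling that deferred compilation is just a matter of the database compatibility level; a minimal sketch, assuming the database isn't already at level 150:

ALTER DATABASE [WideWorldImporters] SET COMPATIBILITY_LEVEL = 150;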

Consistency

I work on hundreds of databases with the same schema. They have different data sets and distributions, different sizes, and their statistics are going to update at different times. But if one of them chooses a bad plan, I have to push aside whatever else I’m working on to research the high CPU on database xyz.

Consistency is really valuable to me. And in this case, the answer is simple. Yes, I want to scan the fast, small memory-optimized table variable first, and use it to filter the larger, slower table. Adding a join hint or a FORCE ORDER hint to this query should keep its plan and performance consistent.

	DELETE inv
	FROM @OrderList gc
	INNER LOOP JOIN Sales.Invoices inv
		ON inv.OrderID = gc.OrderID;

	DELETE inv
	FROM @OrderList gc
	JOIN Sales.Invoices inv
		ON inv.OrderID = gc.OrderID
	OPTION (FORCE ORDER);

Both approaches force the join order. The INNER LOOP JOIN hint has the added benefit of ensuring the plan uses a nested loops join. A hash match wouldn’t be efficient with a batch size of a few hundred or a few thousand. A merge join would likely require a sort of one of the inputs, which defeats the purpose.

Index hints

I had to use index hints in an example I was using in my session for SQLSaturday 1000 (Oregon 2020).

DELETE TOP (@BatchSize) vt
FROM Warehouse.VehicleTemperatures vt
WHERE vt.RecordedWhen < DATEADD(DAY, -180, GETUTCDATE());

This was an example of a garbage collection process. The plan didn’t appear to be a problem, but we should be suspicious of the scan here:

The table scan only read 100 rows, but that’s because there is a TOP operator. The first 100 rows met our filter, so the query ended at that point. If no rows (or fewer than 100) matched, we would have scanned the entire table.

An index exists on the RecordedWhen column; it just wasn’t used. This is another place where a hint seems obvious. Maybe updating statistics would also resolve the issue, but this gives me more certainty.

DELETE TOP (@BatchSize) vt
FROM Warehouse.VehicleTemperatures vt WITH (INDEX(IX_VehicleTemperatures_RecordedWhen))
WHERE
	vt.RecordedWhen < DATEADD(DAY, -180, GETUTCDATE());

With Great Power

By using hints we are taking some of the responsibility away from the SQL Server, and we can cause entirely new problems. Here are some considerations before you try adding a hint.

  1. Relationships. Make sure you understand the cardinality and relationship between tables. This will inform your expectations about how many rows will be returned where.
  2. Indexes. Understand what options you have on each table in your query. A table may use one index based on the WHERE clause, or another based on the ON clause. The join order and indexes used are related. An index hint may push SQL Server to a specific join order; vice versa with join\order hints.
  3. Index hints can break your code! If you use an index hint in a procedure and later drop the index, SQL Server will not politely ignore your suggestion and move on. The procedure will fail until you remove the hint or recreate the index. So, if you use index hints, be aware of this and always check whether any hints reference an index before you drop it (see the sketch after this list).
  4. The most effective filter. If the logic of your statement filters across several tables, consider which one should reduce your result set the most. You probably want that table first in your execution plan.
  5. Test and test again. The new plans may be completely different from what we imagine, so we really must test our hinted queries and procedures with gusto. Test them for a variety of cases to make sure your code works on realistic data sets. In my case, I will sometimes test against large and small restored databases to make sure it performs as expected.
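For item 3, a quick way to check for references before dropping an index is to search module definitions for its name; a rough sketch:

SELECT 
	OBJECT_SCHEMA_NAME(sm.object_id) AS schema_name,
	OBJECT_NAME(sm.object_id) AS object_name
FROM sys.sql_modules sm
WHERE sm.definition LIKE '%IX_VehicleTemperatures_RecordedWhen%';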

I’ve heard other engineers speak dismissively of hints, but I would encourage you to not discard a useful tool. Just realize you can cut yourself with it.

One of my coworkers recently resolved a performance issue by changing the join order and forcing it with a hint, or “doing a Jared Poche” in his words. Which shows you how often I’ve used hints, and how often they’ve worked.

Hopefully you learned something from this post. Please follow me on twitter (@sqljared) or contact me if you have questions.

Another Example: 

I was reviewing the performance of a procedure recently and stumbled over another pumpkin. 

My last blog post was about the Halloween Problem, and we saw its effects on an UPDATE statement. In this case, it was the same issue but with an INSERT statement. Here’s the code being executed:

INSERT INTO Schema1.Object1 (
		Column1,
		Column2,
		Column3,
		Column4 )
	SELECT
		Object2.Column1,
		Object2.Column2,
		Object2.Column3,
		Object2.Column4 
	FROM Object3 Object2
	LEFT LOOP JOIN Schema1.Object1 Object4 WITH(INDEX(Column5))
		ON Object4.Column3 = Object2.Column3
		AND Object4.Column1 = Object2.Column1
		AND Object4.Column2 = Object2.Column2
		AND Object4.Column4 = Object2.Column4
	WHERE
		Object4.Column6 IS NULL

The gist is, we’re trying to insert a record into Object1, assuming said record doesn’t already exist. We’re querying the data we want to insert from a temp table, but joining to the base table to make sure a record doesn’t already exist with the same values.

In the case of an UPDATE statement, if we update fields that are in our query’s search criteria, we could update the same row multiple times. SQL Server’s Halloween protections prevent this, but result in extra work that affect our performance.

The INSERT statement here is similar, trying to insert a record while querying to see if the same record exists. So, again SQL Server adds Halloween protections to our plan:

Plan Analysis

I would have expected us to scan the temp table, then have a LEFT JOIN to the base table. The Table Spool is the red flag that we have an issue with the plan, and is frequently seen with Halloween protections.

The index scan on the base table seems to be overkill since we’re joining on the primary key columns (the key lookup isn’t much of a concern). But we’re likely doing the scan because of the spool; it’s SQL Server’s way of getting all relevant records in one place at one time, breaking the normal flow of row mode operation, to make sure we don’t look up the same record multiple times.

Easy Fix

The data we are trying to insert is being passed into the procedure using a memory-optimized table valued parameter. We’ve queried that into the temp table as another step before our final INSERT ... SELECT query, because SQL Server will sometimes make poor optimizations when TVPs are involved (because they have no statistics).

The solution then is an easy one. We move our LEFT JOIN out of the final INSERT, and we make that check as we query the TVP’s data into the temp table. We separate the SELECT against that table from the INSERT; they are now in separate operations, and the Halloween protections are no longer necessary.
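Here's a rough sketch of the reworked flow, using hypothetical table and parameter names rather than the actual procedure's: the existence check moves into the step that stages the TVP data, so the final INSERT no longer reads the table it is inserting into, and the Halloween protections disappear.

INSERT INTO #Staging (Column1, Column2, Column3, Column4)
SELECT tvp.Column1, tvp.Column2, tvp.Column3, tvp.Column4
FROM @TVP tvp
LEFT JOIN Schema1.Object1 existing
	ON existing.Column1 = tvp.Column1
	AND existing.Column2 = tvp.Column2
	AND existing.Column3 = tvp.Column3
	AND existing.Column4 = tvp.Column4
WHERE existing.Column1 IS NULL;	-- only stage rows that don't already exist

INSERT INTO Schema1.Object1 (Column1, Column2, Column3, Column4)
SELECT s.Column1, s.Column2, s.Column3, s.Column4
FROM #Staging s;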

If you liked this post, please follow me on twitter or contact me if you have questions.

I encountered something recently I’d never seen before, so I had to share. I was making another change to a procedure I’ve been tuning recently. The idea was to alter the UPDATE statements to skip rows unless they are making real changes. The activity here is driven by customers, and that sometimes leads to them setting the same value repeatedly. It’s difficult to know how often we update a row to the same value, but we think it could be significant. So, we added a clause to the UPDATE so we’ll only update if ‘old_value <> new_value’. The actual update operator is the most expensive part of the statement. Simple enough so far. The scan is against a memory optimized table variable, and the filter to the left of our seeks and scans checks for a change to our value. Nothing left but to update the index and…

Curve Ball

Wait, what’s all this? We have a Split operator after our Clustered Index Update. SQL Server does sometimes turn an UPDATE statement into effectively a DELETE and INSERT if the row needs to move, but this seems a bit much. We have a total of 4 index update/delete operators now, and they aren’t cheap. My very simple addition to the WHERE clause actually caused a small increase in duration and a big jump in CPU. So what’s going on?

UPDATE Object1
SET
	Column1 = CASE 
		WHEN Variable1 = ? THEN Object2.Column1 
		WHEN Variable1 = ? THEN Object1.Column1 + Object2.Column1 
		END,
	Column4 = GETUTCDATE()
FULL OUTER JOIN Variable5 dcqt
	ON Object1.Column9 = Variable2
	AND Object1.Column10 = Variable3
	AND Object1.Column6 = CASE WHEN Variable6 = ? AND Object2.Column2 >= ? THEN ? ELSE Object2.Column2 END
LEFT JOIN Variable7 oq
	ON Object1.Column6 = Object3.Column6
WHERE
	Object1.Column10 = Variable3
	AND Object1.Column9 = Variable2
	AND Object1.Column2 >= ?
	AND Variable8 < ?
	AND Object1.Column1 <> CASE 
		WHEN Variable1 = ? THEN Object2.Column1 
		WHEN Variable1 = ? THEN Object1.Column1 + Object2.Column1 
		END

So, we’re updating Column1 to the result of a CASE statement, and the last part of our WHERE clause compares Column1 to the same CASE. And the CPU for this statement just doubled? I happened to jog this by the superlative Kevin Feasel, who suggested this was the Halloween Problem.

The Halloween Problem

The Halloween Problem is a well documented issue, and it affects other database systems, not just SQL Server. The issue was originally seen by IBM engineers using an UPDATE to set a value that was also in their WHERE clause. So, database systems include protections for the Halloween Problem where necessary in DML statements, and SQL Server decided it needed to protect this query. And our query matches the pattern for this issue; we’re filtering on a field while we are updating it. All DML statements can run afoul of this issue, and there are examples for all in this really excellent series of posts by Paul White.

An Unlikely Ally

The protection SQL Server employs ultimately comes down to interrupting the normal flow of rows from operators up the plan. We actually need a blocking operator! A blocking operator would cause all the rows coming from the query against our primary table to pool in that operator. We’ll have a list of all relevant rows in one place, and they can be passed on to operators above without continuing the index seek against our table; possibly seeing the same row a second time. Eager spools\table spools are frequently used for this purpose, and SQL Server used an eager spool to provide Halloween protection in my case. A sort would also do, and we could design this query to sort the results of querying the table. If our query already employed a blocking operation to interrupt the flow, SQL Server would not need to introduce more operations to protect against the Halloween Problem. In my case, I definitely don’t want it spooling and creating three more expensive index operations. My idea for rewriting this goes a step farther.

A Two Table Solution

If we queried the data from our base table into a temp table, we could then update the base table without querying that same table on the same column, tripping over a pumpkin in the process. My procedure is already using memory optimized table variables, because this proc runs so frequently that temp tables cause contention in tempdb (described here). So in this instance, I’ll actually query this data into another motv. I can also use the data I stash in my motv to decide if any rows I intended to update don’t exist yet. I’ll INSERT those later, only if needed.

Bonus

Now, the whole point of the original change was to reduce work when we update a field to be equal to its current contents. In my motv, I’m going to have the old and new value for the field. It would be simplicity itself to put my UPDATE in a conditional, so that we only run the UPDATE if at least one row in my motv is making an actual change to the field. So instead of just reducing the cost of our update operators, we’ll skip the entire UPDATE statement frequently, for only the cost of one SELECT and a write to our motv.
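A rough sketch of that pattern, with hypothetical names rather than the actual procedure's code: stage the old and new values, then only run the UPDATE when at least one row actually changes.

DECLARE @Changes dbo.ChangeType;	-- memory optimized table type; name is hypothetical

INSERT INTO @Changes (KeyColumn, OldValue, NewValue)
SELECT t.KeyColumn, t.ValueColumn, src.NewValue
FROM dbo.TargetTable t
JOIN @SourceRows src	-- the incoming data, also hypothetical
	ON src.KeyColumn = t.KeyColumn;

IF EXISTS (SELECT 1 FROM @Changes c WHERE c.OldValue <> c.NewValue)
BEGIN
	UPDATE t
	SET t.ValueColumn = c.NewValue
	FROM dbo.TargetTable t
	JOIN @Changes c
		ON c.KeyColumn = t.KeyColumn
	WHERE c.OldValue <> c.NewValue;	-- filter on the staged values, not the column being updated
END;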

Results

Stunning. The new logic causes this proc to skip the UPDATE statements entirely over 96% of the time. Even accounting for the query that populates the memory optimized table variable (which takes <100 microseconds) and the IF EXISTS check against that motv (which takes 10-20 microseconds), we’re spending 98% less time on the new logic than we did on the original UPDATE statement. When I started reviewing the procedure several months ago, it took 3.1 milliseconds on average. I’ve tried several other changes in isolation, some effective, some not. The procedure is now down to 320 microseconds: a 71% drop from this change, and an almost 90% reduction overall. I have some other ideas for tweaks to this proc, but honestly, there’s very little left to gain with this process. Time to find a new target. If you liked this post, please follow me on twitter or contact me if you have questions.

In my last post, I spoke about optimizing a procedure that was being executed hundreds of millions of times per day, and yes, that is expected behavior.

The difficult thing about trying to optimize this procedure is that it only takes 2.5 ms on average to run. Tuning this isn’t a matter of changing a scan to a seek; we’ll have to look hard to find the opportunities here. Still, a one millisecond improvement on a procedure running 100 million times a day would save 100,000 seconds every day.

Well, I’ve found a few more options since my last post, and wanted to share my findings.

Setup

The procedure has some complex logic but only runs a few queries.

  • There are a few simple SELECT statements to populate some variables. These take a small percentage of the overall runtime.
  • There are two UPDATE statements, and we will run one or the other. Both join a table to a table variable; one has a second CTE doing some aggregation that the other lacks. The majority of our time is spent running these UPDATEs.
  • An INSERT statement runs every time. This ensures that if we didn’t update a record because it didn’t exist, we insert it instead. It’s very likely that on a given run we will INSERT 0 rows.

Brainstorming

Since we spend 80% of our time in the UPDATE statements (I love Query Store), that’s the place with the most potential for gain.
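
If you want the same per-statement breakdown for a procedure in your own environment, a query along these lines against the Query Store views will give it to you (dbo.MyHotProc is a placeholder name; avg_duration is reported in microseconds, and averaging the per-interval averages is approximate):

	-- Sketch: how each statement in one procedure contributes to its duration.
	SELECT q.query_id,
		qt.query_sql_text,
		SUM(rs.count_executions) AS executions,
		AVG(rs.avg_duration) AS avg_duration_us
	FROM sys.query_store_query AS q
	INNER JOIN sys.query_store_query_text AS qt
		ON qt.query_text_id = q.query_text_id
	INNER JOIN sys.query_store_plan AS p
		ON p.query_id = q.query_id
	INNER JOIN sys.query_store_runtime_stats AS rs
		ON rs.plan_id = p.plan_id
	WHERE q.object_id = OBJECT_ID('dbo.MyHotProc')	-- placeholder procedure name
	GROUP BY q.query_id, qt.query_sql_text
	ORDER BY AVG(rs.avg_duration) DESC;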

But on first look (and second), it seems difficult to find room for improvement here. We’re doing index seeks with small row counts. The index scans are against memory optimized table variables, and you may notice they are cheaper than the index seeks.

But looking at the plan, one thing did draw my attention:

There’s a table spool, and following that I can see the plan is updating an indexed view, which we would do every time there’s an UPDATE. Hundreds of millions of times a day…huh. So, removing the index on that view would eliminate this entire middle branch from the plan.

The view is based on two columns that are the first two columns of the clustered PK of the underlying table. The view does some aggregation, but the difference between querying the view and querying the table is reading 1 row versus maybe 2 or 3 rows, most of the time. Dropping that index seems like a good thing to try.
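
Removing the index is a one-line change once you’re confident nothing depends on it (the view and index names here are hypothetical):

	-- Dropping the unique clustered index turns the indexed view back into a
	-- plain view, so DML against the base table no longer has to maintain it.
	DROP INDEX IX_vAccountSummary ON dbo.vAccountSummary;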

And I did mention this in my last blog post, but we perform the INSERT statement on every execution of the procedure, and we run the trigger on this table even if we inserted 0 rows. So, if we can detect whether the INSERT is needed, we can potentially skip the majority of the executions of both the statement and the trigger.
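
The check could be as simple as the sketch below, with @Incoming standing in for the proc’s actual TVP and the table and column names hypothetical as before:

	-- Only run the INSERT (and fire its trigger) when something is actually missing.
	IF EXISTS (SELECT 1
		FROM @Incoming AS i
		WHERE NOT EXISTS (SELECT 1 FROM dbo.AccountBalance AS ab
			WHERE ab.AccountID = i.AccountID))
	BEGIN
		INSERT INTO dbo.AccountBalance (AccountID, Balance)
		SELECT i.AccountID, i.Balance
		FROM @Incoming AS i
		WHERE NOT EXISTS (SELECT 1 FROM dbo.AccountBalance AS ab
			WHERE ab.AccountID = i.AccountID);
	END;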

The logic for the procedure uses a TVP and a couple of table variables, which isn’t optimal. SQL Server doesn’t have statistics on table variables, so it can’t make good estimates of how many rows will be returned (unless you are using table variable deferred compilation in SQL Server 2019). We could change these to temp tables and see if we get better results. Hopefully, we’d also have a more stable plan across the many databases running this procedure.

Both of the UPDATE statements have a bookmark lookup. We’re only looking up a few rows, but removing the lookup could still be a significant improvement for a query that takes so little time. Also, one of the UPDATE statements references the main table an additional time in its CTE, so we have two index seeks plus the key lookup. How much of our time is spent in that second access and the bookmark lookup?

Results

First change

I’ve been working on releasing these changes individually, and the first one is complete. Removing the index on the view resulted in a 17% reduction in the duration of the procedure (from 2.34 ms to 1.94 ms) and a 20% reduction in CPU. Come to think of it, not having to update that index would have helped our INSERT statement as well.

I’ll update this post once I have details on the other changes.

If you liked this post, please follow me on twitter and contact me if you have questions.

Updates

Second change

So, the second idea I tried for this issue was replacing the table variables with temp tables. The idea was that temp tables have statistics and table variables don’t, so we should tend to get better execution plans using temp tables. However, it didn’t work out that way in this case, because the change also caused pagelatch contention in tempdb.

Tempdb pagelatch contention is a very well-documented issue. There are things you can do to mitigate it, but at a certain point you just need to create fewer temp tables. I’m not sure exactly where that line is in my environment, but it was clear that creating 3 temp tables in a proc running this often crosses it.
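
If you want to confirm that’s what is happening, the waits show up directly against tempdb pages in the requests DMV (database_id 2 is tempdb):

	-- Sessions currently waiting on page latches against tempdb pages.
	SELECT r.session_id, r.wait_type, r.wait_resource, r.wait_time
	FROM sys.dm_exec_requests AS r
	WHERE r.wait_type LIKE N'PAGELATCH%'
		AND r.wait_resource LIKE N'2:%';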

This change has been reverted, but I’ll update this post again shortly. I should be making the change to skip unnecessary INSERT statements this week.

Skipping Inserts

This change has been made, and the results are fairly minor. The first measurement I took showed the average duration of the proc dropping from 1.95 ms to 1.91 ms, but this varies enough from day to day that I could cherry-pick numbers and show a different, even negative, improvement.

This confirms something interesting. The cost of the procedure overall was skewed very heavily toward the UPDATE statements, with the INSERT costing significantly less. The most expensive operator in a DML operation is typically the Insert\Update\Delete operator itself: the step where we actually change data. My expectation all along has been that very few rows were actually being inserted by our INSERT statement. It seems this is true, and since the statement doesn’t actually insert data, the Insert operator doesn’t do work or take any significant amount of time. So we gained very little by this change, at least from the procedure itself.

This change also prevented us from calling the trigger with empty inserts, so the trigger dropped from executing ~350 million times per day to 6 million. That saves us about 14,000 seconds per day. I don’t often think about context switching with regard to SQL Server, but in addition to the time saved, there’s a small benefit in not having to set up and switch to the context for the trigger itself.

Overall a small victory, but I’ll take it.

Pending

I should have my final update on this topic in the next week. It will change the logic of one of the UPDATE statements to remove an unnecessary CTE that references the main table, and add an index hint to use the primary key, removing the bookmark\key lookup from both statements. The first UPDATE hits the table twice (once for the nonclustered index, again for the key lookup) and the second hits it three times (because of the CTE), but this change will drop both to one access of the PK, and should only touch a few rows.
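
In rough terms, the reworked statement has the shape below. The names are hypothetical again: PK_AccountBalance stands in for the clustered primary key, and @Changes for the table variable the procedure joins to.

	-- Hint the clustered PK so the seek covers everything the UPDATE needs:
	-- no separate key lookup, and no extra CTE reference to the base table.
	UPDATE ab
	SET ab.Balance = c.NewBalance
	FROM dbo.AccountBalance AS ab WITH (INDEX (PK_AccountBalance))
	INNER JOIN @Changes AS c	-- table variable populated earlier in the proc (assumed)
		ON c.AccountID = ab.AccountID;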

I’m very interested to see how this affects performance, as it targets the largest part of the work done outside the actual Update operator in those statements.

Finale

So my change to the UPDATE statement is out, and the results are pretty good. First, the runtime of the procedure has dropped to 1.399 ms. If you recall, it originally took about 2.5 ms, so we’ve dropped it overall by more than 1 millisecond, which is more than 40% of its original runtime.

Second, I love the simplicity of the new plan. The previous anonymized plan is at the top of this post; it took 2 screenshots to cover most of it. In particular, I noticed we were hitting the central table 3 times. The first access was in a CTE doing unnecessary aggregation, which I removed entirely. The second reference was in the main query, and it caused a key lookup, which was the third access. I hinted our query to use the clustered primary key, which leads with the same two columns as the nonclustered index the optimizer seems to prefer.

The index scans are both against memory optimized table variables, and you can see their estimates are lower than those for the clustered index seek against the main table. I also love seeing that the actual update operator on our left accounts for such a large share of the effort; that’s what I expect to see on a DML operation that’s tuned well.

In summary

I had 4 ideas originally to tune this query. I wanted to see if we could drop it by 1 millisecond, and we gained 1.1.

  1. Removing the index from an indexed view referencing this table had a significant effect. This required a significant amount of research, as I had to find all the places where we referenced the view and determine whether any would perform significantly worse without the aggregation from that view. I didn’t see any red flags, and this change dropped the procedure’s duration from 2.5 ms to 1.94 ms (a 22% reduction).
  2. Replacing the procedure’s in-memory table variables with temp tables. The idea is that temp tables have statistics, which could lead to SQL Server making better plans because it can estimate how many rows are in a given operation. This might work in other cases, but not for a proc that runs this often. Performance actually got worse because of significant PAGELATCH waits: every execution created multiple temp tables, and at this pace our threads were constantly waiting on key pages in tempdb. This change was reverted.
  3. Reducing the INSERTs was a gain, but a minimal one. The INSERT statement itself took very little of our time in the procedure. We also got to skip running the INSERT trigger, but that also did not take long to run. It’s possible we ended up with less waiting in the main table, or the table our trigger was updating, but if so the gains were too small to quantify.
  4. Simplifying the logic of the two UPDATE statements that were taking most of the time in the procedure was a success. We went from 3 operations on the only permanent table in our query to 1, and removed an aggregation step we didn’t need. This dropped our runtime from 1.94ms to 1.399ms (27.8%). Every operation counts.

Hopefully you’ve learned something from this post and its updates. If so, please follow me on twitter or contact me if you have questions.