Query Waits

Carrying on from my previous post on the QDS Toolbox, let’s review the Query Waits component.

This component provides details about the wait types associated with a given object, query, or plan. This doesn’t pull data from the reports generated by the Server Top Queries component; this gets the data directly from the Query Store in the database you specify. So you can use this to review the waits on a given procedure, even if you haven’t generated any reports with Server Top Queries .

That being said, reports from Server Top Queries could identify problem queries and give you the PlanID, QueryID, or ObjectName you need to run the procedure. A query that spends a lot of time waiting would tend to have a higher duration and lower active metrics like CPU usage or logical reads. So those are the queries I would want to run the Query Waits report against.

Syntax

Assuming you installed the QDS Dashboard without issue, let’s look at the options for the procedure. @sqlozano documented these in the header of the procedure, for reference. Here’s an example script:

USE QDSToolBox
GO
DECLARE 
	@StartTime DATETIME2,
	@EndTime DATETIME2;

SELECT
	@StartTime = DATEADD(MONTH,-1,GETUTCDATE()),
	@EndTime = GETUTCDATE();

EXECUTE [dbo].[QueryWaits]
	--@ServerIdentifier = 'ServerName\Instance',
	@DatabaseName	= 'WideWorldImporters',
	--@ReportIndex		= '[dbo].[QueryWaitsIndex]'
	--@ReportTable		= '[dbo].[QueryWaitsStore]'
	@StartTime = @StartTime,
	@EndTime = @EndTime,
	@ObjectName	= 'OnDisk.InsertVehicleLocation'
	--@QueryID	= 172
	--@PlanID	= 178
	--@VerboseMode = 0
	--@TestMode = 0
GO

  • ServerIdentifier: Commented out. I’m gathering data from the instance I’m logged into, which is the default.
  • DatabaseName: I have a separate database for the QDS Toolbox, so I’m using this parameter to get data from the WideWorldImporters database.
  • ReportIndex and ReportTable: You can store the data generated by this procedure, and refer to it later. I’m referencing the two tables created by the scripts included in the toolbox. In the header, Pablo included an example that pushes the data to a centralized server across a linked server. You just need to fully qualify the name, like ‘[LinkedSrv].[LinkedDB].[dbo].[CentralizedQueryWaitsIndex]’. If you skip these options, you’ll just get a result set that won’t be saved.
  • StartTime and EndTime: In my example, I set the EndTime to be current, and the StartTime to be a month ago. So you can specify any reasonable window you want.
  • ObjectName, QueryID, PlanID: Choose 1 only. The procedure will fail with an error if you provide two. Again, the Server Top Queries output will give you options you can use here.
  • Verbose Mode: Allows you to see the TSQL generated to produce the report. You could then tweak that query if you want something different in your result set.
  • TestMode: Enabling this prevents the produced script from actually running.

Output

I used the OnDisk.InsertVehicleLocation procedure originally to populate some data in my database (which is part of the World Wide Importers sample on github). Running the Query Waits report to check it’s behavior, this is what the output looks like.

The PlanID, QueryID, and QueryTextID may or may not be returned depending on the input. If you query for a specific QueryID, but it has many associated PlanIDs, this report will only give you one row per interval because it aggregates all the plans together. If they weren’t aggregated, you could have dozens or hundreds of rows per interval.

The results show the numbers from each time interval within the window specified. The duration and CPU time are included, along with the number of plans and queries executed in that interval.

Then we have a lot of columns indicating waits in various categories (I didn’t even include them all in this image). The names here correspond to the types in the page for sys.query_store_wait_stats, so you will need to refer to the table I’ve linked. The WWI workload driver inserts 120,000 rows by default (across 120 threads), and the first interval has data from running that tool.

The data in sys.query_store_wait_stats is in milliseconds, but other tables with performance data like sys.query_store_runtime_stats are in microseconds. This report multiplies the time from sys.query_store_wait_stats by 1000, so if you compare the waits to the duration they have the same units.

There are 17 milliseconds of waits were in the Idle category; this includes a number of things, and it could also be that the tool itself has some delays between batches.

There are also 21 milliseconds of memory waits. This is another category with a number of specific wait types, but both of these seems inconsequential given the procedure took 10,226 milliseconds in this interval. The vast majority of that overall duration is accounted for by the 10,114 milliseconds of CPU time.

So in this case, waits don’t appear to be the issue.

Wrapping up

If I see a query’s CPU usage is much less than half its duration, I’ll usually consider its waits. This is an important aspect of SQL Server performance, and the Query Waits report here makes this data very accessible.

Two down, six more components in the QDS Toolbox. I’ll keep working on these posts, with a few other topics in between.

I hope you find this post helpful. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

In having a talk reviewed recently, it was suggested I spend more time defining some of the subject I touched on. It occurred if I should go over (or at least introduce) these ideas during a talk for a SQL Saturday audience, some might find a post on the subject useful. Hence my recent post on key lookups.

Another such topic is table variables. I use table variables frequently at my current job, but they came up very infrequently when I worked at CSS in Microsoft. I remember the conversations about them being very simple at the time, as in, “you should just use temp tables instead.” But there is a lot of utility with table variables, and they could be a useful arrow in your quiver.

Statistics

Table variables are much maligned for one issue; they don’t have statistics.

In most ways, table variables function a lot like temp tables. You can define their columns and insert rows into them. Scoping is different. If you declare a table variable inside a procedure, its scope ends when you return from that procedure. If you create a temp table inside a procedure, it remains for the connection, so the procedure that called the procedure creating the temp table can see the temp table and its contents.

Both can have indexes created on them, but the table variable will have no statistics on its indexes. The result is that the SQL Server optimizer doesn’t know how many rows are in the table variable and will assume 1 row when trying to compare possible execution plans. This may lead to worse plan than you would have with a temp table containing the same data.

This is mitigated in a feature of SQL Server 2019’s Intelligent Query Processor, table variable deferred compilation. This provides cardinality estimates based on the actual number of rows in the table variable, and statements are not compiled until their first execution.

“This deferred compilation behavior is identical to the behavior of temporary tables. This change results in the use of actual cardinality instead of the original one-row guess.”

Microsoft

This feature does require you to be using compatability level 150, but this is a significant upgrade.

Alternatively, if you want to change the behavior around a table variable and aren’t using SQL Server 2019 (or have another compatability mode), you could use a query hint. I’ve used an INNER LOOP JOIN hint many times when using a table variable I expect to have a small number of rows as a filter to find specific rows in a permanent table. This gives me the join order and join type I expect to see, but I’m less shy about hints than many as I have commented previously.

Tempdb contention

There’s an ocean of posts about how to address tempdb contention, mostly focusing on creating enough data files in your tempdb or enabling trace flags that change some behaviors. The point I think that gets understated, is that creating fewer objects in tempdb would also address the issue.

One of the ways table variables function like temp tables, is that they store data in tempdb and can drive contention there as well. The cause of that contention is different, but creating table variables across many different sessions can still cause contention.

Table Valued Parameters

TVPs allow us to pass a table variable to a procedure as an input parameter. Within the procedure, they function as read-only table variables. If the read-only part is an issue, you can always define a separate table variable within the procedure, insert the data from the TVP, then manipulate it as you see fit.

The caveats here are the same; the TVP is stored in tempdb (and can cause contention there), you may have issues with cardinality due to the lack of statistics.

But the advantage is that we can make procedure calls that operate on sets of data rather than in a RBAR fashion. Imagine your application needs to insert 50 rows into a given table. You could call a procedure with parameters for each field you are inserting, and then do that 49 more times. If you do it serially, that will take some time, in part because of the round trips from your application to the SQL Server. If you do this in parallel, it will take longer than it should because of contention on the table. I haven’t blogged about the hot page issue, but that might make a good foundation topic.

Or you could make one call to a procedure with a TVP so it can insert the 50 rows without the extra contention or delay. Your call.

Memory Optimized Table Variables

This changes this by addressing one of the above concerns. Memory optimized table variables (let’s say motv for short), store data in memory and don’t contribute to an tempdb contention. The removal of tempdb contention can be a huge matter by itself, but storing data in memory means accessing a motv is very fast. Go here if you’d like to read the official documentation about memory optimized table variables.

A motv has to have at least one index, and you can choose a hashed or nonclustered index. There’s an entire page on the indexes for motv’s, and more details for both index types on this page on index architecture.

I tend to always use nonclustered indexes in large part for the simplicity. Optimal use of hash indexes requires creating the right number of buckets, depending on the number of rows expected to be inserted into the motv. Too many or too few buckets both present performance issues. It seems unlikely to configure this correctly and then not have the code change that later. It’s too micromanage-y to me, so I’d rather use nonclustered indexes here.

We do need to keep in mind that we are trading memory usage for some of these benefits, as Eric Darling points out amusingly and repeatedly here. This shouldn’t matter the majority of the time, but we should keep in mind that it isn’t free and ask questions like:

  • Does that field really need to be nvarchar(max)?
  • I expect this INSERT SELECT statement to insert 100 rows, but could it be substantially more?
  • What if there are 100 simultaneous procedures running, each with an instance of this motv? Each with 100 rows?
  • How much total memory could this nvarchar(200) require if we’re really busy?

I certainly would want to keep my columns limited, and you can make fields nullable.

Memory Optimized Table Valued Parameters

So, we can memory optimize our TVP and get the advantages of both. I use these frequently, and one of the big reasons (and this applies to TVPs in general) is concurrency.

So, let’s compare inserting 50 rows of data into a physical table using a normal procedure versus a procedure with a memory optimized table variable.

Singleton Inserts

To test this, I used a tool provided in the World Wide Importers sample on github. Specifically, I was using the workload driver. This was intended to compare the performance of in-memory, permanent tables with on-disk tables, but the setup will work just fine for my purposes.

The VehicleLocation script enables In-Memory OLTP, creates a few tables, procedures to insert into those tables, and adds half a million rows to them. I’ll be using the OnDisk.VehicleLocations table and the OnDisk.InsertVehicleLocation to insert into it.

I wrote the following wrapper procedure to provide random inputs for that procedure:

USE WideWorldImporters
GO

CREATE OR ALTER PROCEDURE OnDisk.InsertVehicleLocationWrapper
AS
BEGIN
	-- Inserting 1 record into VehicleLocation
	DECLARE @RegistrationNumber nvarchar(20);
	DECLARE @TrackedWhen datetime2(2);
	DECLARE @Longitude decimal(18,4);
	DECLARE @Latitude decimal(18,4);

	SET NOCOUNT ON;

	SET @RegistrationNumber = N'EA' + RIGHT(N'00' + CAST(CAST(RAND() AS INT) % 100 AS nvarchar(10)), 3) + N'-GL';
	SET @TrackedWhen = SYSDATETIME();
	SET @Longitude = RAND() * 100;
	SET @Latitude = RAND() * 100;

	EXEC OnDisk.InsertVehicleLocation @RegistrationNumber, @TrackedWhen, @Longitude, @Latitude;
END;
GO

With this, I basically wrote a loop to call this to individually insert 500 rows. Query Store showed the procedure ran its one statement in 23 µs. I love measuring things in microseconds.

Parallel Inserts

Except that didn’t really test what I wanted. Running it in this fashion in SSMS, it would have run the procedure 500 times serially. I mentioned concurrency as an advantage for memory optimized TVPs, but to see that advantage we need to do singleton inserts into our table in parallel.

A little Powershell later, and I ran this again. I did only 50 inserts, but I did them using 50 parallel threads. Each thread will call the wrapper proc to generate random inserts, then call OnDisk.InsertVehicleLocation.

Since this table has clustered index based on an identity column, we will always be inserting into the last page on the table. But that will require an X lock to write the page, which means our parallel calls will be blocking each other. And the more there are, the worst the waiting will get. This is the hot page issue in a nutshell.

On average each execution took 59 µs. Nearly three times as long. Likely, the first thread completed just as fast as previous, but each subsequent thread has to get in line to insert into the OnDisk.VehicleLocation table. And the average duration increases steadily throughout.

Inserting with a Memory Optimized TVP

Here’s the script I wrote to test using a memory optimized TVP to insert 500 rows.

USE WideWorldImporters
GO

IF NOT EXISTS(
	SELECT 1 
	FROM sys.types st
	WHERE 
		st.name = 'VehicleLocationTVP'
)
BEGIN
	CREATE TYPE OnDisk.VehicleLocationTVP
	AS TABLE(
		RegistrationNumber nvarchar(20),
		TrackedWhen datetime2(2),
		Longitude decimal(18,4),
		Latitude decimal(18,4)
		INDEX IX_OrderQtyTVP NONCLUSTERED (RegistrationNumber)
	)WITH(MEMORY_OPTIMIZED = ON); 
END;
GO

CREATE OR ALTER PROCEDURE OnDisk.InsertVehicleLocationBatched
	@VehicleLocationTVP OnDisk.VehicleLocationTVP READONLY
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	INSERT OnDisk.VehicleLocations
		(RegistrationNumber, TrackedWhen, Longitude, Latitude)
	SELECT
		tvp.RegistrationNumber, 
		tvp.TrackedWhen, 
		tvp.Longitude, 
		tvp.Latitude
	FROM @VehicleLocationTVP tvp;

	RETURN @@RowCount;
END;
GO

-- So let's insert 500 rows into this on-disk table using a TVP
DECLARE 
	@Target INT = 500,
	@VehicleLocationTVP OnDisk.VehicleLocationTVP;

DECLARE @Counter int = 0;
SET NOCOUNT ON;

WHILE @Counter < @Target
BEGIN
	-- Generating one random row at a time
	INSERT @VehicleLocationTVP
		(RegistrationNumber, TrackedWhen, Longitude, Latitude)
	SELECT
		N'EA' + RIGHT(N'00' + CAST(@Counter % 100 AS nvarchar(10)), 3) + N'-GL',
		SYSDATETIME(),
		RAND() * 100,
		RAND() * 100;

	SET @Counter += 1;
END;
	
EXEC OnDisk.InsertVehicleLocationBatched @VehicleLocationTVP;
GO

This creates the type for our memory optimized table variable and defines the procedure. The last batch generates 500 rows of random data, inserting them into the same type, and calls the new proc with the TVP as input. Executing this once to insert 500 rows took 1981 µs, which is just under 4 µs per row.

Full results for all three tests

Wrapping up

This might have been my wordiest blog, but I hope you learned something from it. I’ve rewritten a number of procedures in recent years to operate on batches, and the results have largely been great.

Again, I will frequently use a join or index hint when joining the TVP to a base table, and that’s a minor change that can prevent an expensive mistake from the lack of statistics.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to @ me and make a suggestion.

Please follow me on twitter (@sqljared) or contact me if you have questions.

USE WideWorldImporters
GO
SELECT TOP 100 
	sol.OrderID, 
	sol.UnitPrice,
	sol.Quantity,
	sol.PickedQuantity,
	sol.LastEditedWhen
FROM Sales.OrderLines sol 
WHERE 
	sol.StockItemID = 20
GO

Key Lookups

In working on my presentation for Data Saturday #8 – Southwest US, I hadn’t realized how many topics come up at least briefly in the talk. I wanted to make a few posts about to go into details on each of these topics and why they are important.

My thanks again to Deborah Melkin for her review and feedback of the presentation.

A key lookup is an operation that occurs when a query has used a nonclustered index on a given table, but needs to access more columns to complete the query. It may need to check columns not in that index for additional filters, or it may just need to return that column as part of its result set.

In the simple query above, we’re retrieving 100 rows from the seek against a nonclustered index, then performing a key lookup against the clustered index. There is a nested loops operator between the two and understanding how that operates is important; for each row we receive from the first table, we perform the second operation once. So, in this query we are seeking 100 rows from the nonclustered index, then performing the key lookup 100 times. We go through the index once for each row we return, and you can see the cost of the key lookup operator is 99% of the query.

Operator Details

Details for the Key Lookup operator

Mouseover

If we mouseover the key lookup, we can see the details of this operation. We actually read 100 rows . The “Estimated Operator Cost” (0.324977) is nearly 100 times that of the index seek (0.0035899).

The “Number of Executions” is 100, so for each row received from the index seek, we traverse the clustered index (its index and leaf pages) once to get that row. And we do 100 separate seeks of that index to get 100 rows. This is a lot more work than we did to get 100 rows with 1 index seek from the nonclustered index.

The estimates match our actuals, but the TOP clause is a very good hint for how many rows we should receive.

If you have a table scan somewhere in your plan is table scanning millions of rows, you should probably address that first. But removing the key lookup by returning fewer columns drops this query from 12.5 milliseconds to 73 microseconds. That’s a 94.16% duration reduction (thank you Query Store).

Resolution

There’s two ways to handle a query like this with a key lookup.

  1. Do we need these columns in our query?
  2. Create a covering index.

Addition by Subtraction

We are doing the key lookup because we want to return columns, or filter\otherwise use columns, that are not in the nonclustered index. Let’s first ask this: do we need these columns in our query?

If we check the code or application that’s retrieving the results, does it actually consume those columns from the result set and use them? If we are filtering on that column, does that filter still make sense? If not, let’s just take it out of the query to simplify matters.

And it is very clear which columns are the issue. If you look at the details of the key lookup in the image above, the Output List for that operator shows which columns we are using the clustered index to retrieve. If you don’t need any of them, you can remove them from the query. Your new execution plan will be missing a key lookup.

CREATE COVERING INDEX

The heading is a joke; there’s no such command, of course. A covering index is a nonclustered index that supplies all the information you need from a given table to complete a given query. So far, we’re doing key lookups for this query because no such an index exists. We could get all these columns from the clustered index, but we would have to scan the whole index because our WHERE clause doesn’t match the sorting of the clustered index.

Normally when we create an index, we want our index to include any columns we are filtering on. So it would include columns in our WHERE clause, or the columns in our JOIN clause if we are joining from another table. In some cases, you might want the index to match an ORDER BY. Here just the section in red.

For a covering index on this query, we need to include the SELECT list (in the green section) in our index. In general, every column for this table referenced in the query needs to be in our index.

The INCLUDE column is a great way to add in the columns in our SELECT list.

We could add those 5 columns to our index normally as key values, but that would unnecessarily bloat all the pages of the index. We aren’t filtering on any of those columns, so we don’t need the columns in the index pages for us to filter properly. If we use the INCLUDE clause, these columns will be present only in the leaf page of our index. This is similar to how the columns from the clustered index are added to all nonclustered indexes.

So a script for the new index would look like this:

CREATE NONCLUSTERED INDEX [IX_Sales_OrderLines_AllocatedStockItems] ON [Sales].[OrderLines](	
	[StockItemID] ASC
) INCLUDE(
	[PickedQuantity],
	[OrderID],
	[UnitPrice],
	[Quantity],
	[LastEditedWhen]	
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = ON, 
	ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [USERDATA]
GO

With the index in place, our original query took 95 microseconds. Slightly longer than the query with the reduced result set, but we did increase the size of the index some.

Conclusion

A key lookup might be an operation you don’t notice often, but I’ve been impressed with the result of removing them when I can.

I’ll be posting other blogs with foundational topics in the near future and more posts in general than I’ve had recently. Maybe this isn’t foundational; it might be on the first floor.

I hope you’ve learned something from this post. Please follow me on twitter (@sqljared) or contact me if you have questions.

So let’s talk about the best metric no one thinks about.

A Good Question

I was working with a client troubleshooting an issue several years back now, and they asked a question that was difficult to answer.

We were troubleshooting a slow query, and the reason it was taking so long was oddly opaque.

  • It wasn’t blocked
  • It wasn’t waiting on CPU or driving the CPU on the server
  • It wasn’t waiting on memory
  • It wasn’t waiting on the disks

So, the usual suspects were out. We didn’t have the plan for this query yet, and were operating without much information yet. So, the client asked the simple question, “If it isn’t waiting on anything, why isn’t it done?”

I thought it a simple question at first, then I really thought about it. With the usual suspects out, we really couldn’t see what SQL Server was doing. What could be taking up that time?

Invisible culprit

It finally occurred to me that just because we weren’t seeing the disks spike didn’t mean we weren’t doing a lot of reads. Logical reads don’t involve the disks; we’re only reading pages that are already in memory. They are faster than physical reads for sure, but that doesn’t mean they are instant.

The logical reads themselves won’t cause waits. They will use some CPU cycles, but in this case we saw no CPU related wait type or a high overall CPU %.

Take the plan above. It took 84 seconds to complete, and only returned 16 rows. But it read millions of rows at the lower levels. If the tables involved are largely in memory, we won’t be waiting on physical IO. It’s a SELECT statement, so it won’t be blocked or blocking if you are using read-committed snapshot isolation level (RCSI, I should blog about that).

With no waits or blocking, how would you see this query is less than optimal unless you are looking for it?

Detecting logical reads

We eventually got the execution plan and it confirmed my suspicion. The plan was reading many, many rows and joining across a number of tables. Once we had the plan, were able to come up with a plan to filter it down faster.

But the lesson here is the effects of logical reads are hard to detect. This makes logical reads a good metric to check for if you want to look at your activity within SQL Server. Here’s an example script for returning the queries with the highest logical reads in the execution cache, centering around sys.dm_exec_query_stats :

SELECT TOP 50 
	qs.creation_time, 
	qs.execution_count, 
	qs.total_logical_reads, 
	qs.total_logical_reads/qs.execution_count AS ave_logical_reads, 
	qs.total_elapsed_time, 
	qs.max_elapsed_time, 
	qs.total_elapsed_time/qs.execution_count as ave_duration, 
	substring(t.text, qs.statement_start_offset/2+1, 
		(CASE WHEN qs.statement_end_offset=-1 THEN (len(t.text)- qs.statement_start_offset)/2
		ELSE (qs.statement_end_offset- qs.statement_start_offset)/2 END )+1) AS statement_text,
	qp.query_plan,
	t.[text], 
	db_name(t.dbid) as db_name, 
	OBJECT_NAME( t.objectid, t.dbid) AS object_name, 
	qs.total_worker_time AS total_cpu_time, 
	qs.total_worker_time/qs.execution_count AS ave_cpu_time, 
	qs.total_physical_reads, 
	qs.total_physical_reads/qs.execution_count AS ave_physical_reads
	--,qs.plan_handle, qs.plan_generation_num
FROM sys.dm_exec_query_stats qs 
OUTER APPLY sys.dm_exec_sql_text(plan_handle) AS t
OUTER APPLY sys.dm_exec_query_plan(plan_handle) AS qp
ORDER BY qs.total_logical_reads DESC;
GO

This will let you look through the query stats for anything currently in the cache, but of course a restart of SQL Server or another action that clears the cache will mean there isn’t much to see.

If you use Query Store, you could see the queries with the most logical reads in the last two hours with the following:

SELECT TOP 10 
	SUM(rs.avg_logical_io_reads) AS sum_logical_io_reads,
	q.query_id,
	p.plan_id, 
	qt.query_sql_text, 
	p.query_plan,
	rsi.start_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON p.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval rsi
	ON rsi.runtime_stats_interval_id = 
		rs.runtime_stats_interval_id
WHERE 
	rsi.start_time > DATEADD(HOUR,-2,GETUTCDATE())
GROUP BY 
	q.query_id,
	p.plan_id, 
	qt.query_sql_text, 
	p.query_plan,
    rsi.start_time
ORDER BY SUM(rs.avg_logical_io_reads) DESC

Conclusion

Another point to consider, while physical reads take more time, the difference between the two is somewhat random. Whether given pages are in memory or not will affect how long a query takes, but we can’t really control what’s in memory without taking unusual measures.

Logical reads will always occur, and should be a more consistent measure of how much work a query requires.

Based on the plan, we may also end up reading to the same page in memory many times, for example if we have a cross product in our query.

I hope you find this post helpful.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to @ me and make a suggestion.

Please follow me on twitter (@sqljared) or contact me if you have questions.

Happy Thanksgiving.

Hints in SQL Server

I used to be really suspicious of using hints in SQL Server, and now I can’t imagine working without them.

My opinion on this topic changed over the last few years due to a number of the performance issues I’ve worked on. I spoke at SQLSaturday 1000 (Oregon 2020) last weekend, and my talk was primarily about things I learned optimizing garbage collection and similar incremental processes. During that work I ran into a number of issues with queries like this example from the WideWorldImporters database:

	DELETE inv
	FROM @OrdersGC gc
	JOIN Sales.Invoices inv
		ON inv.OrderID = gc.OrderID;

Order Matters

The logic here is simple enough. Earlier in the process, we found orders we wanted to delete per retention policy, and put the OrderID values in a memory optimized table variable. We then use the motv to delete from all related tables, and finally the Orders table.

This query doesn’t have a WHERE clause. It’s plain to see how we want this to function though. We have 100 rows in our motv, and we want to delete the related rows in Invoices. But I’ve seen issues caused by execution plans that flip the order:

Table variables have no statistics, so the optimizer doesn’t know how many rows to expect from that operation (though table variable deferred compilation in SQL Server 2019 can resolve this) . Occasionally, I would see a plan with a join order that is the opposite of my expectation. The lack of a WHERE clause hurts here, but there’s no clause I can apply that will filter better than the items I already have in my table variable.

Consistency

I work on hundreds of databases with the same schema. They have different data sets and distributions, different sizes, and their statistics are going to update at different times. But if one of them chooses a bad plan, I have to push aside whatever other work to research the high CPU on database xyz.

Consistency is really valuable to me. And in this case, the answer is simple. Yes, I want to scan the fast, small memory-optimized table variable first, and use it to filter the larger, slower table. Adding a join hint or a force order to this query should keeps its plan and performance consistent.

	DELETE inv
	FROM @OrderList gc
	INNER LOOP JOIN Sales.Invoices inv
		ON inv.OrderID = gc.OrderID;

	DELETE inv
	FROM @OrderList gc
	JOIN Sales.Invoices inv
		ON inv.OrderID = gc.OrderID
	WITH OPTION(FORCE ORDER);

Both approaches force the join order. The INNER LOOP JOIN hint has the added benefit of ensuring the plan uses a nested loops join. A hash match wouldn’t be efficient with a batch size of a few hundred or a few thousand. A merge join would likely require a sort of one of the inputs, which defeats the purpose.

Index hints

I had to use index hints in an example I was using in my session for SQLSaturday 1000 (Oregon 2020).

DELETE TOP (@BatchSize) vt
FROM Warehouse.VehicleTemperatures vt
WHERE vt.RecordedWhen < DATEADD(DAY, -180, GETUTCDATE());

This was an example of a garbage collection process. The plan didn’t appear to be a problem, but we should be suspicious of the scan here:

The table scan only read 100 rows, but that’s because there is a TOP operator. The first 100 rows met our filter, so the query ended at that point. If no rows (or less than 100) matched, we would have scanned the entire table.

An index exists on the RecordedWhen column; it just wasn’t used. This is another place where a hint seems obvious. Maybe updating statistics would also resolve the issue, but this gives me more certainty.

DELETE TOP (@BatchSize) vt
FROM Warehouse.VehicleTemperatures vt WITH (INDEX(IX_VehicleTemperatures_RecordedWhen))
WHERE
	vt.RecordedWhen < DATEADD(DAY, -180, GETUTCDATE());

With Great Power

By using hints we are taking some of the responsibility away from the SQL Server, and we can cause entirely new problems. Here are some considerations before you try adding a hint.

  1. Relationships. Make sure you understand the cardinality and relationship between tables. This will inform your expectations about how many rows will be returned where.
  2. Indexes. Understand what options you have on each table in your query. A table may use one index based on the WHERE clause, or another based on the ON clause. The join order and indexes used are related. An index hint may push SQL Server to a specific join order; vice versa with join\order hints.
  3. Index hints can break your code! If you use an index hint in a procedure and later drop the index, SQL Server will not politely ignore your suggestion and move on. The procedure will fail until you remove the hint or recreate the index. So, if you use index hints, be aware of this and always check if any hints reference an index before you drop it.
  4. The most effective filter. If the logic of your statement filters across several tables, consider which one should reduce your result set the most. You probably want that table first in your execution plan.
  5. Test and test again. The new plans may be completely different from what we imagine, so we really must test our hinted queries and procedures with gusto. Test it for a variety of cases to make sure your code works on realistic data sets. In my case, I will sometimes test against large and small restored databases to make sure it performs as expected.

I’ve heard other engineers speak dismissively of hints, but I would encourage you to not discard a useful tool. Just realize you can cut yourself with it.

One of my coworkers recently resolved a performance issue by changing the join order and forcing it with a hint, or “doing a Jared Poche” in his words. Which shows you how often I’ve used hints, and how often they’ve worked.

Hopefully you learned something from this post. Please follow me on twitter (@sqljared) or contact me if you have questions.

Another Example: 

I was reviewing the performance of a procedure recently and stumbled over another pumpkin. 

My last blog post was about the Halloween Problem, and we saw its effects on an UPDATE statement. In this case, it was the same issue but with an INSERT statement. Here’s the code being executed:

INSERT INTO Schema1.Object1 (
		Column1,
		Column2,
		Column3,
		Column4 )
	SELECT
		Object2.Column1,
		Object2.Column2,
		Object2.Column3,
		Object2.Column4 
	FROM Object3 Object2
	LEFT LOOP JOIN Schema1.Object1 Object4 WITH(INDEX(Column5))
		ON Object4.Column3 = Object2.Column3
		AND Object4.Column1 = Object2.Column1
		AND Object4.Column2 = Object2.Column2
		AND Object4.Column4 = Object2.Column4
	WHERE
		Object4.Column6 IS NULL

The gist is, we’re trying to insert a record into Object1, assuming said record doesn’t already exist. We’re querying the data we want to insert from a temp table, but joining to the base table to make sure a record doesn’t already exist with the same values.

In the case of an UPDATE statement, if we update fields that are in our query’s search criteria, we could update the same row multiple times. SQL Server’s Halloween protections prevent this, but result in extra work that affect our performance.

The INSERT statement here is similar, trying to insert a record while querying to see if the same record exists. So, again SQL Server adds Halloween protections to our plan:

Plan Analysis

I would have expected us to scan the temp table, then have a LEFT JOIN to the base table. The Table Spool is the red flag that we have an issue with the plan, and is frequently seen with Halloween protections.

The index scan on the base table seems to be overkill since we’re joining on the primary key columns (the key lookup isn’t much of a concern). But we’re likely doing the scan because of the spool; it’s SQL Server’s way of getting all relevant records in one place at one time, breaking the normal flow of row mode operation, to make sure we don’t look up the same record multiple times.

Easy Fix

The data we are trying to insert is being passed into the procedure using a memory-optimized table valued parameter. We’ve queried that into the temp table as another step before our final INSERT SELECT query, because SQL Server will sometimes make poor optimizations when TVP’s are involved (because they have no statistics).

The solution then is an easy one. We move our LEFT JOIN out of the final INSERT, and we make that check as we query the TVP’s data into the temp table. We separate the SELECT against that table from the INSERT; they are now in separate operations, and the Halloween protections are no longer necessary.

If you liked this post, please follow me on twitter or contact me if you have questions.

I encountered something recently I’d never encountered, so I had to share. I was making another change to a procedure I’ve been tuning recently. The idea was to alter the UPDATE statements to skip rows unless they are making real changes. The activity here is being driven by customer activity, and that sometimes leads to them setting the same value repeatedly. Difficult to know how often we update a row to the same value, but we think it could be significant. So, we added a clause to the UPDATE so we’ll only update if ‘old_value <> new_value’. The actual update operator is the most expensive part of the statement, but Simple enough so far. The scan is against a memory optimized table variable, and the filter to the left our our seeks and scans check for a change to our value. Nothing left but to update the index and…

Curve Ball

Wait, what’s all this? We have a Split operator after our Clustered Index Update. SQL Server does sometime turn an UPDATE statement into effectively a DELETE and INSERT if the row needs to move, but this seems a bit much. We have a total of 4 index update/delete operators now, and they aren’t cheap. My very simple addition to the WHERE clause actually caused a small increase in duration, and a big jump in CPU. So what’s going on?

UPDATE Object1
SET
	Column1 = CASE 
		WHEN Variable1 = ? THEN Object2.Column1 
		WHEN Variable1 = ? THEN Object1.Column1 + Object2.Column1 
		END,
	Column4 = GETUTCDATE()
FULL OUTER JOIN Variable5 dcqt
	ON Object1.Column9 = Variable2
	AND Object1.Column10 = Variable3
	AND Object1.Column6 = CASE WHEN Variable6 = ? AND Object2.Column2 >= ? THEN ? ELSE Object2.Column2 END
LEFT JOIN Variable7 oq
	ON Object1.Column6 = Object3.Column6
WHERE
	Object1.Column10 = Variable3
	AND Object1.Column9 = Variable2
	AND Object1.Column2 >= ?
	AND Variable8 < ?
	AND Object1.Column1 <> CASE 
		WHEN Variable1 = ? THEN Object2.Column1 
		WHEN Variable1 = ? THEN Object1.Column1 + Object2.Column1 
		END

So, we’re updating Column1 to the result of a CASE statement, and the last part of our WHERE clause compares Column1 to the same CASE. And the CPU for this statement just doubled? I happened to jog this by the superlative Kevin Feasel, who suggested this was the Halloween Problem.

The Halloween Problem

The Halloween Problem is a well documented issue, and it affects other database systems, not just SQL Server. The issue was originally seen by IBM engineers using an UPDATE to set a value that was also in their WHERE clause. So, database systems include protections for the Halloween Problem where necessary in DML statements, and SQL Server decided it needed to protect this query. And our query matches the pattern for this issue; we’re filtering on a field while we are updating it. All DML statements can run afoul of this issue, and there are examples for all in this really excellent series of posts by Paul White.

An Unlikely Ally

The protection SQL Server employs ultimately comes down to interrupting the normal flow of rows from operators up the plan. We actually need a blocking operator! A blocking operator would cause all the rows coming from the query against our primary table to pool in that operator. We’ll have a list of all relevant rows in one place, and they can be passed on to operators above without continuing the index seek against our table; possibly seeing the same row a second time. Eager spools\table spools are frequently used for this purpose, and SQL Server used an eager spool to provide Halloween protection in my case. A sort would also do, and we could design this query to sort the results of querying the table. If our query already employed a blocking operation to interrupt the flow, SQL Server would not need to introduce more operations to protect against the Halloween Problem. In my case, I definitely don’t want it spooling and creating three more expensive index operations. My idea for rewriting this goes a step farther.

A Two Table Solution

If we queried the data from our base table into a temp table, we could then update the base table without querying that same table on the same column, tripping over a pumpkin in the process. My procedure is already using memory optimized table variables, because this proc runs so frequently that temp tables cause contention in tempdb (described here). So in this instance, I’ll actually query this data into another motv. I can also use the data I stash in my motv to decide if any rows I intended to update don’t exist yet. I’ll INSERT those later, only if needed.

Bonus

Now, the whole point of the original change was to reduce work when we update a field to be equal to its current contents. In my motv, I’m going to have the old and new value for the field. It would be simplicity itself to put my UPDATE in a conditional, so that we only run the UPDATE if at least one row in my motv is making an actual change to the field. So instead of just reducing the cost of our update operators, we’ll skip the entire UPDATE statement frequently, for only the cost of one SELECT and a write to our motv.

Results

Stunning. The new logic causes this proc to skip the UPDATE statements entirely over 96% of the time. Even given that we are running a query to populate the memory optimized table variable (which is taking <100 microseconds) and running an IF EXISTS query against that motv (which takes 10-20 microseconds), we’re spending 98% less time doing the new logic than the original UPDATE statement. When I started reviewing the procedure several months ago, it took 3.1 milliseconds on average. I’ve tried several other changes in isolation, some effective, some not. The procedure is now down to 320 microseconds; an almost 90% reduction overall after a 71% drop from this change. I have some other ideas for tweaks to this proc, but honestly, there’s very little left to gain with this process. Time to find a new target. If you liked this post, please follow me on twitter or contact me if you have questions.

In my last post, I spoke about optimizing a procedure that was being executed hundreds of millions of times per day, and yes, that is expected behavior.

The difficult thing about trying to optimize this procedure is that it only takes 2.5ms on average to run. Tuning this isn’t a matter of changing a scan to a seek; we’ll have to look hard to find the opportunities here. A one millisecond Improvement on a procedure running 100 million times a day would save 100,000 seconds every day.

Well, I’ve found a few more options since my last post, and wanted to share my findings.

Setup

The procedure has some complex logic but only runs a few queries.

  • There are a few simple SELECT statements to populate some variables. These take a small percentage of the overall runtime.
  • There are two UPDATE statements, and we will run one or the other. Both join a table to a table variable; one has a second CTE doing some aggregation the other lacks. The majority of our time is spent running these UPDATEs.
  • An INSERT statement that takes place every time. This is to ensure that if we didn’t update a record because it didn’t exist, we make sure we insert the row. It’s very likely on a given run we will INSERT 0 rows.

Brainstorming

Since we 80% of our time in the UPDATE (I love Query Store), that’s the place with the most potential for gain.

But, on first look (first couple) it seems difficult to see room for improvement here. We’re doing index seeks with small row counts. The index scans are against memory optimized table variables, and you may notice they are cheaper than than the index seeks.

But, looking at plan one thing did draw my attention:

There’s a table spool, and following that I see the plan is updating an indexed view. Which we would do every time there’s an UPDATE. Hundreds of millions of times a day…huh. So, removing the index on that view would eliminate this entire middle branch from the plan.

The view is based on two columns that are the first two columns of the clustered PK of the underlying table. The view does some aggregation, but the difference between querying the view or the table is reading 1 row versus maybe 2 or 3 rows, most of the time. Dropping that index seems like a good thing to try.

And I did mention this in my last blog post, but we perform the INSERT statement every execution of the procedure, and we run the trigger on this table even if we inserted 0 rows. So, if we can detect whether the INSERT is needed, we can potentially skip the majority of the executions of the statement and the trigger.

The logic for the procedure uses a TVP and a couple of table variables, which isn’t optimal. SQL Server doesn’t have statistics on table variables, so it’s not able to make good estimates of how many rows are going to be returned (unless you are using table variable deferred compilation in SQL Server 2019). We could change these to temp tables, and see if we have better results. Hopefully, we’ll have a more stable plan across the many databases running it.

Both of the UPDATE statements have a bookmark lookup. We’ll be looking up only a few rows, but this could be a significant improvement for a query that takes so little time. Also, one of the UPDATE statements references the main table an additional time in its CTE. So we have two index seeks plus the key lookup. How much of our time is spent in the second access and the bookmark lookup?

Results

first change

I’ve been working on releasing these changes individually, and the first one is complete. Removing the index on the view, resulted in a 17% reduction in the duration of the procedure (from 2.34ms to 1.94ms), and a 20% reduction in CPU. Come to think of it, not having to update that index would have helped with our INSERT statement as well.

I’ll update this post once I have details on the other changes.

If you liked this post, please follow me on twitter and contact me if you have questions.

Updates

Second change

So, the second idea I’ve tried on this issue, is replacing the table variables with temp tables. The idea was that temp tables have statistics and table variables don’t, so we should tend to have better execution plans using temp tables. However, it didn’t work out that was in this case because it also caused pagelatch contention in tempdb.

Tempdb pagelatch contention is a very well documented issue. There are things you can do to mitigate this issue, but at a certain point you just need to create fewer temp tables. I’m not sure where the line is in the case of my environment, but it was clear that creating 3 temp tables in a proc running this often crosses that line.

This change has been reverted, but I’ll update this post again shortly. I should be making the change to skip unnecessary INSERT statements this week.

Skipping Inserts

This change has been made, and the results are fairly minor. The first measure I took showed the average duration of the proc dropping from 1.95 to 1.91 ms, but this varies enough from day to day that I could cherry pick numbers and get a different, even negative improvement.

This confirms something interesting. The cost of the procedure overall was skewed very heavily to the UPDATE statements, with the INSERT being significantly less. The most expensive operator in a DML operation is typically the Insert\Update\Delete operator itself; the step where we actually change data. My expectation all along has been that there were very few rows actually being inserted by our INSERT statement. It seems this is true and since the statement doesn’t actually insert data, the Insert operator doesn’t do work or take any significant amount of time. So we gained very little by this change, at least from the procedure.

This change also prevented us from calling the trigger with empty inserts, so we saw the trigger drop from executing ~350 million times per day to 6 million. This saves us about 14,000 seconds per day. I don’t think about context switching in the context of SQL Server very often, but in addition to the time benefit, there’s a small benefit in just not having to set up and switch to the context for the trigger itself.

Overall a small victory, but I’ll take it.

Pending

I should have my final update on this topic in the next week. This will change the logic of one of the UPDATE statements to remove an unnecessary CTE which references the main table, and adds an index hint to use the primary key, removing a bookmark\key lookup from both statements. The first update hits the table twice (once for the nonclustered, another for the key lookup) and the second hits it thrice (because of the CTE), but this change will drop both to 1 access of the PK, and should only query a few rows.

I’m very interested to see how this affects performance, as this should affect the largest part of work done outside of the actual Update operator in those statements.

Finale

So my change to the UPDATE statement is out, and the results are pretty good. First, The runtime of the procedure has dropped to 1.399 ms. If you recall, it originally took about 2.5 ms, so we’ve dropped this overall by more than 1 millisecond, which would be 40% of its original runtime.

Second, I love the simplicity of the new plan. The previous anonymized plan is at the top of this post; it took 2 screenshots to cover most of it. In particular, I noticed we were hitting the central table 3 times. Once in a CTE that was doing aggregation that was unnecessary, I removed that entirely. The second reference was in the main query, and it caused a key lookup which was the third access. I hinted our query to use the clustered primary key, which leads with the same two columns that are in the nonclustered index the optimizer seems to prefer.

The index scans are both in memory optimized table variables, and you can see the estimates are lower than accessing the main table in the clustered index seek. I also love seeing that the actual update operator on our left is such a large amount of the effort; that’s what I expect to see on a DML operation that’s tuned well.

In summary

I had 4 ideas originally to tune this query. I wanted to see if we could drop it by 1 millisecond, and we gained 1.1.

  1. Removing the index from an indexed view referencing this table had a significant effect. This required a significant amount of research, as I had to find all the places where we referenced the view and determine if any would experience significantly worse performance without the aggregation from that view. I didn’t see any red flags, and this changed dropped the procedure’s duration from 2.5ms to 1.94ms (22% reduction).
  2. The procedure uses a few in memory table variables with temp tables. The idea is that temp tables have statistics. That could lead to SQL Server making better plans, because it can estimate how many rows are in a given operation. This would work in other cases, but not for a proc that runs this often. Performance was actually slowed because of significant PAGELATCH waits. Every execution we were creating multiple temp tables, and at this pace our threads were constantly waiting on key pages in tempdb. This change was reverted.
  3. Reducing the INSERTs was a gain, but a minimal one. The INSERT statement itself took very little of our time in the procedure. We also got to skip running the INSERT trigger, but that also did not take long to run. It’s possible we ended up with less waiting in the main table, or the table our trigger was updating, but if so the gains were too small to quantify.
  4. Simplifying the logic of the two UPDATE statements that were taking most of the time in the procedure was a success. We went from 3 operations on the only permanent table in our query to 1, and removed an aggregation step we didn’t need. This dropped our runtime from 1.94ms to 1.399ms (27.8%). Every operation counts.

Hopefully you’ve learned something from this post and its updates. If so, please follow me on twitter or contact me if you have questions.


Reducing Trigger Executions

I’ve never been a fan of triggers. I don’t like the idea of them adding an additional tax on every operation. It’s easy to forget they are even there, consuming your cycles. I’ve even seen a few nasty death-by-a-thousand-cuts scenarios with them. But I found something out this week that makes me like them even less.

I was tuning a procedure that runs 284 million times a day.

Over a number of servers and databases, but yes, that number is correct. It takes 2.5ms to run on average, with 1.0ms of CPU time. I’ll spare you the math, but that means over 3 cores of SQL Server are doing nothing but running this procedure 24/7/365. Anything we can do to improve this will be significant, even if we just shave off half a millisecond.

Small Improvements

The procedure is attempting to update or insert a few records based on a TVP input. Sometimes we run an UPDATE statement, but update the column to its current value based on one input parameter. Based on another parameter, we may filter out all rows and update nothing. But in both cases, even though we haven’t changed any data we still took the time to run the UPDATE statement. I’m unsure how often those two parameters match one of those two checks, but if we check them first we can entirely skip the UPDATE.

Like I said, every little bit helps.

After the UPDATE, there is an INSERT to make sure if we didn’t UPDATE the row because it doesn’t exist, we INSERT it. There are conditionals around the UPDATE, but not the INSERT. So, we run it every time we run the proc. 284 million times a day. Even if there’s nothing to insert.

Trigger Happy

The revelation is this:

“These triggers fire when any valid event fires, whether table rows are affected or not.”

Ouch. And I really wasn’t aware of this. So, we pay for the INSERT execution every time, even if we don’t need to insert anything. And the INSERT trigger on that table fires as well, trying to update another table even though the INSERTED and DELETED tables are empty. Charming.

But if our TVP had x items to update\insert and our UPDATE statement handled y of them, we only need to try the INSERT if y<x.

We won’t even need to query the underlying table or the TVP; just check the row counts.

We’re likely to update these records many times, and we’ll obviously only insert them once. Checking the execution counts for the INSERT and UPDATE triggers and subtracting the number of executions from this proc, it seems we UPDATE the table several hundred times for each INSERT. So, we should be able to reduce trigger executions by a huge amount.

Well then. I’ll look forward to seeing those executions drop shortly.

If you liked this post, please follow me on twitter or contact me if you have questions.

Postscript

This change reduced trigger executions and INSERT statement executions from ~350 million per day, to ~5 million.

Graphs like this make me happy. 🙂

Slow Garbage Collection

I encountered a curious issue recently, and immediately knew I needed to blog about it. Having already blogged about implicit conversions and how the TOP operator interacts with blocking operators, I found a problem that looked like the combination of the two.

I reviewed a garbage collection process that’s been in place for some time. The procedure populates a temp table with the key values for the table that is central to the GC. We use the temp table to delete from the related tables, then delete from the primary table. However, the query populating our temp table was taking far too long, 84 seconds when I tested it. We’re scanning and returning 1.4 million rows from the first table, doing a key lookup on all of them. We scan another table to look up the retention period for the related account, as this database has information from multiple accounts, and return 8.4 million rows there. We join two massive record sets, then have a filter operator that gets us down to a more reasonable size.

In general, when running a complex query you want your most effective filter, that results in the fewest rows, to happen first. We want to get the result set small early, because every operation after becomes less expensive. Here’s part of our anonymized query:

 WHERE
   Object5.Column10 = ?
   AND Object5.Column6 < Variable4 - COALESCE(Object12.Column1,Variable1,?)

So, Object5 is our main table. Column10 is a status column, and we compare that to a scalar value to make sure this record is ready for GC. That’s fine. The second is checking if the record is old enough to GC, and this is where things get interesting. The code here is odd; we’re subtracting a COALESCE of three values from a DATETIME variable, relying on behavior that subtracting an INT from a DATETIME subtracts that number of days from the DATETIME. There’s an implicit conversion, but it is on the variable side of the inequality. And the plan above appears to be our result.

So, Bad Date Math Code?

Seems like a good root cause, and I don’t like this lazy coding pattern, but let’s test it out with tables we can all look at. Would this be any better if we had written this to use the DATEADD function to do the math properly?


USE AdventureWorks2014
GO
--CREATE INDEX IX_SalesOrderHeader_ModifiedDate ON Sales.SalesOrderHeader (ModifiedDate);

SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID
FROM Sales.SalesOrderHeader soh
WHERE
	soh.ModifiedDate < DATEADD(day, -1, GETUTCDATE());
	
SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID
FROM Sales.SalesOrderHeader soh
WHERE
	soh.ModifiedDate < GETUTCDATE() - 1;

So, if we had an index on the OrderDate column, would it perform differently?

Apparently not. Same plan, same cost. But when you think about it, this tests the date math by subtracting an integer from the DATETIME provided by GETUTCDATE(). The original was subtracting a COALESCE of three values. One of those values was a float. Could the COALESCE or the resulting data type have made this more complicated?

Testing COALESCE


USE AdventureWorks2014
GO
DECLARE
	@AccoutSetting FLOAT = 8.0,
	@Default INT = 365;
	
SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID
FROM Sales.SalesOrderHeader soh WITH(INDEX(IX_SalesOrderHeader_ModifiedDate))
WHERE
	soh.ModifiedDate < DATEADD(DAY, -COALESCE(@AccoutSetting, @Default), GETUTCDATE());

Running this, again we see a nice seek that reads and returns 100 rows. So the different data type and the COALESCE makes no difference.

Looking at the original query again, the first value isn’t a variable, it’s a column from a different table. We can’t filter by this column until we’ve read the other table, which affects our join order. But we have no criteria to seek the second table with.

Joined Filtering

One more test. Let’s see what the behavior looks like if we join to the Customers table to look for the RetentionPeriod. First, I’ll create and populate some data in that column:


USE AdventureWorks2014
GO
IF NOT EXISTS(
	SELECT 1
	FROM INFORMATION_SCHEMA.COLUMNS isc
	WHERE
		isc.TABLE_NAME = 'Customer'
		AND isc.TABLE_SCHEMA = 'Sales'
		AND isc.COLUMN_NAME = 'RetentionPeriod'
)
BEGIN
	ALTER TABLE Sales.Customer
		ADD RetentionPeriod INT NULL;
END;
GO
UPDATE TOP (100) sc 
	SET sc.RetentionPeriod = (sc.CustomerID%4+1)*90
FROM Sales.Customer sc;
GO
CREATE INDEX IX_Customer_RetentionPeriod ON Sales.Customer (RetentionPeriod);
GO

I only populated a few of the records to better match the production issue; only some accounts have the value I’m looking for, hence the COALESCE.


USE AdventureWorks2014
GO
DECLARE
	@Default INT = 365;

SELECT TOP 100
	soh.ModifiedDate,
	soh.SalesOrderID,
	sc.RetentionPeriod
FROM Sales.SalesOrderHeader soh
LEFT JOIN Sales.Customer sc
	ON sc.CustomerID = soh.CustomerID
	AND sc.RetentionPeriod IS NOT NULL
WHERE
	soh.ModifiedDate < DATEADD(DAY, -COALESCE(sc.RetentionPeriod, @Default), GETUTCDATE());

Now we’re trying to filter records in SalesOrderHeader, based on the Customer’s RetentionPeriod. How does this perform?

 

Well, the row counts aren’t terrible, but we are scanning both tables. The optimizer opted to start with Customer table, which is much smaller. We’re not filtering on the date until the filter operator, after the merge join.

I’d be worried that with a larger batch size or tables that don’t line up for a merge join, we’d just end up doing a hash match. That would force us to scan the first table, and without any filter criteria that would be a lot of reads.

Solution

The solution I applied to my production query was to create a temp table that had the account ID and the related retention period. I joined this to my primary table, and the query that was taking 84 seconds was replaced with one that took around 20 milliseconds.

 


USE AdventureWorks2014
GO
IF OBJECT_ID('tempdb..#AccountDates') IS NULL
BEGIN
	CREATE TABLE #AccountDates(
		CustomerID INT,
		GCDate DATETIME
	);
END
GO
DECLARE
	@Default INT = 365;

INSERT INTO #AccountDates
SELECT 
	sc.CustomerID,
	DATEADD(DAY, -COALESCE(sc.RetentionPeriod, @Default), GETUTCDATE()) AS GCDate
FROM Sales.Customer sc
WHERE
	sc.RetentionPeriod IS NOT NULL;

SELECT 
	soh.SalesOrderID
FROM #AccountDates ad
JOIN Sales.SalesOrderHeader soh
	ON soh.CustomerID = ad.CustomerID
	AND soh.ModifiedDate < ad.GCDate;

TRUNCATE TABLE #AccountDates;
GO

Here’s how I’d apply that thought to our example. I’ve created a temp table with CustomerID and its associated RetentionDate, so we could use that to search Sales.SalesOrderHeader.

You may have noticed my filter on the Sales.Customer table. In the live issue, the temp table had dozens of rows for dozens of accounts, not the tens of thousands I’d get from using all rows Sales.Customer in my example. So, I filtered it to get a similar size in my temp table.

Infinitely better. Literally in this case, since the ElapsedTime and ActualElapsedms indicators in the plan XML are all 0 milliseconds. I’m declaring victory.

If you liked this post, please follow me on twitter or contact me if you have questions.