Update and Correction:

This blog was originally posted on February 20. Since then, I read other articles that suggested different behavior with the Halloween Problem. I contacted Paul White, who informed me that the WideWorldImporters database uses compatibility level 130 (SQL Server 2016) by default. So, even though I tested on a SQL Server 2019 instance, I was probably seeing behavior that had been addressed at later compatibility levels.

I tested again at compatibility level 150 and saw a different execution plan which led to different conclusions.

I’ve left the majority of the post unchanged, but I’m adding an addendum section, and updating the summary and its conclusions. So, make sure you read those sections for the corrections.

Original Post:

I find myself talking about the Halloween Problem a lot and wanted to fill in some more details on the subject. In short, the Halloween Problem is a case where an INSERT\UPDATE\DELETE\MERGE operates on a row more than once, or tries to and fails. In the first recorded case, an UPDATE changed multiple rows in the table more than once.
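For illustration, here's the classic scenario (the table and numbers are hypothetical, not from WideWorldImporters): a raise is given to everyone under a salary threshold, and if the scan driving the update keeps re-finding rows after they move in a salary index, the same employee can get the raise more than once.

-- Hypothetical table, for illustration only. Without Halloween protection,
-- a row whose new salary is still under 25000 could move within an index on
-- Salary, be read again by the same scan, and receive the raise again.
UPDATE dbo.Employees
SET Salary = Salary * 1.10
WHERE Salary < 25000;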

So let’s take a look at an example using a publicly available database, WideWorldImporters.

A Halloween Problem example

Here’s a simple update procedure. We’re going to update the quantity for an item in the Sales.OrderLines table:

CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateQuantity
	@OrderID INT,
	@StockItemID INT,
	@Quantity INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET
		sol.Quantity = @Quantity,
		sol.PickedQuantity = @Quantity
	FROM Sales.OrderLines sol
	WHERE
		sol.OrderID = @OrderID
		AND sol.StockItemID = @StockItemID;
		-- AND sol.Quantity <> @Quantity;
END;
GO

You may notice the commented line. In one description of the Halloween Problem I heard\read, it was suggested that if we try to SET something that is in our WHERE clause the problem is likely to occur. Or rather, SQL Server will see the possibility of the problem and add protections to our execution plan to prevent it.

First, let’s test without that line, and see what our execution plan tells us.

EXEC Sales.OrderLines_UpdateQuantity
	@OrderID = 5,
	@StockItemID = 155,
	@Quantity = 21;
GO
Note the eager spool

The eager spool between our index reads and clustered index update shows that SQL Server added Halloween protections to prevent the problem. The problem is prevented by separating the read phase of the query from the write phase.

This usually involves a blocking operator. Most often this is an eager spool, but if there is another blocking operator in the plan like a sort or hash match, that blocking operator may remove the need for a separate spool.

The Halloween Problem can occur when a query running in row mode updates rows and moves them within an index while the read is still in progress. The read operation could then encounter an updated row a second time and operate on it again. The index movement is key in this scenario.

But with a blocking operator between the read operation and the write, we force all the reads to complete first. This gives us a complete, distinct list of rows to be updated (in this example) before we get to the clustered index update, so it isn’t possible to update the same row twice.

So, how does index movement come into play here? We are updating the Quantity and PickedQuantity columns in our UPDATE statement. Both fields are key columns in the only columnstore index on the table, NCCX_Sales_OrderLines.

CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCX_Sales_OrderLines] ON [Sales].[OrderLines]
(
[OrderID],
[StockItemID],
[Description],
[Quantity],
[UnitPrice],
[PickedQuantity]
)WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [USERDATA];
GO

So when we update these columns, the affected rows will move in that index. If the row moves, that means a read operation could continue reading and find the same row again, returning it as a part of its result set a second time.

Interestingly, we aren’t reading from the columnstore index in the plan provided. Since that’s the only index containing these columns as key values, it’s the only index where the rows should move. In this case, our read operators shouldn’t encounter updated rows a second time, since they use the FK_Sales_OrderLines_OrderID (with a key lookup against PK_Sales_OrderLines).

I wonder if SQL Server decided the Halloween protections were needed before it decided which index it would use for the read.

Removing the index

Either way, if we dropped the NCCX_Sales_OrderLines index, we should see a plan without an eager spool between the read operators and the update operator.

IF EXISTS(
	SELECT 1
	FROM sys.indexes si
	WHERE
		si.name = 'NCCX_Sales_OrderLines'
)
BEGIN
	DROP INDEX [NCCX_Sales_OrderLines]
		ON Sales.OrderLines;
END;
GO

With the index removed, let’s look at the new plan.

Unspooled

We’ve lost the extra steps to the left of the clustered index update operator that updated the columnstore index, and we have also lost the eager spool between the read operators and the update operator. This shows that, without the index movement, Halloween protections are no longer needed.

Performance impact of protections

Let’s look at the data from Query Store to see how big the difference is between the two execution plans.

Both before and after the index change, I ran a simple query against the same OrderID in Sales.OrderLines before running the procedure, to get the data into the cache (cold cache issues were making a large difference). I also ran the procedure 10 times to average out the results in case any odd wait types appeared.
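Here's a sketch of the kind of Query Store query I use for this comparison; the DMVs and columns are the standard Query Store views, and the OBJECT_ID filter assumes the procedure above.

-- Average duration (in microseconds) and execution counts per plan for the
-- procedure, pulled from Query Store.
SELECT
	qsq.query_id,
	qsp.plan_id,
	SUM(qsrs.count_executions) AS executions,
	AVG(qsrs.avg_duration) AS avg_duration_us
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp
	ON qsp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qsrs
	ON qsrs.plan_id = qsp.plan_id
WHERE
	qsq.object_id = OBJECT_ID('Sales.OrderLines_UpdateQuantity')
GROUP BY
	qsq.query_id,
	qsp.plan_id;
GO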

80 microseconds versus 46 microseconds. Blazing fast in both cases with the data already cached, but the plan with Halloween protections took 74% longer. I'm unsure whether the update to a columnstore index is significantly more expensive than the update to a rowstore index, so perhaps we should test this again without columnstore complicating the issue.

Speaking in general, I would expect a bigger difference in a query affecting more rows. For a query that only returns 3 rows from the first index seek, the delay caused by the spool would be very small. But imagine if we have a query that reads tens or hundreds of thousands of rows before performing its write operation.

Normally such a query would be passing rows it has read up to the join and update operators while it is continuing to read. Those operations would be happening on different threads in parallel.1

If we are being protected from the Halloween Problem, the eager spool will not return any rows to the operations above it (like the clustered index update) until all rows have been read. So the writes cannot start until much later, and the more rows being read the more considerable the delay.

Nonclustered indexes?

If you noticed the “+3 non-clustered indexes” banner in one of the plans above, that indicates the nonclustered indexes that were updated along with the clustered index. This is more obvious in Plan Explorer than in the plans as shown in SQL Server Management Studio, so I wanted to point it out in case the visual was confusing to anyone.

But this raises another question. If we are updating those indexes, why don’t they cause the Halloween protections to be used?

That is because the quantity columns are present in those indexes only as included columns. Changes to those columns won’t affect where the row sorts, but the values still need to be updated.
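A hypothetical index (not one that ships with WideWorldImporters) to illustrate the distinction: with the quantity columns only INCLUDEd, updating them changes the stored values but not where the row sorts in the index.

-- Illustration only: Quantity and PickedQuantity are included columns, so an
-- update to them doesn't move the row within this index.
CREATE INDEX IX_OrderLines_Example
	ON Sales.OrderLines (OrderID, StockItemID)
	INCLUDE (Quantity, PickedQuantity);
GO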

Rowstore testing

So, let’s see how this looks with a rowstore index. Here’s a second procedure, similar to the first but also updating PickingCompletedWhen.

CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateQuantityWhen
	@OrderID INT,
	@StockItemID INT,
	@Quantity INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET
		sol.Quantity = @Quantity,
		sol.PickedQuantity = @Quantity,
		sol.PickingCompletedWhen = GETUTCDATE()
	FROM Sales.OrderLines sol
	WHERE
		sol.OrderID = @OrderID
		AND sol.StockItemID = @StockItemID
		AND sol.PickingCompletedWhen < GETUTCDATE();
END;
GO

Initially, no index uses PickingCompletedWhen. So if we execute the procedure as is, we shouldn’t see the tell-tale eager spool.

EXEC Sales.OrderLines_UpdateQuantityWhen
	@OrderID = 5,
	@StockItemID = 155,
	@Quantity = 21;
GO

This plan is what we’d expect. If we add an index, how does that change the plan, and how does it change the performance?

IF NOT EXISTS(
	SELECT 1
	FROM sys.indexes si
	WHERE
		si.name = 'IX_OrderLines_OrderID_StockItemID_PickingCompletedWhen'
)
BEGIN
	CREATE INDEX IX_OrderLines_OrderID_StockItemID_PickingCompletedWhen
		ON Sales.OrderLines (OrderID, StockItemID, PickingCompletedWhen);
END;
GO

Here, we see the eager spool implementing the Halloween protections again, but between the index seek and the key lookup. Note that the new index is the one we are using for the index seek. The clustered index update now indicates it is updating 4 nonclustered indexes, including the new index.

So, is the performance difference as stark as it was with the columnstore index?

So, 52 µs vs 118 µs. The query took roughly 126% longer when the Halloween protections were present. That's a bigger difference than we saw with the columnstore index, which is surprising. Perhaps it is relevant that we are updating a third column. It almost feels like the observer effect at this scale.

Addendum

So, to correct things here, let’s go back to the first procedure.

CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateQuantity
	@OrderID INT,
	@StockItemID INT,
	@Quantity INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET
		sol.Quantity = @Quantity,
		sol.PickedQuantity = @Quantity
	FROM Sales.OrderLines sol
	WHERE
		sol.OrderID = @OrderID
		AND sol.StockItemID = @StockItemID;
		-- AND sol.Quantity <> @Quantity;
END;
GO

If I run this procedure again with the restored database and no other changes besides updating the compatibility level to 150, I see the following execution plan:

So, we have no eager spool, which means the Halloween Problem isn’t a problem now.

Previously, there was a spool between the index seek and lookup and the clustered index update. The only index using any of the updated fields as a key value was the columnstore index. This suggested that the optimizer will use Halloween protections if any index uses the updated fields as a key value because the rows would be moved in that index.

This new plan disproves that because the optimizer no longer uses the protections with the later compatibility level. And the columnstore index (NCCX_Sales_OrderLines) is still present (as you can see if you hover over the clustered index update operator).

As for the second procedure, I see the Halloween protections even without the index I added in my example. Without that index, the query originally used the FK_Sales_OrderLines_OrderID index to seek the rows in question. At the higher compatibility level, the IX_Sales_OrderLines_Perf_20160301_02 index is used, which is keyed on (StockItemID, PickingCompletedWhen).

So, the Halloween protections are used because we read from an index keyed on one of the updated fields, and rows being updated will potentially move in that index.

We’ve seen the Halloween protections when using nonclustered indexes so far, but what if we are using the clustered index for the read?

I wrote a quick procedure to change the OrderLineID, which is the only column in the clustered primary key for this table. And this matches expectations; we see the eager spool between the clustered index seek and the update operator.
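The procedure itself isn't shown here; a minimal sketch of what it might look like (the name and parameters are hypothetical) follows.

-- Hypothetical sketch: update the clustered key so the row must move in the
-- clustered index, which brings the Halloween protections back.
CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateOrderLineID
	@OldOrderLineID INT,
	@NewOrderLineID INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET sol.OrderLineID = @NewOrderLineID
	FROM Sales.OrderLines sol
	WHERE sol.OrderLineID = @OldOrderLineID;
END;
GO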

Summary

Hopefully, the addendum corrects the matter while keeping things clear. I’m updating one of the bullet points below, as well.

It seems there are only two criteria for the protections against the Halloween Problem to be used for an UPDATE query:

  1. The object being updated must also be in the query.
  2. One of the updated columns must be a key column in the index used for the read portion of the query, so that the rows being updated may move in that index. (Originally this bullet said a column being updated only had to be a key column in at least one index on the table.)

For other statements, the setup is more complex. I find the UPDATE statement is the most straightforward example of the Halloween Problem. But you can see the protections in place if you query from a table as part of an INSERT or DELETE (or MERGE) where you change that same table.

And if we see Halloween protections in the plan for a query, we could change the offending index or the query to change the behavior.

Or we could use the manual Halloween technique, which I will discuss next time.

Thanks again to Paul White for pointing out the compatibility level; I doubt that would ever have occurred to me.

Please contact me if you have any questions or comments. I’ve updated my social media links above to include Counter.Social and Mastodon. We’ll see if there is more #sqlfamily activity on those platforms going forward.

Footnotes

1: Not the type of parallelism we typically think of with SQL Server. Parallelism is typically when a given operation, like an index scan, is expected to process many rows, and SQL Server dedicates multiple threads to that operator or group of operators. In this case, I say parallel because different operators (the index seek, nested loops join, and clustered index update) are all processing rows at the same time, one row at a time.

Tempdb contention has long been an issue in SQL Server, and there are many blogs on the issue already. But I wanted to add one more, mainly to highlight the improvements in recent versions of SQL Server.

Tempdb contention is most often discussed as relating to the creation of temp tables (and other objects) in tempdb. If you are experiencing this, you will see PAGELATCH_EX or PAGELATCH_SH waits, frequently with wait resources like 2:1:1 or 2:1:3. This indicates contention in database 2 (tempdb), file 1 (the first data file in tempdb), and page 1 or page 3 (the PFS, GAM, and SGAM allocation pages are pages 1, 2, and 3, respectively). Tempdb files of sufficient size will have additional PFS, GAM, and SGAM pages at higher page numbers, but 1 and 3 are the pages most often referenced.
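A quick way to spot this (a sketch, using standard DMVs) is to look for active requests with PAGELATCH waits whose wait resource points into database 2.

-- Current requests waiting on page latches in tempdb (database 2).
SELECT
	r.session_id,
	r.wait_type,
	r.wait_resource,
	r.wait_time
FROM sys.dm_exec_requests r
WHERE
	r.wait_type LIKE 'PAGELATCH%'
	AND r.wait_resource LIKE '2:%';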

Temp tables aren’t the only objects being created in tempdb. Table variables are as well unless they are memory-optimized. There are also worktables for sorts, spools, and cursors. Hash operations can spill to disk and are written into tempdb. Row versions are written into tempdb for things like read committed snapshot isolation, and triggers make use of row versioning as well. For more details, check out this excellent post by David Pless.

Before recent releases, there were three main suggestions for reducing tempdb contention.

  • Trace flags (1118 and 1117)
  • More tempdb files
  • Create fewer objects in tempdb

Honestly, I don’t think the third was even included in a lot of the blogs on the subject, and it is very important. Many of the actions that use tempdb can’t be avoided, but I tend to use memory-optimized table variables instead of temp tables the vast majority of the time.
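For reference, this is roughly what that looks like (a sketch; it assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup, and the type name is mine).

-- A memory-optimized table type; variables of this type don't allocate pages
-- in tempdb the way temp tables do.
CREATE TYPE dbo.OrderIDList AS TABLE (
	OrderID INT NOT NULL,
	INDEX IX_OrderIDList NONCLUSTERED (OrderID)
) WITH (MEMORY_OPTIMIZED = ON);
GO

-- Used in place of a temp table.
DECLARE @Orders dbo.OrderIDList;
INSERT INTO @Orders (OrderID) VALUES (1), (2), (3);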

In one case a few years ago, I replaced the memory-optimized table variables in one very frequently executed stored procedure with temp tables to see if using temp tables would result in better execution plans. This procedure was executed about 300 million times per day across several SQL Server instances using similar databases, and the procedure used 4 temp tables. The plans didn’t matter; creating 1.2 billion more temp tables per day added far too much tempdb contention.

But the main point of this post is to help everyone catch up on the topic, and see how more recent versions of SQL Server improve on this issue.

Improvements in SQL Server 2016

SQL Server 2016 introduced several improvements that help reduce tempdb contention.

The most obvious is that setup will create multiple tempdb data files by default, one per logical processor up to a maximum of eight. That bakes in one of the main recommendations for reducing tempdb contention, so it’s a welcome improvement.

There are also behavior changes that build in what trace flags 1117 and 1118 used to do. All tempdb data files grow at the same time and by the same amount by default, which removes the need for trace flag 1117. And all tempdb allocations use uniform extents instead of mixed extents, removing the need for trace flag 1118.

So, that’s another recommendation for reducing tempdb contention already in place.

Several other changes also improve caching (reducing page latch and metadata contention), reduce the logging for tempdb operations, and reduce the usage of update locks.
For the full list, check here.

Improvements in SQL Server 2019

The big change here is the introduction of memory-optimized tempdb metadata. The documentation here says that this change (which is not enabled by default; you will need to run an ALTER SERVER CONFIGURATION statement and restart) “effectively removes” the bottleneck from tempdb metadata contention.
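Enabling it looks like this (the configuration statement and server property below are the documented ones; the restart is required before it takes effect).

-- Enable memory-optimized tempdb metadata, then restart the instance.
ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

-- After the restart, verify it is enabled (returns 1 when on).
SELECT SERVERPROPERTY('IsTempdbMetadataMemoryOptimized') AS IsTempdbMetadataMemoryOptimized;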

However, this post by Marisa Mathews indicates the memory-optimized tempdb metadata improvement in SQL Server 2019 removed most contention in PFS pages caused by concurrent updates. This is done by allowing the updates to occur with a shared latch (see the “Concurrent PFS updates” entry here).

Tempdb contention seen in sp_WhoIsActive output

One thing I would point out is that the metadata is being optimized here; the temp tables you create are not memory-optimized and will still be written to the storage under tempdb as usual.

Improvements in SQL Server 2022

The post above also indicates that SQL Server 2022 reduces contention on the GAM and SGAM pages by allowing these pages to be updated under a shared latch rather than an exclusive latch.

The issue with the PFS, GAM, and SGAM pages has always been the need for an exclusive latch on those pages when an allocation takes place. If 20 threads are trying to create a temp table, 19 of them get to wait. The suggestion to add more data files to tempdb was a way to get around access to these pages being serialized; adding more files gives you more of these pages to spread the allocation operations across.
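For completeness, adding a file is a single statement (the file name, path, and sizes below are placeholders; match the size and autogrowth settings of your existing tempdb files).

-- Add another tempdb data file to spread allocation-page access.
ALTER DATABASE tempdb
ADD FILE (
	NAME = 'tempdev5',
	FILENAME = 'T:\tempdb\tempdev5.ndf',
	SIZE = 8GB,
	FILEGROWTH = 512MB
);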

In Summary

The gist is that tempdb contention has been nearly eliminated in SQL Server 2022. There are still several other actions that use tempdb, and you may see contention if you have a niche workload or use a lot of worktables.

Hopefully, this post will help you decide if it’s time for an upgrade. If you have been seeing tempdb contention on these common pages, the latest release should be a major improvement.

Feel free to contact me with any questions or let me know of any suggestions you may have for a post.

Wanted to point out a few more good articles and a video on the subject that you may enjoy.

This will be an unusual post for me. I normally just write technical posts about performance issues I’ve been working on. But I did go to PASS Summit this year, for only the second time in person, and the social aspect was a big part of why I went.

I’m writing this largely because of a blog post by Paul Randal. I replied in the comments but wanted to relate more of my own experience with introversion and PASS Summit.

So, networking…

I am an introvert, and I’ve never felt networking was my strong suit. Long conversations with people I don’t know aren’t something I crave, but networking can drastically increase our options career-wise. That’s how I got my current job.

Several years back I was working on a contract that had just been extended. The project I was working on had been completed, and I wasn’t sure I wanted to remain with that company long-term. I had been to meetings with the local user group over the years and spoken a few times, but hadn’t been to a meeting in months. I decided to check back in and be sociable.

So, I went to the next meeting and during the intro, the leader, Kevin Feasel, said something interesting. His company was looking for a DBE, and to speak with him after if anyone was interested. That conversation led to me applying, and one interview later I had an offer.

It’s honestly ridiculous looking back on it. It feels like I put in a minimal amount of effort but got a great opportunity with a very unique company. This was about a year after leaving Microsoft CSS (customer support), which allowed me to see a huge variety of SQL Server issues. ChannelAdvisor has a very busy OLTP environment and had great database engineers already working for it. Working on the performance issues here has been enlightening.

So, when I submitted a talk for PASS Summit this year, I had to choose whether to apply for an in-person or online session. Since I’ve been hiding out from the coronavirus like many people for the last two years, I decided to try for an in-person session at the summit and spend the extra effort to do some networking while I was there.

Nice View

At the summit

I’ve been to the PASS Summit once before in Seattle. There are a ton of great sessions, and I usually want to be in three different rooms for each session. I went to sessions all day, walked around Seattle a bit each evening, and crashed. It’s surprisingly exhausting to learn all day.

During this trip, I wanted to focus more on networking. I still went to a lot of sessions, but there were some I skipped to have conversations in the exhibitor lounge and elsewhere in the convention center. And if I was in a session that wasn’t what I was looking for, I was more likely to leave and just look for a chance to talk.

I ended up talking with about a dozen well-known people in the community. Some I had already interacted with; others not at all.

And it wasn’t comfortable or easy most of the time. As I said earlier, I’m an introvert. I had to make myself engage with people I had little or no contact with before. Several times I walked past a conversation before I made myself turn around and get involved.

I was mainly interested in talking with engineers doing consulting work about the work they do. There are similarities to the work I did in Microsoft CSS previously, so I’ve been interested in that part of the business.

I had some contact with a few speakers through work. I had replied to a few on a topic or two on Twitter. A few I had spoken with because I presented for a user group they were part of. I emailed Paul Randal several times through the mentorship he offered a year or two ago. Some like Brent Ozar I had never had any contact with, but he was more than willing to talk. I also had a few good conversations with consultants in the sponsor’s lounge.

What surprised me was hearing many (or all?) of the presenters I talked with mention during their talks that they were introverts. It seems unbelievable coming from someone presenting to a ballroom or a keynote audience with seeming ease.

My perspective

I referenced Paul Randal’s post because part of the feedback he received was from people who found the experience alienating. Some didn’t feel the community was very welcoming or felt like they were on the outside looking in or saw it as being very cliquish.

I wanted to post not to negate any of that; our feelings are always valid. But that doesn’t mean others are trying to be exclusionary.

And I did feel like I was on the outside looking in. And it was difficult\stressful to start up a conversation with people or join in one that was already going on. I’ve spoken at PASS Summit or Virtual PASS three times, but I’m not nearly as well known as many of the other presenters. So I certainly don’t feel I’m part of any clique.

But it did strike me that so many of the presenters had mentioned that they were introverts. So if people see the more well-known speakers gravitate toward people they already know, that’s what I’d expect an introvert to do.

I certainly did the same when I was feeling a bit overwhelmed and saw a friendly face.

In all of the conversations I joined or started, no one turned me away. Once I engaged, I found everyone willing to answer questions, happy to give advice, and generous with their time.

So as much as anything, I’d like to encourage people to be bold and engage. It may not be easy to start these contacts, but the results have been well worth the discomfort.

Stepping stones

One thing that did make this easier was something I didn’t intend. Over the last few years, I’ve presented at PASS Summit, several SQL Saturday events, and several user groups as well. So I’ve had more interactions with people in our community over the last few years as a lead-up to the summit this year.

I didn’t present at those user groups and SQL Saturday events to make networking easier, but it does help when I can say to someone, “Hi, I presented online for your user group a few months ago, and I wanted to ask you a question.”

This is not to say that you have to present to be part of the community, but any way you can engage with the community gives you more connections and more opportunities going forward.

And I should specifically thank Tracy Boggiano because many of those speaking opportunities came from her pinging me when she heard of a user group that needs a speaker. So, thank you, Tracy!

About the community…

Before and after the summit, I heard a lot of people in the community express a desire to leave Twitter. And I get why, but it’s been the main place where #sqlfamily posted. It isn’t clear how much of the community is looking to move, but I thought I’d mention the options I’ve seen people refer to:

  • CounterSocial: I saw people start talking about this before the summit. I haven’t seen a ton of activity here yet, but there are several names I recognized that have joined. Perhaps the chatter will pick up after the holidays, so try looking for #sqlfamily there.
  • Mastodon: I heard more about Mastodon at and after the summit. This works a bit differently, not being entirely centralized. People have to set up a server for it, and it seems communities have to find the right server (don’t ask me for the details). But you can search for #sqlfamily there, or go to the server that Daniel Hutmacher set up at https://dataplatform.social/home. Not a lot of activity there yet, but maybe this is where things will reform.
  • LinkedIn: This wouldn’t have occurred to me, but Brent Ozar pointed to LinkedIn (and TikTok and Instagram) when he said he would stop tweeting. There’s some activity here, and I’ve linked my new blog posts here in the past.

So, if you are looking for people to follow or new events or groups to be part of here, there are three good places to start.

My LinkedIn should already be at the top of the page, but I have accounts on CounterSocial (https://counter.social/@sqljared) and Mastodon (https://dataplatform.social/@sqljared), and I’ll link those at the top of the page shortly.

In Summary

I hope I’ve encouraged you to engage with the community in the new year, and I hope you will find it rewarding. I’m going to do the same myself and see if there are some more user groups that need a remote presenter.

Happy New Year!

I’ve discussed the other two join types, so what is the niche for the third?

Before we get into how it works and what my experience has been, I want to mention a response to my last blog post, because it leads into our topic.

Addendum on Hash Match Joins

My last blog post was on hash match joins, and Kevin Feasel had a response on his blog.

Hash matches aren’t inefficient; they are the best way to join large result sets together. The caveat is that you have a large result set, and that itself may not be optimal. Should it be returning this many rows? Have you included all the filters you can? Are you returning columns you don’t need?

Jared Poche

I might throw in one caveat about hash match joins and being the best performers for two really large datasets joining together: merge join can be more efficient so long as both sets are guaranteed to be ordered in the same way without an explicit sort operator. That last clause is usually the kicker.

Kevin Feasel, Curated SQL

And he is quite correct. Nested loops perform better than hash match with smaller result sets, and hash match performs better on large result sets.

Merge joins should be more efficient than both when the two sources are sorted in the same order. So merge joins can be great, but the caveat is that you will rarely have two sources that are already sorted in the same order. So if you were looking for the tldr version of this blog, this paragraph is it.

How Merge Joins Operate

Merge joins traverse both inputs once, advancing a row at a time and comparing the values from each input. Since they are in the same order, this is very efficient. We don’t have to pay the cost to create a hash table, and we don’t have the much larger number of index seeks nested loops would encounter.

The process flows like this:

  1. Compare the current values from each data source.
  2. If they match, add the joined row to the result set, and get the next value from both sources.
  3. If not, get the next row from the data source with the lower sorted value.
  4. If there are no more rows from either source, the operation ends.
  5. Otherwise, return to step 1 with the new input.

At this point, I would create a visual for this, but one already exists, so let me refer you to a post by Bert Wagner. The video there has a great visualization of the process.

Input Independence

I find nested loops is probably the easiest join to understand, so I want to draw a distinction here. Using nested loops, we would get a row from the first source then seek the index against the second to get all rows related to the row from the first source. So, our ability to seek from the second depends on the first.

A merge join seeks from both sources independently, taking in rows and comparing them in order. So in addition to the requirement (with exceptions) that the sources have to be in the same order, we need a filter we can use for each source. The ON clause does not give us the filter for the second table; we need something else.

Here’s an example query and plan:

USE WideWorldImporters
GO
SELECT 
	inv.InvoiceID,
	invl.InvoiceLineID
FROM Sales.Invoices inv
INNER JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.InvoiceID < 50;
GO

Both Invoices and InvoiceLines have indexes based on InvoiceID, so the data should already be in order. So this should be a good case for a merge (the nested loops below is because of the key lookup on InvoiceLines). But SQL Server’s optimizer still chose nested loops for this query.

I can hint it to get the behavior I expected, and that plan is below.
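The hinted version isn't shown in the original; it would look roughly like this.

-- Same query with a join hint forcing the merge join.
SELECT
	inv.InvoiceID,
	invl.InvoiceLineID
FROM Sales.Invoices inv
INNER MERGE JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.InvoiceID < 50;
GO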

The estimates are way off for the Invoices table, which is odd considering we are seeking on the primary key’s only column; one would expect that estimate to be more accurate. But this estimate causes the cost for the seek against Invoices to be more expensive, so the optimizer chose the other plan. It makes sense.

I updated the statistics, and a different plan was chosen. One with a hash match.

???
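For reference, the statistics update was along these lines (a sketch; FULLSCAN is my choice here, not necessarily what was originally used).

-- Refresh statistics on both tables before checking the plan again.
UPDATE STATISTICS Sales.Invoices WITH FULLSCAN;
UPDATE STATISTICS Sales.InvoiceLines WITH FULLSCAN;
GO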

In that case, the difference in cost was directly the cost of the join operator itself; the cost of the merge join operator was 3x the cost of the hash match operator.

Even if the merge is more efficient, it seems it’s being estimated as being more costly, and specifically for CPU cost. You’re likely to see merge joins much less often than the other two types because of the sort requirement; how it is estimated may also be a factor.

About that sort

The first several times I saw a merge join in an execution plan, the merge was basically the problem with the query. It gave me the impression at the time that merge joins aren’t great in general. But in those cases, the execution plan had a sort after one of the index operations and before the join. Sure, the merge join requires that the two sources be sorted in the same order, but SQL Server could always use a sort operator (expensive as they are) to make that an option.

This seems like an odd choice to make, so let’s consider the following query:

USE WideWorldImporters
GO
SELECT *
FROM Sales.Invoices inv
INNER JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.InvoiceDate < DATEADD(month, -12, getutcdate());
GO

So, this query does a merge join between the two, but there is a sort on the second input. We scan the index, then sort the data to match the other input before we perform the actual join. A sort operator is going to be a large cost to add into our execution plan, so why did the optimizer choose this plan?

This is a bad query, and the optimizer is trying to create a good plan for it. This may explain many other situations where I have seen a sorted merge. The query is joining the two tables on InvoiceID, and the only filter is on Invoices.InvoiceDate. There is no index on Invoices.InvoiceDate, so it’s a given we’ll scan that table.

If this query used nested loops, we could use the InvoiceID for each record from Invoices to seek a useful index against InvoiceLines, but that would mean we perform 151,578 seeks against that table.

A merge join, even if we have to sort the results from the table, would allow us to perform one index operation instead. But a merge join has to seek independently from the other source, and no other filter is available. So we perform an index scan against the second table as well.

This is probably the best among poor options. To really improve this query, you’d need to add an index or change the WHERE clause.
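For example, an index like the following (hypothetical, not one that ships with WideWorldImporters) would give the filter on InvoiceDate something to seek; whether the optimizer would use it for a SELECT * over a wide date range is another question.

-- Hypothetical index so the InvoiceDate predicate can seek rather than scan.
CREATE INDEX IX_Invoices_InvoiceDate
	ON Sales.Invoices (InvoiceDate);
GO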

It took some time for me to realize why I most often saw merge joins in poor execution plans; I wasn’t seeing all the plans using them that perform well. If you are troubleshooting a high CPU situation, when you find the cause you’ll likely be looking at a bad plan. We don’t tend to look for the best performing query on the server, do we?

So, if merge join is more efficient than the other two join types in general, we are less likely to be looking at queries where it is being used effectively.

Summary

Hopefully I’ll be getting back to a more regular schedule for the blog. There’s been a number of distractions (an estate sale, mice, etc), but life has been more calm of late (mercifully).

I spoke at the two PASS Summit virtual events over the last two years, and this year I am happy to be presenting in person at PASS Data Community SUMMIT for the first time. So if you are interested in how you can use memory-optimized table variables to improve performance on your system, look out for that session.

When I began working at Microsoft, I was very much a novice at performance troubleshooting. There was a lot to learn, and hash match joins were pointed out to me multiple times as the potential cause for a given issue. So, for a while I had it in my head, “hash match == bad”. But this really isn’t the case.

Hash matches aren’t inefficient; they are the best way to join large result sets together. The caveat is that you have a large result set, and that itself may not be optimal. Should it be returning this many rows? Have you included all the filters you can? Are you returning columns you don’t need?

If SQL Server is using a hash match operator, it could be a sign that the optimizer is estimating a large result set incorrectly. If the estimates are far off from the actual number of rows, you likely need to update statistics.

Let’s look at how the join operates so we can understand how this differs from nested loops.

How Hash Match Joins Operate

Build Input

A hash match join between two tables or result sets starts by creating a hash table. The first input is the build input. As the process reads from the build input, it calculates a hash value for each row in the input and stores them in the correct bucket in the hash table.

Creating the hash table is resource intensive. This is efficient in the long run, but is too much overhead when a small number of rows are involved. In that case, we’re better off with another join, likely nested loops.

If the hash table created is larger than the memory allocation allows, it will “spill” the rest of the table into tempdb. This allows the operation to continue, but isn’t great for performance. We’d rather be reading this out of memory than from tempdb.

Building the hash table is a blocking operation. This means the normal row mode flow we expect isn’t happening here. We won’t read anything from the second input until we have read all matching rows from the build input and created the hash table. In the query above, our build input is the result of all the operators highlighted in yellow.

Probe Input

Once that is complete, we move on to the second input in the probe phase. Here’s the query I used for the plan above:

USE WideWorldImporters
GO

SELECT *
FROM Sales.Invoices inv
INNER JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.AccountsPersonID = 3002
GO

The build input performed an index seek and key lookup against Sales.Invoices. That’s what the hash table is built on. You can see from the image above that this plan performs a scan against Sales.InvoiceLines. Not great, but let’s look at the details.

There is no predicate or seek predicate, and we are doing a scan. This seems odd if you understand nested loops, because we are joining based on InvoiceID, and there is an index on InvoiceID for this table. But the hash match join operates differently, and doesn’t iterate the rows based on the provided join criteria. The seek\scan against the second table has to happen independently; then we probe the hash table with the data it returns.

If the read against the Sales.InvoiceLines table can’t seek based on the join criteria, then we have no filter. We scan the table, reading 490,238 rows. Also, unlike a nested loops join, we perform that operation only once.

There is a filter operator before the hash match operator. For each row we read of Sales.InvoiceLines, we create a hash value, and check against the hash table for a match. The filter operator reduces our results from 490,238 rows to 751, but doesn’t change the fact that we had to read 490,238 rows to start with.

In the case of this query, I’d want to see if there’s a filter I can apply to the second table. Even if it doesn’t change our join type away from a hash match, if we performed a seek to get the data initially from the second table, it would make a huge difference.
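Purely for illustration (the extra predicate below is arbitrary, not something the original query called for): any filter that can use the index on InvoiceLines.InvoiceID could let that side seek instead of scan, even if the join stays a hash match.

-- Illustrative only: an arbitrary range filter on the second table so its
-- read no longer has to scan every row.
SELECT *
FROM Sales.Invoices inv
INNER JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.AccountsPersonID = 3002
	AND invl.InvoiceID >= 60000;
GO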

Remember Blocking Operators?

I mentioned that the build input makes that branch of our execution plan blocking. This is worth calling out because it interrupts the normal flow of row mode execution.

With a nested loops join, we would be getting an individual row from the first source, and doing the matching lookup on the second source, and joining those rows before the join operator asked the first source for another row.

Here, our hash match join has to gather all rows from the first source (which here includes the index seek, key lookup, and nested loops join) before we build our hash table. This could significantly affect a query with a TOP clause.

The TOP clause stops the query from requesting new rows from the operators underneath it once it has met its requirement. This should result in reading less data, but a blocking operator forces us to read all applicable rows first, before we return anything to upstream operators.

So if your TOP query is trying to read a small number of rows but the plan has a hash match in it, you will likely be reading more data than you would with nested loops.
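For example, in a query like the following (same tables as above), a TOP 10 can't short-circuit the build side if the optimizer still chooses a hash match; every qualifying Invoices row is hashed before any joined row is returned.

-- The TOP can only stop requesting rows above the join; the hash table build
-- below it still has to complete first.
SELECT TOP 10 *
FROM Sales.Invoices inv
INNER JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.AccountsPersonID = 3002;
GO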

Summary

Actual numbers comparing join types would depend a ton on the examples. Nested loops are better for smaller result sets, but if you are expecting several thousand rows (maybe ten thousand or more) read from a table, a hash match may be more efficient. Hash matches are more efficient in CPU usage and logical reads as the data size increases.

I’ll be speaking at some user groups and other events over the next few months, but more posts are coming.

As always, I’m open to suggestions on topics for a blog, given that I blog mainly on performance topics. You can follow me on twitter (@sqljared) and contact me if you have questions. You can also subscribe on the right side of this page to get notified when I post.

Have a good night.

There are a lot of things to know to understand execution plans and how they operate. One of the essentials is understanding joins, so I wanted to post about each of them in turn.

The three basic types of join operators are hash match, merge, and nested loops. In this post, I’m going to focus on the last, and I will post on the other two shortly.

How Nested Loops Operate

Nested loops joins are the join operator you are likely to see the most often. It tends to operate best on smaller data sets, especially when the first of the two tables being joined has a small data set.

In row mode, the first table returns rows one at a time to the join operator. The join operator then performs a seek\scan against the second table for each row passed in from the first table. It searches that table based on the data provided by the first table, and the columns defined in our ON or WHERE clauses.

  • You can’t search the second table independently; you have to have the data from the first table. This is very different for a merge join.
  • A merge join will independently seek or scan the two tables, then join those results. It is very efficient if the results are in the same order so they can be compared more easily.
  • A hash join will seek or scan the second table however it can, then probe the hash table (built from the first input) with values from the second table, based on the ON clause.

So, if the first table returns 1,000 rows, we won’t perform an index seek (or scan) against the second; we will perform 1,000 seeks (or scans) against that index, looking for any rows that relate to the row from the first table.

The query above joins several tables, using nested loops between each. We can see that the row counts for the first several tables are all 1. We read one SalesPerson record, the related SalesTerritory, an Employee record, and a Person record. When we join that to the SalesOrderHeader table, we find 234 related rows. That comes from 1 index seek, as the internal result set only had 1 row thus far. If we joined to LineItem next, we would perform that seek 234 times.

Key Lookups

The optimizer always uses nested loops for key lookups, so you’ll see them frequently for this purpose. I was unsure if this was the only way key lookups are implemented, but this post from Erik Darling confirms it.

In the plan above, we return 149 rows from the OrderLines table. We use a nested loops join operator so we can include UnitPrice and Description in our output, since they aren’t in the nonclustered index.

Which means we execute the key lookup 149 times. The cost for this operator is 99% of the total query, but the optimizer overestimated how many rows we would return.

I frequently look at key lookups as an opportunity for tuning. If you look at the output for that operator, you can see which columns are causing the key lookup. You can either add those columns to the index you expect this query to use (as included columns), or you can ask whether those columns really need to be included in the query. Either approach will remove the key lookup and the nested loop.
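A sketch of the first option (the index name and key column here are hypothetical; the point is covering the two columns the lookup is fetching).

-- Hypothetical covering index: including UnitPrice and Description removes
-- the key lookup and the nested loops join that drives it.
CREATE INDEX IX_OrderLines_StockItemID_Covering
	ON Sales.OrderLines (StockItemID)
	INCLUDE (UnitPrice, [Description]);
GO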

LOOP JOIN hints

You can direct SQL Server to use nested loops specifically in your query by writing the query with (INNER\LEFT\RIGHT\FULL) LOOP JOIN. This has two effects:

  • Forces the optimizer to use the specified join type. This will only force the join type for that specific join.
  • Sets the join order. This forces the tables to be joined in the order they are written (subqueries from WHERE EXISTS clauses are excluded from this). So this point may affect how the other joins operate.
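As an example of the syntax, using the WideWorldImporters Sales.Invoices and Sales.InvoiceLines tables:

-- Forces nested loops for this join and fixes the join order as written.
SELECT
	inv.InvoiceID,
	invl.InvoiceLineID
FROM Sales.Invoices inv
INNER LOOP JOIN Sales.InvoiceLines invl
	ON invl.InvoiceID = inv.InvoiceID
WHERE
	inv.InvoiceID < 50;
GO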

I’ve blogged about using hints previously, so I won’t go on for long on this subject. I like the phrase “With great power comes great responsibility” when thinking about using hints to get a specific behavior. It can be an effective way to get consistent behavior, but you can make things worse if you don’t test and follow up to confirm it works as intended.

Summary

I’ll discuss the other two join types in another post. In short, hash matches are more efficient for large data sets, but the presence of a large data set should make us ask other questions. Merge joins are very efficient when dealing with data from two sources that are already in the same order, which is unlikely in general.

Thanks to everyone who voted for my session in GroupBy. I enjoyed speaking at the event, and had some interesting discussion and questions. I expect recordings for the May event will be available on GroupBy’s Youtube page, so keep an eye out for that if you missed the event.

As always, I’m open to suggestions on topics for a blog, given that I blog mainly on performance topics. You can follow me on twitter (@sqljared) and contact me if you have questions. You can also subscribe on the right side of this page to get notified when I post.

Hope you are all enjoying a long weekend.

In my most recent blog post, I looked at the Query Variation report, which compares the recent performance of queries versus their historical performance to either highlight improvements or regressions in performance. The Waits Variation component does the same, but comparing the recent waits for a query to its historical waits.

One thing to keep in mind is that if a given query is changed in any way (to change the filter, return additional columns, or include a hint), the changed query will have a different query_id in Query Store. In that case, neither the Waits Variation nor the Query Variation procedure will compare the historical performance of the old query to the recent performance of the new one.

That being said, let’s look at the dbo.WaitsVariation procedure’s options.

Waits Variation Options

  • @ServerIdentifier: Defaults to the current instance. Set this to gather data from another instance of SQL Server.
  • @DatabaseName: Defaults to the current database. I’m querying data from WideWorldImporters for my examples.
  • @ReportIndex: Default NULL. When used, stores information about the parameters used for the report.
  • @ReportTable: Default NULL. Allows you to store the report data in a table like dbo.WaitsVariationStore, created by the installer.
  • @WaitType: Which wait category are we concerned with? Total by default. [Total, CPU, Lock, Latch, BufferLatch, BufferIO, WorkerThread, NetworkIO, Parallelism…] There are 36 options here, so check the header of dbo.WaitsVariation for the full list.
  • @Metric: Do we want to compare the total wait time for the query, or the average? Avg by default. [Avg, Total]
  • @VariationType: Are we looking for queries that have improved [I] or regressed [R]. ‘R’ by default.
  • @ResultsRowCount: Number of rows to include in the report. Only the largest regressions or improvements are included. Default is 25, and if this value is NULL or negative, the procedure uses the default.
  • @RecentStartTime and @RecentEndTime: Defines what the ‘recent’ period is, which will be compared against the historical. The defaults are 1 hour ago for @RecentStartTime, and now for @RecentEndTime.
  • @HistoryStartTime and @HistoryEndTime: Defines what the ‘history’ period is, which will be compared against the recent. The defaults are 30 days ago for @HistoryStartTime, and 1 hour ago for @HistoryEndTime.
  • @IncludeQueryText: Includes the text for any identified queries in the output. Default is 0.
  • @ExcludeAdhoc: Ignores ad hoc queries, anything that isn’t part of a procedure or other defined object. Default is 0.
  • @ExcludeInternal: Excludes internal queries\operations run by SQL Server itself. Default is 0.
  • @VerboseMode: Default 0. Provides the queries being used in the messages tab.
  • @TestMode: Default 0. When enabled, does everything except actually run the generated queries to create the report.
  • OUTPUT @ReportID: When you use @ReportIndex and @ReportTable, a saved report is generated. You can then use the @ReportID to see that report separately.

So, these parameters are very similar to dbo.QueryVariation. You’ll likely want to alter your time parameters, and you could generate multiple reports to check different wait types. Or you could do a regular report for total waits, and review to see which type of wait had the biggest change.

Best Total Wait Improvement

Given that the default is to look for regressions instead of improvements, if I wanted to compare the last two day’s waits to the previous month, I could use the following:

DECLARE @RecentStartTime	DATETIME2 = DATEADD(HOUR, -48, GETUTCDATE());
DECLARE @RecentEndTime	DATETIME2 = GETUTCDATE();
DECLARE @HistoryStartTime	DATETIME2 = DATEADD(DAY, -30, @RecentEndTime);
DECLARE @HistoryEndTime	DATETIME2 = @RecentStartTime;

	EXECUTE [dbo].[WaitsVariation]
		@DatabaseName	= 'WideWorldImporters',
		@WaitType	= 'Total',
		@Metric		= 'Avg',
		@VariationType = 'I',
		@ExcludeAdhoc = 1,
		@HistoryStartTime = @HistoryStartTime,
		@HistoryEndTime	= @HistoryEndTime,
		@RecentStartTime = @RecentStartTime,
		@RecentEndTime = @RecentEndTime;

Choosing ‘Total’ as your @Metric may work for some cases, but would tend to give results with large variation. If a query executed more times in one period than the other, I would expect that period to have more waits.

I ran a workload to generate some activity, and only one query in it regressed, so I’m returning improvements so there’s a little more to look at.

Wait Variation – Improvements

So, the four procs listed had some rather dramatic improvements. We can see the amount of the total wait reduction, and columns for unknown and CPU waits. This is only a portion of the incredibly wide result set. There are history, recent, and variation % columns for each wait type.

We ran this script looking for ‘Total’ as our @WaitType, which means we are sorting and returning the top rows with the most improvement in that area. But it did record the improvement for all wait types. It only takes a little scrolling to find our causes:

Wait Variation – Causes

So, the first procedure had relatively large parallelism waits in its history, compared with none in the recent period. I would argue the variation % should be 100%, not 0%, but the logic here may be special-cased to prevent dividing by 0.

All four procedures had NetworkIO waits. It’s likely the tool I was using to run these queries in a loop was not consuming the results fast enough, causing delays within SQL Server. These queries were taking so long that I used a smaller date range on my more recent calls. Fewer records, less time for the app to consume them, and smaller NetworkIO waits.

One thing I have noticed is that if you use @IncludeQueryText, the text returned is compressed and unreadable, unless you create a saved report by using the @ReportIndex and @ReportTable options. That leads us to the second procedure.
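For reference, generating a saved report looks roughly like this. The parameter names come from the list above; the index table name and the @ReportID data type are my assumptions, so check the objects your installer created.

	DECLARE @ReportID BIGINT;  -- data type is an assumption

	EXECUTE dbo.WaitsVariation
		@DatabaseName	= 'WideWorldImporters',
		@WaitType	= 'Total',
		@Metric		= 'Avg',
		@IncludeQueryText = 1,
		@ReportIndex	= 'dbo.WaitsVariationIndex',  -- assumed name; use your installed index table
		@ReportTable	= 'dbo.WaitsVariationStore',
		@ReportID	= @ReportID OUTPUT;

	SELECT @ReportID AS ReportID;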

Waits Variation from Store

If I save the results, there will be records in the tables indicated, and I can run dbo.WaitsVariationFromStore to read out the header of the report and its details in two result sets.

Waits Variation from Store output

The result set is much more limited, but it focuses on the metric we chose. We can see the number of executions, the related object, the change in our total waits, and the decompressed text for the query itself.

There are only three parameters for this proc, but here’s an example:

USE QDSToolBox
GO

	EXECUTE dbo.WaitsVariationFromStore
		@ReportID = 1,
		@VerboseMode = 0,
		@TestMode = 0;

Nearly done

There are two more components to the QDS Dashboard I want to detail, but I may be spreading them out between some other topics I want to blog on.

Time to go update slides for a user group meeting later this week.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

In a recent talk with the Sacramento SQL Server User Group (@SACSQLUG), a question was asked I wanted to address, and the answer seemed more like a short blog than a tweet.

I was presenting on how to use the TOP clause to break down large operations into short, fast, bite-sized operations. The mechanics are things I learned from writing processes that do garbage collection, backfill new columns, and anonymizing PII data on existing tables. I’ve just posted the slides and example scripts here if you are interested.

Are they the same?

The question was whether the SET ROWCOUNT command would work just the same, and the answer is sometimes yes but largely no.

SET ROWCOUNT will limit the number of rows returned by the following statements, and this functions like a connection level setting. It will remain in effect until you change it or clear it by setting the rowcount to 0 (unlimited).

I would be concerned about setting the option and not clearing it, causing it to affect additional statements.
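A quick demonstration of that connection-level behavior (the row counts in the comments assume both tables have at least 10 qualifying rows).

SET ROWCOUNT 10;

SELECT sol.OrderLineID FROM Sales.OrderLines sol;  -- capped at 10 rows
SELECT so.OrderID FROM Sales.Orders so;            -- also capped at 10 rows; the setting is still in effect

SET ROWCOUNT 0;  -- back to unlimited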

So let’s adapt some of my examples and see how it behaves.

The simple case with SET ROWCOUNT

Here’s a script to return 100 rows from a table in WideWorldImporters.

USE WideWorldImporters
GO
SELECT TOP 100 
	sol.OrderLineID, 
	sol.UnitPrice 
FROM Sales.OrderLines sol 
WHERE sol.OrderLineID < 1000 
	AND sol.UnitPrice > 50;
GO

SET ROWCOUNT 100;

SELECT 
	sol.OrderLineID, 
	sol.UnitPrice 
FROM Sales.OrderLines sol 
WHERE sol.OrderLineID < 1000 
	AND sol.UnitPrice > 50;
GO

Fairly simple, and the plans are much the same.

100 rows, as promised

The only difference is that the Top operator is missing. But the query is behaving much as if it were present. I surmise that the Select operator itself is limiting the number of rows by not requesting additional data after the first 100.

Quick aside on Row Mode

SQL Server queries typically operate in row mode. Batch mode is used in some cases with columnstore indexes and can be used on rowstore indexes in SQL Server 2019 and above. But I want to focus on how a query processes in row mode (with thanks to Grant Fritchey for his description of this).

The activity is driven by the root node, in this latest plan the Select operator. It asks the operator under it (the Top operator) to go get a row. The Select operator doesn’t really care how that operator does what it does; it just wants a row. And for the Top operator to do its job, it asks the operator below it for the same.

Until we get down to the last Nested Loops operator. It asks the Index Scan operator for a row, and that operator can finally get us some data. It returns a row and, being a join operator, Nested Loops asks the Key Lookup for data related to that row. It joins the results and passes the joined row up to the operator above it.

When data flows up to an operator, it will typically reply, “Now give me another row.”

The Top operator changes this by simply not asking for more rows once it has received the number of rows it needs. So all the activity for the underlying operators stops, because the Top hasn’t told them to continue.

In the simple example of my two queries, we can see the Clustered Index Seek only returned 100 rows, then the query ended. It seems with SET ROWCOUNT in effect, the Select operator itself stopped the operation by not requesting more data.

But then it falls apart

Here’s a stored procedure I wrote to delete old records from the Sales.Orders table and all related tables.

USE WideWorldImporters
GO
CREATE OR ALTER PROCEDURE Sales.Orders_GarbageCollection(
@BatchSize INT = 100,
@Duration INT = 30
)AS
SET NOCOUNT ON;

	DECLARE 
		@NestedTransaction BIT = 0,
		@EndTime DATETIME;

	CREATE TABLE #OrdersGC(
		OrderID INT,
		INDEX IX_OrdersGC(OrderID)
	);

	CREATE TABLE #InvoicesGC(
		InvoiceID INT,
		INDEX IX_InvoicesGC(InvoiceID)
	);

	SET @EndTime = DATEADD(SECOND, @Duration, GETUTCDATE());

	WHILE (GETUTCDATE() < @EndTime)
	BEGIN
		TRUNCATE TABLE #OrdersGC;
		TRUNCATE TABLE #InvoicesGC;

		IF @@NESTLEVEL > 0
		BEGIN
			SET @NestedTransaction = 1;
		END
		ELSE
		BEGIN
			BEGIN TRANSACTION;
		END;

		INSERT INTO #OrdersGC
		SELECT TOP (@BatchSize)
			so.OrderID
		FROM Sales.Orders so
		WHERE
			so.OrderDate < 
				DATEADD(month, -50, GETUTCDATE());

		DELETE sol
		FROM #OrdersGC gc
		JOIN Sales.OrderLines sol
			ON sol.OrderID = gc.OrderID;
	
		INSERT INTO #InvoicesGC
		SELECT --TOP (@BatchSize)
			inv.InvoiceID
		FROM #OrdersGC gc
		JOIN Sales.Invoices inv
			ON inv.OrderID = gc.OrderID;

		DELETE ctr
		FROM #InvoicesGC inv
		JOIN Sales.CustomerTransactions ctr
			ON ctr.InvoiceID = inv.InvoiceID;

		DELETE sti
		FROM #InvoicesGC inv
		JOIN Warehouse.StockItemTransactions sti
			ON sti.InvoiceID = inv.InvoiceID;

		DELETE invl
		FROM #InvoicesGC inv
		JOIN Sales.InvoiceLines invl
			ON invl.InvoiceID = inv.InvoiceID;

		DELETE inv
		FROM #OrdersGC gc
		JOIN Sales.Invoices inv
			ON inv.OrderID = gc.OrderID;

		DELETE so
		FROM #OrdersGC gc
		JOIN Sales.Orders so
			ON so.OrderID = gc.OrderID;
		
		IF @NestedTransaction = 0
		BEGIN
			COMMIT TRANSACTION;
		END;

	END;

GO

BEGIN TRANSACTION

EXEC Sales.Orders_GarbageCollection
	@BatchSize = 100,
	@Duration = 5;

ROLLBACK TRANSACTION
GO


--CREATE INDEX IX_Sales_Orders_OrderDate ON Sales.Orders (OrderDate);

This proc identifies a batch of rows (100, given our @BatchSize) in Sales.Orders that we want to delete and puts those OrderIDs in a temp table. We then delete records from the related tables based on those OrderIDs, and we similarly delete records related to the rows we want to delete from Sales.Invoices. The procedure loops, deleting batches until the specified number of seconds has elapsed.

One key thing to remember is that we only use a TOP clause when we populate the first temp table. We want to delete those 100 rows and everything related to them, not those 100 rows and only 100 related rows from the related tables. If we only delete some of the related rows, our process will fail on a foreign key violation when we try to delete some rows from the Sales.Orders table.

If you were to change this to use SET ROWCOUNT, you would need to reset it to 0 after the statement that inserts into the temp table. If you don't, the next statement looks like this:

With SET ROWCOUNT 100

Note the Top operator in this DELETE statement. Everything flows as expected until the Top operator shuts the statement down prematurely. Top only passes on 100 rows, which then get deleted. So, later, when we try to DELETE from the Sales.Orders table, we fail because we haven't deleted all the related rows.

You could just SET ROWCOUNT 0 once you no longer need it, but I'd be concerned I'd forget to do so and cause issues.
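For illustration, here's roughly what that change would look like inside the loop. This is just a sketch of the SET ROWCOUNT approach, not something I'd use in place of the TOP version above:

SET ROWCOUNT @BatchSize;	-- caps every following statement at @BatchSize rows

INSERT INTO #OrdersGC
SELECT
	so.OrderID
FROM Sales.Orders so
WHERE
	so.OrderDate < 
		DATEADD(month, -50, GETUTCDATE());

SET ROWCOUNT 0;	-- the easy-to-forget reset; without it, the DELETEs below are also capped at 100 rows

DELETE sol
FROM #OrdersGC gc
JOIN Sales.OrderLines sol
	ON sol.OrderID = gc.OrderID;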

Also, it’s deprecated

If you take a look at the remarks section of the documentation for this statement, there’s something interesting.

Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL).

Microsoft

So they aren't using the word deprecated here, but it amounts to that. SET ROWCOUNT will stop affecting DELETE, INSERT, and UPDATE statements in a future release, so you wouldn't want to write a process that depends on it only to have it fail at some undefined point in the future.
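For reference, TOP works directly in DML statements, so a batched DELETE doesn't need SET ROWCOUNT at all. A minimal sketch (the WHERE clause here is only illustrative):

DECLARE @BatchSize INT = 100;

-- TOP limits this one statement only; nothing else in the session is affected.
-- Without an ORDER BY, which rows fall into the batch is not guaranteed.
DELETE TOP (@BatchSize)
FROM Sales.OrderLines
WHERE OrderID < 100;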

It doesn’t state whether this will also affect MERGE statements, but I’d rather not have to use different syntax on some statements anyway, so I’ll just stick with my TOP clauses.

Happy Holidays

Should have another post or two coming in the next few weeks thanks to some extra time off and not traveling much for the holidays. Hope yours are great.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

Also, subscribe to get emails when I post again.

Since Query Store stores statistics long term, we can use it to look for trends or major changes in performance. The Query Variation component of the QDS Toolbox helps us review changes and has a lot of options to allow us to select the kinds of changes we are interested in.

So, let’s review our options and go through some examples.

Options

Let’s discuss the options and how they interact.

  • @ServerIdentifier: Defaults to the current instance. Set this to gather data from another instance of SQL Server.
  • @DatabaseName: Defaults to the current database. I’m querying data from WideWorldImporters for my examples.
  • @ReportIndex: Default NULL. When used, stores information about the parameters used for the report.
  • @ReportTable: Default NULL. Allows you to store the report data in a table like dbo.QueryVariationStore, created by the installer.
  • @Measurement: What statistic are we basing our report on? CPU by default. [CLR, CPU, DOP, Duration, Log, LogicalIOReads, LogicalIOWrites, MaxMemory, PhysicalIOReads, Rowcount, TempDB]
  • @Metric: How are we comparing our measurements? Avg by default. [Avg, Max, Min, StdDev, Total]
  • @VariationType: Are we looking for queries that have improved [I] or regressed [R]? ‘R’ by default.
  • @ResultsRowCount: Number of rows to include in the report. Default is 25.
  • @RecentStartTime and @RecentEndTime: Defines what the ‘recent’ period is, which will be compared against the historical. The defaults are 1 hour ago for @RecentStartTime, and now for @RecentEndTime.
  • @HistoryStartTime and @HistoryEndTime: Defines what the ‘history’ period is, which will be compared against the recent. The defaults are 30 days ago for @HistoryStartTime, and 1 hour ago for @HistoryEndTime.
  • @MinExecCount: Only considers queries with at least this many recent executions. Default is 1.
  • @MinPlanCount and @MaxPlanCount: Only includes queries with a number of plans in this range. Defaults are 1 and 99999.
  • @IncludeQueryText: Includes the text for any identified queries in the output. Default is 0.
  • @ExcludeAdhoc: Ignores ad hoc queries, anything that isn’t part of a procedure or other defined object. Default is 0.
  • @ExcludeInternal: Excludes internal queries\operations run by SQL Server itself. Default is 0.
  • @VerboseMode: Default 0. Provides the queries being used in the messages tab.
  • @TestMode: Default 0. When enabled, does everything except actually run the generated queries to create the report.

So let’s look at a few examples Pablo Lozano put in the header for the procedure.

Worst Regressions

The first example using the Query Variation proc creates a list of the 25 queries with the largest regression in performance for the last hour, compared to the history over the last 30 days.

	EXECUTE [dbo].[QueryVariation]
		@DatabaseName	= 'WideWorldImporters',
		@Measurement	= 'Duration',
		@Metric			= 'Avg';

We’re relying a lot on default values here, but it shows how simple it can be to run this procedure and get usable results.

Our results include queries that are in procedures and those that aren’t. We have the QueryID values for each if we want to run another query to look them up, but we didn’t include the option to get the text for the statements.

Since the default is to look for regressed queries, our list has 25 queries in descending order based on how large the regression was. We can see the average duration recently and historically, the number of executions in each window, and the overall number of plans.

The first two queries on this list, from Integration.GetCustomerUpdates, have both regressed by almost 300%. Both also have two execution plans. That could be the explanation for their regression; we could have a new plan that is worse for a variety of reasons. We are also comparing 8 recent executions to 100 in our history, so it could just be that the small sample size (or a cold cache) is making a big difference.

We have the information we need here to find the query and follow up from there. I’m fond of using the “Tracked Queries” report in SSMS under Query Store, and that would show us the query and its recent plans.
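If you'd rather stay in T-SQL, a quick lookup against the Query Store catalog views works too. A sketch, where 42 stands in for a query_id from the report:

--Look up the statement text and containing object for a query_id from the report
SELECT 
	qsq.query_id,
	OBJECT_SCHEMA_NAME(qsq.object_id) + '.' + OBJECT_NAME(qsq.object_id) AS containing_object,	-- NULL for ad hoc queries
	qt.query_sql_text
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
WHERE qsq.query_id = 42;	-- replace with the query_id you want to review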

Biggest CPU Improvements

Here’s the second example from the procedure’s comments.

		EXECUTE [dbo].[QueryVariation]
			@DatabaseName	= 'WideWorldImporters',
			--@ReportIndex		= '[dbo].[QueryVariationIndex]',
			--@ReportTable		= '[dbo].[QueryVariationReport]',
			@Measurement		= 'CPU',
			@Metric				= 'Total',
			@VariationType		= 'I',
			@ResultsRowCount	= 10,
			@MaxPlanCount		= 1,
			@IncludeQueryText	= 1;

I’ve added my database name and left the report tables in comments, and I did change the option to include query text.

This report should show us the queries that have improved the most, based on total CPU usage. It will only give us 10 results, and only includes queries with a single plan.

The results show us the procedures with the biggest reduction in overall CPU usage, and several are around 90%. But that's not surprising, since we measured the total instead of the average, and we've run the queries far less in the recent window. If we run a query 92% less in the recent window, that will show up as a reduction.

If our executions were more even, it could draw attention to statements that have improved despite the plan not changing, since we set our @MaxPlanCount to 1.

The QuerySQLText has information on variables at the beginning, but we can still see which statement each line of the report refers to.

Query Variation Usage

The Query Variation report is good for a general comparison of performance. Run this daily, check your average duration\CPU\logical IO reads against the historical to look for regressions, and you get a good list of queries and procs you may want to look into.

Do keep in mind that you can't use this to check the results of a change to a query. If you change the text of a statement in any way, the modified statement gets a different query_id in Query Store, so this report wouldn't compare it to the original statement.

This can identify queries that have a change in performance because of a new plan, depending on how you set the @MaxPlanCount. So if you are changing an index to improve a query, a query variation report could show the statement’s improvement (or regression).
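If you do want to run it on a schedule, the @ReportIndex and @ReportTable parameters let each run persist its output for later review. A sketch, assuming the report tables from the commented-out example above were created by the installer:

	--A sketch of a scheduled run that stores its results instead of just returning them
	EXECUTE [dbo].[QueryVariation]
		@DatabaseName	= 'WideWorldImporters',
		@ReportIndex	= '[dbo].[QueryVariationIndex]',
		@ReportTable	= '[dbo].[QueryVariationReport]',
		@Measurement	= 'Duration',
		@Metric			= 'Avg';

Run from a SQL Agent job, that gives you a history of regression reports you can query later.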

Speaking at SQL Friday this week

If you are not aware of SQL Friday, there's a live online talk every week at noon CET (GMT+1). You can also view previous sessions through links here.

I’ll be speaking this week with a case study on optimizing a very frequently run procedure. It is the same subject I presented at Virtual PASS 2021.

I have a few more blogs coming on the QDS Dashboard, and a few other topics I’m likely to write on soon. If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.

In contrast to the previous QDS Toolbox components, this one is relatively simple. Instead of a tool that’s ready to be scheduled to give you a report, this just adds the dbo.query_store_wait_stats_pivoted view.

So, what’s the value here?

The typical case

The way I use Query Store most often is to investigate the performance of a specific procedure or statement. I tend to do this when I know there is an issue, or when I want to evaluate the improvement from a new change.

Here’s a query I’ll often run (or some variation thereof):

--Statement numbers for the last two hours, no aggregation
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	qsq.query_hash
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(hour, -2, GETUTCDATE())
	--AND qt.query_sql_text LIKE '%something%'
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time

Now, Query Store’s performance statistics are separated into intervals. We can set the size for an interval when we turn on Query Store (60 minutes by default, I have it at 15 on my test environment). So, this query will show me the performance of each statement for each interval over the last two hours.

Sometimes I’ll aggregate the data so I can look at the averages over several days, but sometimes I like seeing each interval to see if there is a lot of variation.
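For reference, the aggregated version looks something like this; a sketch that weights each interval's average by its execution count, so the overall average isn't skewed by intervals with few executions (the five-day window here is just an example):

--Statement totals over the last several days, aggregated across intervals
SELECT 
	agg.query_id,
	qt.query_sql_text,
	agg.total_executions,
	agg.total_duration,
	agg.total_duration / agg.total_executions AS weighted_avg_duration
FROM 
(
	SELECT 
		qsq.query_id,
		qsq.query_text_id,
		SUM(qrs.count_executions) AS total_executions,
		SUM(qrs.avg_duration * qrs.count_executions) AS total_duration
	FROM sys.query_store_query qsq 
	JOIN sys.query_store_plan qp 
		ON qp.query_id = qsq.query_id
	JOIN sys.query_store_runtime_stats qrs 
		ON qp.plan_id = qrs.plan_id
	JOIN sys.query_store_runtime_stats_interval qsi
		ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
	WHERE 
		qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
		AND qsi.end_time > DATEADD(day, -5, GETUTCDATE())
	GROUP BY 
		qsq.query_id,
		qsq.query_text_id
) agg
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = agg.query_text_id
ORDER BY 
	agg.query_id;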

So, here's the result set from the per-interval query:

Runtime Stats from Query Store

sys.query_store_wait_stats

But let’s say we look at this data and are concerned one of these statements is taking longer than expected, and we want to see if it is waiting on anything. So, we take that query and add a join to sys.query_store_wait_stats. Seems simple enough.

--Statement numbers for the last two days, no aggregation, with wait stats for one statement
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	ws.wait_category_desc,
	ws.execution_type_desc,
	ws.total_query_wait_time_ms,
	ws.avg_query_wait_time_ms
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
LEFT JOIN sys.query_store_wait_stats ws
	ON ws.plan_id = qrs.plan_id
	AND ws.runtime_stats_interval_id = qsi.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetStockItemUpdates')
	AND qsi.end_time > DATEADD(day, -2, GETUTCDATE())
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time,
	ws.wait_category_desc

And let’s see our results:

Runtime Stats with Wait Stats

Now we have 9 rows instead of 5, and the numbers under avg_duration look redundant.

What's happened is that sys.query_store_wait_stats has multiplied our result set. Since it stores a record for each wait type observed in each interval, we have multiple rows for several of these intervals. And one of our rows would have been removed from the set if we hadn't used a LEFT JOIN to sys.query_store_wait_stats, since it had no waits.

This is probably not how you would imagine it working at first glance. You might want to pivot those results instead, and the Pivoted Wait Stats view has done that for you.
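To give a sense of what the view is doing for you, a hand-rolled pivot of a few categories would look something like this sketch (the real view covers every wait category; the category names here are taken from the sys.query_store_wait_stats documentation):

--A sketch: pivoting a few wait categories by hand
SELECT 
	ws.plan_id,
	ws.runtime_stats_interval_id,
	SUM(CASE WHEN ws.wait_category_desc = 'CPU' 
		THEN ws.avg_query_wait_time_ms ELSE 0 END) AS Average_CPU,
	SUM(CASE WHEN ws.wait_category_desc = 'Buffer IO' 
		THEN ws.avg_query_wait_time_ms ELSE 0 END) AS Average_BufferIO,
	SUM(CASE WHEN ws.wait_category_desc = 'Network IO' 
		THEN ws.avg_query_wait_time_ms ELSE 0 END) AS Average_NetworkIO
FROM sys.query_store_wait_stats ws
GROUP BY 
	ws.plan_id,
	ws.runtime_stats_interval_id;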

Using Pivoted Wait Stats

If I take my original query and join to dbo.query_store_wait_stats_pivoted, I will get one row of output for each plan_id and runtime_stats_interval_id.

And the result set will be rather wide. There's a total and an average column for each wait type in sys.query_store_wait_stats, and you may not want to include all of them in your query.

So, here’s an example with the average for several interesting wait types:

--Statement numbers for the last five days, no aggregation, with pivoted wait stats
SELECT 
	qrs.avg_duration * qrs.count_executions AS total_duration,
	qrs.count_executions, 
	qrs.avg_duration, 
	qsq.query_id,
	qsi.end_time,
	qt.query_sql_text, 
	wsp.Average_BufferIO,
	wsp.Average_BufferLatch,
	wsp.Average_CPU,
	wsp.Average_Lock,
	wsp.Average_Memory,
	wsp.Average_NetworkIO,
	wsp.Average_Parallelism,
	wsp.Average_Tracing
FROM sys.query_store_query qsq 
JOIN sys.query_store_query_text qt 
	ON qt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qp 
	ON qp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats qrs 
	ON qp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsi
	ON qsi.runtime_stats_interval_id = qrs.runtime_stats_interval_id
LEFT JOIN dbo.query_store_wait_stats_pivoted wsp 
	ON wsp.plan_id = qp.plan_id
	AND wsp.runtime_stats_interval_id = qrs.runtime_stats_interval_id
WHERE 
	qsq.object_id = OBJECT_ID('Integration.GetSaleUpdates')
	AND qsi.end_time > DATEADD(day, -5, GETUTCDATE())
ORDER BY 
	qt.query_sql_text, 
	qsq.query_id,
	qsi.end_time

Runtime Stats with Pivoted Wait Stats

We have one record for this query for each of three intervals. In each row, I can see the average wait time for CPU, disk (BufferIO), buffer latch, and other waits. In this case, the largest waits were with NetworkIO. Likely, SSMS was having trouble keeping up with displaying multiple result sets, because I ran this batch 50 times in a row.

One last thing: keep in mind that the times in the main Query Store tables, like sys.query_store_runtime_stats, are reported in microseconds, while the wait tables are in milliseconds, whether you use sys.query_store_wait_stats or dbo.query_store_wait_stats_pivoted. A query that takes 7,000,000 microseconds to run but waited on CPU for 6,900 milliseconds spent nearly its entire duration waiting.
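If you want that comparison on one line, you can convert the runtime stats to milliseconds as you select them. A small sketch, joining the pivoted view straight to the runtime stats:

--A sketch: durations converted from microseconds to milliseconds to match the wait columns
SELECT 
	qrs.plan_id,
	qrs.runtime_stats_interval_id,
	qrs.avg_duration / 1000.0 AS avg_duration_ms,
	(qrs.avg_duration * qrs.count_executions) / 1000.0 AS total_duration_ms,
	wsp.Average_CPU AS avg_cpu_wait_ms,
	wsp.Average_NetworkIO AS avg_networkio_wait_ms
FROM sys.query_store_runtime_stats qrs
LEFT JOIN dbo.query_store_wait_stats_pivoted wsp 
	ON wsp.plan_id = qrs.plan_id
	AND wsp.runtime_stats_interval_id = qrs.runtime_stats_interval_id;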

Don’t forget about PASS Data Community Summit 2021!

PASS Data Community Summit 2021 is next week, and the event is virtual and free, so please go and register if you haven't already. I'm happy to see PASS and the Summit are in Red Gate's hands, and I expect it to be a great event.

Many of the sessions, including my session, are prerecorded and can be watched at any time. This should make it easier to get around conflicts between sessions you want to hear. There are also Q&A sessions for the prerecorded sessions, and those are scheduled and live. So get registered and make sure you don’t miss out.

If you have any topics related to performance in SQL Server you would like to hear more about, please feel free to make a suggestion. You can follow me on twitter (@sqljared) and contact me if you have questions.