Halloween Problem Examples for Updates

Posted on 02/20/202303/12/2023 by sqljared

Update and Correction:

This blog was originally posted on February 20. Since then I read other articles that suggested different behavior with the Halloween Problem. I contacted Paul White, who informed me that the WideWorldImporters database uses compatibility level 130 (SQL Server 2016) by default. So, I tested on a SQL Server 2019 instance but was probably seeing an issue addressed in later updates.

I tested again at compatibility level 150 and saw a different execution plan which led to different conclusions.

I’ve left the majority of the post unchanged, but I’m adding an addendum section, and updating the summary and its conclusions. So, make sure you read those sections for the corrections.

Original Post:

I find myself talking about the Halloween Problem a lot and wanted to fill in some more details on the subject. In short, the Halloween Problem is a case where an INSERT\UPDATE\DELETE\MERGE operates on a row more than once, or tries to and fails. In the first recorded case, an UPDATE changed multiple rows in the table more than once.

So let’s take a look at an example using a publicly available database, WideWorldImporters.

A Halloween Problem example

Here’s a simple update procedure. We’re going to update the quantity for an item in the Sales.OrderLines table:

CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateQuantity
	@OrderID INT,
	@StockItemID INT,
	@Quantity INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET
		sol.Quantity = @Quantity,
		sol.PickedQuantity = @Quantity
	FROM Sales.OrderLines sol
	WHERE
		sol.OrderID = @OrderID
		AND sol.StockItemID = @StockItemID;
		-- AND sol.Quantity <> @Quantity;
END;
GO

You may notice the commented line. In one description of the Halloween Problem I heard\read, it was suggested that if we try to SET something that is in our WHERE clause the problem is likely to occur. Or rather, SQL Server will see the possibility of the problem and add protections to our execution plan to prevent it.

First, let’s test without that line, and see what our execution plan tells us.

EXEC Sales.OrderLines_UpdateQuantity
	@OrderID = 5,
	@StockItemID = 155,
	@Quantity = 21;
GO

The eager spool between our index reads and clustered index update shows that SQL Server added Halloween protections to prevent the problem. The problem is prevented by separating the read phase of the query from the write phase.

This usually involves a blocking operator. Most often this is an eager spool, but if there is another blocking operator in the plan like a sort or hash match, that blocking operator may remove the need for a separate spool.

The Halloween Problem would occur if a query is running in row mode and as rows are still being read, rows are being updated and moved in an index. This allows the read operation to potentially read the updated row again and operate on it again. The index movement is key in this scenario.

But with a blocking operator between the read operation and the write, we force all the reads to complete first. This gives us a complete, distinct list of rows to be updated (in this example) before we get to the clustered index update, so it isn’t possible to update the same row twice.

So, how does index movement come into play here? We are updating the Quantity and PickedQuantity columns in our UPDATE statement. Both fields are key columns in the only columnstore index on the table, NCCX_Sales_OrderLines.

CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCX_Sales_OrderLines] ON [Sales].[OrderLines]
(
[OrderID],
[StockItemID],
[Description],
[Quantity],
[UnitPrice],
[PickedQuantity]
)WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [USERDATA];
GO

So when we update these columns, the affected rows will move in that index. If the row moves, that means a read operation could continue reading and find the same row again, returning it as a part of its result set a second time.

Interestingly, we aren’t reading from the columnstore index in the plan provided. Since that’s the only index containing these columns as key values, it’s the only index where the rows should move. In this case, our read operators shouldn’t encounter updated rows a second time, since they use the FK_Sales_OrderLines_OrderID (with a key lookup against PK_Sales_OrderLines).

I wonder if SQL Server decided the Halloween protections were needed before it decided which index it would use for the read.

Removing the index

Either way, if we dropped the NCCX_Sales_OrderLines index, we should see a plan without an eager spool between the read operators and the update operator.

IF EXISTS(
	SELECT 1
	FROM sys.indexes si
	WHERE
		si.name = 'NCCX_Sales_OrderLines'
)
BEGIN
	DROP INDEX [NCCX_Sales_OrderLines]
		ON Sales.OrderLines;
END;
GO

With the index removed, let’s look at the new plan.

We’ve lost the extra steps to the left of the clustered index update operator to update the columnstore index, and we have also lost the eager spool between the read operators and the update operator. This shows without the index movement, Halloween protections are no longer needed.

Performance impact of protections

Let’s look at the data from Query Store to see how big the difference is between the two execution plans.

I ran a simple query against the same OrderID in Sales.OrderLines before running the procedure before and after the index change to get the data into the cache (because cold cache issues were making a large difference). I also ran the procedure 10 times to try to average out our results in case any odd wait types were seen.

80 microseconds versus 46 microseconds. Blazing fast in both cases with the data already cached, but the plan with Halloween protections took 74% longer. Unsure if the update to a columnstore index is significantly more expensive than that of a rowstore index. Perhaps we should test this again without columnstore complicating the issue.

Speaking in general, I would expect a bigger difference in a query affecting more rows. For a query that only returns 3 rows from the first index seek, the delay caused by the spool would be very small. But imagine if we have a query that reads tens or hundreds of thousands of rows before performing its write operation.

Normally such a query would be passing rows it has read up to the join and update operators while it is continuing to read. Those operations would be happening on different threads in parallel.¹

If we are being protected from the Halloween Problem, the eager spool will not return any rows to the operations above it (like the clustered index update) until all rows have been read. So the writes cannot start until much later, and the more rows being read the more considerable the delay.

Nonclustered indexes?

If you noticed the “+3 non-clustered indexes” banner in one of the plans above, that’s indicating the nonclustered indexes updated when we updated the clustered index. This is more obvious in Plan Explorer than in the plans as shown in SQL Server Management Studio. So, I wanted to point that out in case the visual was confusing to anyone.

But this raises another question. If we are updating those indexes, why don’t they cause the Halloween protections to be used?

That is because the quantity columns are present in those indexes only as included columns. Changes to those columns won’t affect where the row sorts, but the values still need to be updated.

Rowstore testing

So, let’s see how this looks with a rowstore index. Here’s a second procedure, similar to the first but also updating PickingCompletedWhen.

CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateQuantityWhen
	@OrderID INT,
	@StockItemID INT,
	@Quantity INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET
		sol.Quantity = @Quantity,
		sol.PickedQuantity = @Quantity,
		sol.PickingCompletedWhen = GETUTCDATE()
	FROM Sales.OrderLines sol
	WHERE
		sol.OrderID = @OrderID
		AND sol.StockItemID = @StockItemID
		AND sol.PickingCompletedWhen < GETUTCDATE();
END;
GO

Initially, no index uses PickingCompletedWhen. So if we execute the procedure as is, we shouldn’t see the tell-tale eager spool.

EXEC Sales.OrderLines_UpdateQuantityWhen
	@OrderID = 5,
	@StockItemID = 155,
	@Quantity = 21;
GO

This plans is what we’d expect. If we add an index, how does this change the plan and how does this change the performance?

IF NOT EXISTS(
	SELECT 1
	FROM sys.indexes si
	WHERE
		si.name = 'IX_OrderLines_OrderID_StockItemID_PickingCompletedWhen'
)
BEGIN
	CREATE INDEX IX_OrderLines_OrderID_StockItemID_PickingCompletedWhen
		ON Sales.OrderLines (OrderID, StockItemID, PickingCompletedWhen);
END;
GO

Here, we see the eager spool implementing the Halloween protections again, but between the index seek and the key lookup. Note that the new index is the one we are using for the index seek. The clustered index update now indicates it is updating 4 nonclustered indexes, including the new index.

So, is the performance difference as stark as it was with the columnstore index?

So, 52 µs vs 118 µs. The query took about ~126% longer when the Halloween protections were present. More than we saw with the columnstore index, which is surprising. Perhaps it is relevant that we are updating a third field. It almost feels like the observer effect at this scale.

Addendum

So, to correct things here, let’s go back to the first procedure.

CREATE OR ALTER PROCEDURE Sales.OrderLines_UpdateQuantity
	@OrderID INT,
	@StockItemID INT,
	@Quantity INT
	WITH EXECUTE AS OWNER
AS
BEGIN
	SET NOCOUNT ON;
	SET XACT_ABORT ON;

	UPDATE sol
	SET
		sol.Quantity = @Quantity,
		sol.PickedQuantity = @Quantity
	FROM Sales.OrderLines sol
	WHERE
		sol.OrderID = @OrderID
		AND sol.StockItemID = @StockItemID;
		-- AND sol.Quantity <> @Quantity;
END;
GO

If I run this procedure again with the restored database and no other changes besides updating the compatibility level to 150, I see the following execution plan:

So, we have no eager spool, which means the Halloween Problem isn’t a problem now.

Previously, there was a spool between the index seek and lookup and the clustered index update. The only index using any of the updated fields as a key value was the columnstore index. This suggested that the optimizer will use Halloween protections if any index uses the updated fields as a key value because the rows would be moved in that index.

This new plan disproves that because the optimizer no longer uses the protections with the later compatibility level. And the columnstore index (NCCX_Sales_OrderLines) is still present (as you can see if you hover over the clustered index update operator).

As for the second procedure, I see the Halloween protections even without the index I added in my example. Without that index, the query originally used the FK_Sales_OrderLines_OrderID index to seek the rows in question. At the higher compatibility level, the IX_Sales_OrderLines_Perf_20160301_02 index is used, which is keyed on (StockItemID, PickingCompletedWhen).

So, the Halloween protections are used because we read from an index keyed on one of the updated fields, and rows being updated will potentially move in that index.

We’ve seen the Halloween protections when using nonclustered indexes so far, but what if we are using the clustered index for the read?

I wrote a quick procedure to change the OrderLineID, which is the only column in the clustered primary key for this table. And this matches expectations; we see the eager spool between the clustered index seek and the update operator.

Summary

Hopefully, the addendum corrects the matter while keeping things clear. I’m updating one of the bullet points below, as well.

It seems there are only two criteria for the protections against the Halloween Problem to be used for an UPDATE query:

The object being updated must also be in the query.
~~A column being updated must be a key column in at least one index on the table.~~ One of the updated columns must be a key value in the index used for the read portion of the query, so that the rows may move in that index.

For other statements, the setup is more complex. I find the UPDATE statement is the most straightforward example of the Halloween Problem. But you can see the protections in place if you query from a table as part of an INSERT or DELETE (or MERGE) where you change that same table.

And if we see Halloween protections in the plan for a query, we could change the offending index or the query to change the behavior.

Or we could use the manual Halloween technique, which I will discuss next time.

Thanks again to Paul White for pointing out the compatibility level; I doubt that would ever have occurred to me.

Please contact me if you have any questions or comments. I’ve updated my social media links above to include Counter.Social and Mastodon. We’ll see if there is more #sqlfamily activity on those platforms going forward.

Footnotes

1: Not the type of parallelism we typically think of with SQL Server. Parallelism is typically when a given operation, like an index scan, is expected to process many rows, and SQL Server dedicates multiple threads to that operator or group of operators. In this case, I say parallel because different operators (the index seek, nested loops join, and clustered index update) are all processing rows at the same time, one row at a time.

Tempdb Contention in 2023

Posted on 01/25/202301/21/2024 by sqljared

Tempdb contention has long been an issue in SQL Server, and there are many blogs on the issue already. But I wanted to add one more mainly to highlight the improvements in recent versions of SQL Server

Tempdb contention is most often discussed in as relating to the creation of temp tables (and other objects) in tempdb. If you are experiencing this you will see PAGELATCH_EX or PAGELATCH_SH waits, frequently with wait resources like 2:1:1 or 2:1:3. This indicates contention in database 2 (tempdb), page 1 (the first data file in tempdb), and one of the PFS, GAM, or SGAM pages (which are pages 1, 2, and 3 respectively). Tempdb files of sufficient size will have additional PFS, GAM, and SGAM pages at higher page numbers, but 1 and 3 are the pages most often referenced.

Temp tables aren’t the only objects being created in tempdb. Table variables are as well unless they are memory-optimized. There are also worktables for sorts, spools, and cursors. Hash operations can spill to disk and are written into tempdb. Row versions are written into tempdb for things like read committed snapshot isolation, and triggers make use of row versioning as well. For more details, check out this excellent post by David Pless.

Before recent releases, there were three main suggestions for reducing tempdb contention.

Trace flags (1118 and 1117)
More tempdb files
Create fewer objects in tempdb

Honestly, I don’t think the third was even included in a lot of the blogs on the subject, and it is very important. Many of the actions that use tempdb can’t be avoided, but I tend to use memory-optimized table variables instead of temp tables the vast majority of the time.

In one case a few years ago, I replaced the memory-optimized table variables in one very frequently executed stored procedure with temp tables to see if using temp tables would result in better execution plans. This procedure was executed about 300 million times per day across several SQL Server instances using similar databases, and the procedure used 4 temp tables. The plans didn’t matter; creating 1.2 billion more temp tables per day added far too much tempdb contention.

But the main point of this post is to help everyone catch up on the topic, and see how more recent versions of SQL Server improve on this issue.

Improvements in SQL Server 2016

SQL Server 2016 introduced several improvements that help reduce tempdb contention.

The most obvious is that setup will create multiple files by default, one per logical server up to eight. That bakes in one of the main recommendations for reducing tempdb contention, so it’s a welcome improvement.

There are also behavior changes that include the behavior of trace flags 1117 and 1118. All tempdb data files grow at the same time and by the same amount by default, which removes the need for trace flag 1117. And all tempdb allocations use uniform extents instead of mixed extents, removing the need for trace flag 1118.

So, that’s another recommendation for reducing tempdb contention already in place.

Several other changes also improve caching (reducing page latch and metadata contention), reduce the logging for tempdb operation, and reduce the usage of update locks.
For the full list, check here.

Improvements in SQL Server 2019

The big change here is the introduction of memory-optimized tempdb metadata. The documentation here says that this change (which is not enabled by default, you will need to run an ALTER SERVER CONFIGURATION statement and restart) “effectively removes” the bottleneck from tempdb metadata contention.

However, this post by Marisa Mathews indicates the memory-optimized tempdb metadata improvement in SQL Server 2019 removed most contention in PFS pages caused by concurrent updates. This is done by allowing the updates to occur with a shared latch (see the “Concurrent PFS updates” entry here).

Tempdb contention seen in sp_WhoIsActive output

One thing I would point out is that the metadata is being optimized here; the temp tables you create are not memory-optimized and will still be written to the storage under tempdb as usual.

Improvements in SQL Server 2022

The post above also indicates that SQL Server 2022 reduces contention in the GAM and SGAM pages by allowing these pages to be updated with a shared lock rather than an update log.

The issue with the PFS, GAM, and SGAM pages has always been the need for an exclusive latch on those pages when an allocation takes place. If 20 threads are trying to create a temp table, 19 of them get to wait. The suggestion to add more data files to tempdb was a way to get around access to these pages being serialized; adding more files gives you more of these pages to spread the allocation operations across.

In Summary

The gist is that tempdb contention has been nearly eliminated in SQL Server 2022. There are still several other actions that use tempdb, and you may see contention if have a niche workload or use a lot of worktables.

Hopefully, this post will help you decide if it’s time for an upgrade. If you have been seeing tempdb contention on these common pages, the latest release should be a major improvement.

Feel free to contact me with any questions or let me know of any suggestions you may have for a post.

Introversion and PASS Summit

Posted on 01/01/202301/06/2023 by sqljared

This will be an unusual post for me. I normally just write technical posts about performance issues I’ve been working on. But I did go to PASS Summit this year, for only the second time in person, and the social aspect was a big part of why I went.

I’m writing this largely because of a blog post by Paul Randal. I replied in the comments but wanted to relate more of my own experience with introversion and PASS Summit.

So, networking…

I am an introvert, and I’ve never felt networking was my strong suit. Long conversations with people I don’t know aren’t something I crave, but networking can drastically increase our options career-wise. That’s how I got my current job.

Several years back I was working on a contract that had just been extended. The project I was working on had been completed, and I wasn’t sure I wanted to remain with that company long-term. I had been to meetings with the local user group over the years and spoken a few times, but hadn’t been to a meeting in months. I decided to check back in and be sociable.

So, I went to the next meeting and during the intro, the leader, Kevin Feasel, said something interesting. His company was looking for a DBE, and to speak with him after if anyone was interested. That conversation led to me applying, and one interview later I had an offer.

It’s honestly ridiculous looking back on it. It feels like I put in a minimal amount of effort but got a great opportunity with a very unique company. This was about a year after leaving Microsoft CSS (customer support), which allowed me to see a huge variety of SQL Server issues. ChannelAdvisor has a very busy OLTP environment and had great database engineers already working for it. Working on the performance issues here has been enlightening.

So, when I submitted a talk for PASS Summit this year, I had to choose whether to apply for an in-person or online session. Since I’ve been hiding out from the coronavirus like many people for the last two years, I decided to try for an in-person session at the summit and spend the extra effort to do some networking while I was there.

At the summit

I’ve been to the PASS Summit once before in Seattle. There are a ton of great sessions, and I usually want to be in three different rooms for each session. I went to sessions all day, walked around Seattle a bit each evening, and crashed. It’s surprisingly exhausting to learn all day.

During this trip, I wanted to focus more on networking. I still went to a lot of sessions, but there were some I skipped to have conversations in the exhibitor lounge and elsewhere in the convention center. And if I was in a session that wasn’t what I was looking for, I was more likely to leave and just look for a chance to talk.

I ended up talking with about a dozen well-known people in the community. Some I had already interacted with; others not at all.

And it wasn’t comfortable or easy most of the time. As I said earlier, I’m an introvert. I had to make myself engage with people I had little or no contact with before. Several times I walked past a conversation before I made myself turn around and get involved.

I was mainly interested in talking with engineers doing consulting work about the work they do. There are similarities to the work I did in Microsoft CSS previously, so I’ve been interested in that part of the business.

I had some contact with a few speakers through work. I had replied to a few on a topic or two on Twitter. A few I had spoken with because I presented for a user group they were part of. I emailed Paul Randal several times through the mentorship he offered a year or two ago. Some like Brent Ozar I had never had any contact with, but he was more than willing to talk. I also had a few good conversations with consultants in the sponsor’s lounge.

What surprised me was hearing a lot (or all?) of the presenters I talked with mention that they were introverts during their talks. It seems unbelievable for someone presenting to a ballroom or keynote with seeming ease.

My perspective

I referenced Paul Randal’s post because part of the feedback he received was from people who found the experience alienating. Some didn’t feel the community was very welcoming or felt like they were on the outside looking in or saw it as being very cliquish.

I wanted to post not to negate any of that; our feelings are always valid. But that doesn’t mean others are trying to be exclusionary.

And I did feel like I was on the outside looking in. And it was difficult\stressful to start up a conversation with people or join in one that was already going on. I’ve spoken at PASS Summit or Virtual PASS three times, but I’m not nearly as well known as many of the other presenters. So I certainly don’t feel I’m part of any clique.

But it did strike me that so many of the presenters had mentioned that they were introverts. So if people see the more well-known speakers gravitate toward people they already know, that’s what I’d expect an introvert to do.

I certainly did the same when I was feeling a bit overwhelmed and saw a friendly face.

In all of the conversations I joined in with or started, I didn’t have anyone turn me away. Once I engaged, I found everyone to be willing to answer questions, happy to give advice and giving of their time.

So as much as anything, I’d like to encourage people to be bold and engage. It may not be easy to start these contacts, but the results have been well worth the discomfort.

Stepping stones

One thing that did make this easier was something I didn’t intend. Over the last few years, I’ve presented at PASS Summit, several SQL Saturday events, and several user groups as well. So I’ve had more interactions with people in our community over the last few years as a lead-up to the summit this year.

I didn’t present at those user groups and SQL Saturday events to make networking easier, but it does help when I can say to someone, “Hi, I presented online for your user group a few months ago, and I wanted to ask you a question.”

This is not to say that you have to present to be part of the community, but any way you can engage with the community gives you more connections and more opportunities going forward.

And I should specifically thank Tracy Boggiano because many of those speaking opportunities came from her pinging me when she heard of a user group that needs a speaker. So, thank you, Tracy!

About the community…

Before and after the summit, I heard a lot of people in the community express a desire to leave Twitter. And I get why, but it’s been the main place where #sqlfamily posted. It isn’t clear how much of the community is looking to move, but I thought I’d mention the options I’ve seen people refer to:

CounterSocial: I saw people start talking about this before the summit. I haven’t seen a ton of activity here yet, but there are several names I recognized that have joined. Perhaps the chatter will pick up after the holidays, so try looking for #sqlfamily there.
Mastodon: I heard more about Mastodon at and after the summit. This works a bit differently, not being entirely centralized. People have to set up a server for it, and it seems communities have to find the right server (don’t ask me for the details). But you can search for #sqlfamily there, or go to the server that Daniel Hutmacher set up at https://dataplatform.social/home. Not a lot of activity there yet, but maybe this is where things will reform.
Linkedin: This wouldn’t have occurred to me, but Brent Ozar pointed to LinkedIn (and TikTok and Instagram) when he said he would stop tweeting. There’s some activity here, and I’ve linked my new blog posts here in the past.

So, if you are looking for people to follow or new events or groups to be part of here, there are three good places to start.

My LinkedIn should already be at the top of the page, but I have accounts on CounterSocial (https://counter.social/@sqljared) and Mastodon (https://dataplatform.social/@sqljared), and I’ll link those at the top of the page shortly.

In Summary

I hope I’ve encouraged you to engage with the community in the new year, and I hope you will find it rewarding. I’m going to do the same myself and see if there are some more user groups that need a remote present.

Happy New Year!

Update and Correction:

Original Post:

A Halloween Problem example

Removing the index

Performance impact of protections

Nonclustered indexes?

Rowstore testing

Addendum

Summary

Footnotes

Improvements in SQL Server 2016

Improvements in SQL Server 2019

Improvements in SQL Server 2022

In Summary

Other links

So, networking…

At the summit

My perspective

Stepping stones

About the community…

In Summary