
Saturday, February 27, 2010

Are you a SQL Server Guru?

At the moment I have a contract as a SQL Server consultant for a company that develops software and – guess what – uses SQL Server 2008 as its DBMS. About a month ago the head of development asked me to conduct one part of an interview with an applicant, in order to find out his real SQL Server skills. When looking at the application I was truly amazed, because the candidate had awarded himself the title of SQL Server Guru. I know some people who are experts regarding SQL Server, and even I know somewhat more than just the fundamentals, but I seriously think there's only one SQL Server Guru alive :). (You know who I'm talking about without following the link, don't you?) I was very excited about how the interview would go. As it turned out, it went rather surprisingly, and that's how I got the idea for this blog post.

But before I start recounting parts of the interview, I'd like to provide some background information about the company's business model, just to give you a little more context. The company not only builds software but delivers complete systems to smaller stores, consisting of hardware and software, where the hardware usually encompasses one server and some client computers that are primarily used as checkout counters. Of course the sales price has to be as low as possible, so there's a very limited budget for building the system. Due to this limit, the server hardware is also quite modest, and SQL Server operates on the edge every so often, making performance and optimization a big issue. Actually, I spend the majority of my time on performance optimization.

Ok, here are some excerpts from the interview. I've included my thoughts and annotations (embedded in parentheses).

The interview

Q: This is the first time I have the opportunity to interview a SQL Server Guru. So, what kind of guru are you, actually? Is it more development or administration? Are you familiar with MSFT's BI stack?

A: I'm very good at TSQL and optimization, and I also know Reporting Services.

Q: Very good, you know something about optimization! We really need reinforcement in that area. So, what’s your general approach to optimization? Do you follow a method or framework here?

A: In most cases I optimize TSQL.
(I loved to hear that. Really, I did. Tuning the database design, the TSQL, and the application's requests is the most promising approach. Most of the people I've asked this question so far instantly started telling me something about simply adding more indexes.)

Q: Very good! I really like that. How do you usually optimize TSQL? Is there a general approach you follow?

A: I use query hints.
(Ugh! How surprising. Query hints as a general optimization approach! I had to find out more.)

Q: Query hints? Why is that?

A: Because the optimizer is very stupid and generates wrong plans most of the time.
(Hmm. About 20 years or so of development effort to make the optimizer perfect, all wasted. Does he really believe this?)

Q: That's amazing, I didn't know that! Can you provide an example, please?

A: Yes, for instance, the optimizer very often decides to use merge joins, and merge joins are always slow – in particular, slower than nested loop joins. I have to force the optimizer to use nested loop joins instead.
(If merge joins are always slow and outperformed by nested loop joins, why do we have them at all? Has he never thought about that?)

Q: So, that’s your approach to optimization? Query hints?

A: Yes, whenever I write a query, I develop the execution plan in my head, because I can do better than the optimizer. And then I provide as many query hints as required to force the optimizer in the direction of using this plan, my plan.
(I confess I was tempted to ask for the name of the company where he had implemented this, since I had a strong feeling that company would need some support. But I was fully booked at the moment, so I resisted. I realized this discussion was leading nowhere and decided it was time for a change of topic.)
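(For reference, this is what forcing a join strategy looks like in T-SQL. A minimal sketch against hypothetical orders and customers tables – and, to be clear, exactly the kind of hint that should be a rare exception, not a general approach:)

-- Hypothetical example: forcing a nested loop join via hints.
select o.order_id, c.customer_name
  from orders as o
  inner loop join customers as c   -- join hint: force a nested loop join
    on c.customer_id = o.customer_id
option (force order)               -- query hint: also fix the written join order
go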

Q: Ok, thank you. So what did the guru do last week?

A: I used clustered indexes because this minimized locks.
(Hmm. Another interesting statement. Let's see if he can tell me more about this.)

Q: Clustered index? What is that? Can you explain it, please?

A: When a query uses a clustered index, the query is very fast.

Q: So, a clustered index is kind of magic? How come a query is always fast when a clustered index is used? Can you tell me some more about clustered indexes, please? How are they constructed, or structured?

A: A clustered index is ordered.
(Wow! The guru enlightened me with that perception!)

Q: So does this mean a non-clustered index is non-ordered?

A: No no, a non-clustered index is also ordered.

Q: Ok, but what’s the difference then between a clustered index and a non-clustered index? What exactly is a clustered index?
(I was about to lose my patience at this point!)

A: A clustered index minimizes blocking.

Q: Really? How is this possible? How is a clustered index structured? Can you explain, please?

A: You know, I use clustered indexes every day. I’m so deeply involved with clustered indexes that I can’t explain it anymore.
(That was the moment when I decided to send him back to where he came from. But unfortunately everybody else, including my fellow interviewers, was sitting in a strategic planning meeting, so I was the only one available and had to continue. I started getting angry, since he was doing nothing more than wasting my time.)

Q: Can you explain how a non-clustered index seek on a table works when a clustered index also exists?

I asked this question just to tease him a little. Of course he couldn't explain it. How could he, when he doesn't even know what a clustered index is? (For the record: on a table with a clustered index, a non-clustered index uses the clustering key as its row locator, so a seek first finds the matching keys in the non-clustered index and then looks up any remaining columns in the clustered index.)

After a while our chief architect appeared, took over, and rescued me. Another 30 minutes later the candidate was sent home.

Ok, so what is this blog post all about?

About one year ago, Brent Ozar wrote a very informative and entertaining series of blog posts containing some interview tips, which you may find here. I'd add another simple tip to his advice: be honest. Only include skills in your CV that you actually possess. Know what you know but, more importantly, also know what you don't know! While tuning your SQL Server is a good thing to do, tuning your CV too much is not! Rest assured, someone will find out, and you'll be pegged as dishonest. You'll be out of the game immediately when this happens.

But how come people consider themselves SQL Server experts or even gurus? I think that's because MSFT did a great job of making SQL Server an easy-to-use product. After you've managed to install your server and client tools, you can start using the relational engine instantly. It's very easy to start at zero and improve your skills to, let's say, 30% in a short period. So, should we blame MSFT for making SQL Server that simple to take the first steps with? Of course not. But it's simply reality that many people using SQL Server don't know much more than some fundamentals. That's not a bad thing at all, because SQL Server will do the job even if it's not operated at its optimal configuration and non-ideal queries are fired at it. To be honest, I personally like this, since it gives me the opportunity to be called to the rescue when things go wrong.

As with many things, the term expert is relative. You can be that 30% expert in a particular environment where you know only something about SQL Server but everybody else knows almost nothing, making you the one-eyed man among the blind.

I remember a presentation Kalen Delaney gave in my hometown on the occasion of the SQL Server 2005 launch, where she very frankly confessed that she has to face the fact that her SQL Server knowledge will decrease with every new release, simply because the product's complexity is ever increasing and her knowledge will not be able to keep pace with it in all fields. Therefore, she decided to focus on a few areas and to be an expert in those.

You may compare this to the disappearance of the Middle Ages' universal scholars. Universal SQL Server experts are likewise rare, or even non-existent. I'm pretty sure even the guru himself would be rather conservative about calling himself a guru.

What do you think?

Sunday, February 21, 2010

When an Index Seek actually is a Scan

I came across this issue recently when we discovered some very resource- and time-consuming seeks in some of our recently rolled out user databases. The users were querying articles from an article table by name.

Here’s the simplified table that I’m talking about:

use tempdb
go

if (object_id('article', 'U') is not null)
  drop table article
go

create table article
 (
  article_id int identity(1,1) primary key not null  -- the primary key creates the clustered index
 ,catalog_name nvarchar(80) not null
 ,filler nchar(250) not null default '#'
 )
go

Let's pretend there are 400,000 distinct articles (that's approximately the real number we have). The following script will insert 400,000 rows:

-- Insert 400000 rows.
-- We simulate random article names.
insert article(catalog_name)
  select top(400000) newid()
   from sys.trace_event_bindings as b1
       ,sys.trace_event_bindings as b2
go

Now finally, we do some re-indexing just to make sure nothing like fragmented indexes or stale statistics will hamper our experiment’s results:

-- Rebuild/create index will also generate
-- brand new statistics with full scan
alter index all on article rebuild
go

create nonclustered index ix_covering
 on article(catalog_name) include (filler)
go

I've created a covering index that includes all of the table's columns. That's usually not a good idea, but we can afford it here, since the table's data never gets modified by OLTP operations. Articles are rolled out from a master data source every two weeks, so we can have plenty of indexes simply for speeding up article searches. (We do bulk loads and index rebuilds.) Of course, you may object that I could have created the clustered index on the catalog_name column instead. Yes, that may be a good idea; I'll come back to this later.

Let's switch on the statistics io and statistics time counters to get an idea of how many reads and how much time each of the following queries needs:

set statistics io on
set statistics time on

Ok, everything is set now. Since we've just performed an index rebuild, we can also be confident that all of our data and index pages reside in the buffer pool (a sketch for verifying this follows below). We're ready to perform some searches and look at execution plans and the required number of reads.
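(If you want to verify the buffer pool assumption, here's a sketch using the sys.dm_os_buffer_descriptors DMV; joining through sys.allocation_units and sys.partitions is the usual pattern for mapping buffered pages back to a table:)

-- Count the article table's pages currently in the buffer pool (sketch).
select count(*) as cached_pages
  from sys.dm_os_buffer_descriptors as bd
  join sys.allocation_units as au
    on au.allocation_unit_id = bd.allocation_unit_id
  join sys.partitions as p
    on p.hobt_id = au.container_id   -- in-row and row-overflow data
 where bd.database_id = db_id()
   and p.object_id = object_id('article')

Here's the first query: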

select catalog_name, filler
  from article
 where catalog_name like 'abac%'

Nothing special here. The number of reads is 6 and, as expected, the covering index is utilized. This is also what the execution plan reveals:

[Execution plan: Index Seek on the covering index ix_covering]

Very good! That's the best plan we can get for this query. The execution time is hardly measurable; it's about 1 ms on my machine, and the CPU utilization doesn't even register. (It says 0 ms, so I suppose it's simply lost in the general noise floor.)

The operator information for the Index Seek reveals a subtle detail:

[Operator details for the Index Seek: the seek predicate is actually a range]

The seek is actually a range scan, so the Index Seek scans a range of rows from the index. That is, the seek starts at a particular minimum value and then scans all leaf-level rows in ordered sequence until the predicate's condition is no longer satisfied. To determine which rows can fulfill the predicate, a maximum value for the range scan is determined as well. The actual minimum and maximum values are derived from the literal provided to the LIKE operator, which is 'abac%' in our experiment. That's an important detail which we'll get back to a little later.
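(Conceptually, and ignoring collation subtleties, the range the optimizer derives is roughly equivalent to the following sketch; in the actual plan the computation shows up as the internal functions LikeRangeStart and LikeRangeEnd:)

-- Rough equivalent of the range scan performed for LIKE 'abac%' (sketch).
select catalog_name, filler
  from article
 where catalog_name >= N'abac'   -- range start
   and catalog_name <  N'abad'   -- range end: first value past the prefix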

Unfortunately, we discovered that users execute somewhat different requests. There was a clear pattern, and sadly most of the provided search strings started (and also ended) with wildcards ('%'). So, instead of looking for articles starting with 'abac', more often than not users are looking for articles containing 'abac'. That'd be a query like this one:

select catalog_name, filler
  from article
 where catalog_name like '%abac%'

Of course, that's a completely different query! We see a lot more reads (35,313), and the execution plan reveals a parallel index scan this time:

[Execution plan: parallel Index Scan on the covering index ix_covering]

The execution time is 1,122 ms and the CPU utilization (with two cores) is 1,997 ms: a big rise in both!

Apparently an index seek can't be performed here, because we're using a wildcard at the first position of the filter condition. Or can it?

Let’s create a stored procedure for the search:

if (object_id('search_articles', 'P') is not null)
  drop procedure search_articles
go

create procedure search_articles(@p1 nvarchar(80)) with recompile
as
  select catalog_name, filler
    from article
   where catalog_name like @p1
go

Because we expect to see varying search arguments, leading to very different row count estimates and execution plans, the procedure is created with the WITH RECOMPILE option, so an execution plan is never cached but is always freshly generated whenever the procedure gets called.
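(As an aside: you can achieve something similar at the statement level with OPTION (RECOMPILE), which recompiles just the statement for the actual parameter value on every call. A sketch of a hypothetical alternative version of our procedure:)

-- Hypothetical alternative: recompile only the search statement.
create procedure search_articles_v2(@p1 nvarchar(80))
as
  select catalog_name, filler
    from article
   where catalog_name like @p1
  option (recompile)   -- this statement's plan is rebuilt on every call
go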

Ok, let’s try the procedure. First we invoke a simple search, where we expect an index seek as the optimal operator:

exec search_articles 'abac%'

The execution plan looks pretty much the same as above, where the index seek on the existing covering index has been utilized:

[Execution plan: Constant Scan and Compute Scalar feeding an Index Seek on ix_covering]

We see 6 reads and again an execution time of only about 1 ms (plus another 1-2 ms for stored procedure compilation). The execution plan looks pretty much the same as before, with the exception of the now present Constant Scan and Compute Scalar operators. These two operators calculate the minimum and maximum values for the Index Seek (or the Index Range Scan, if you like) from the provided parameter.

Let’s invoke the procedure with a wildcard as the first character for the search argument:

exec search_articles '%abac%'

Surprisingly, the execution plan doesn't reveal an Index Scan this time. Although the query is the same as in our second example, where an Ad Hoc query with an identical search argument was executed, the procedure's plan sticks to an Index Seek:

[Execution plan: Index Seek on ix_covering, without parallelism]

Additionally, we don't see any parallel execution. That's probably because an Index Seek (or Index Range Scan) has to be performed sequentially, so parallel execution is simply not possible. The query took 31,120 reads, 1,810 ms of CPU time, and 1,841 ms in total. That's a lot of reads for only 128 rows in the result set. So, is there anything we can do to lower the number of reads or the execution time?

Maybe there is! Think about the reason for the huge number of reads: all leaf-level rows of the index have to be examined to determine whether a particular row matches the search condition or not. The navigation through our covering index therefore looks like this:

[Diagram: the range scan traverses the entire leaf level of the covering index]

That's a scan of the whole covering index, although the plan exposes an Index Seek here! See where this leads? We can probably lower the number of necessary reads by creating a non covering index with catalog_name as the only column in it. In that case the Index Range Scan has to read far fewer pages to find our 128 keys, at the price of an additional clustered index Key Lookup per key.

Let’s try this out. First, we create the non covering index:

create index ix_noncovering on article(catalog_name)
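Before rerunning the query, we can check how much narrower the new index actually is. Here's a sketch using sys.dm_db_index_physical_stats (the 'DETAILED' mode reads the whole index, which is fine for an experiment like this):

-- Compare the leaf-level page counts of both indexes on article (sketch).
select i.name as index_name, ps.page_count
  from sys.dm_db_index_physical_stats(
         db_id(), object_id('article'), null, null, 'DETAILED') as ps
  join sys.indexes as i
    on i.object_id = ps.object_id
   and i.index_id = ps.index_id
 where ps.index_level = 0   -- leaf level only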

And now, we invoke the Ad Hoc query again:

select catalog_name, filler
  from article
 where catalog_name like '%abac%'

As expected, the plan exposes an Index Seek on the non covering index to find all rows matching the predicate, plus additional Key Lookups to retrieve the missing column data:

[Execution plan: Index Seek on ix_noncovering with Key Lookups into the clustered index]

This query took 4,559 reads, about 7 times fewer than before! Unfortunately, on my machine, the execution times haven't changed much. The query still took 1,830 ms with a CPU utilization of 1,763 ms. But fewer reads mean less resource usage and also less blocking (on latches in particular), so the result isn't bad at all. And this is the moment where I'd like to mention the clustered index again. Some people would advise creating the clustered index on the catalog_name column. That's generally a good idea when the clustered index can be utilized for seeks. But when Clustered Index Scans (or Range Scans) come into play, the clustered index, since it covers all columns by definition, may not be an optimal choice.
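(For completeness, that alternative design would look something like the following sketch. Note that our table script above lets the primary key create the clustered index, so the key would have to be declared nonclustered to free up the clustered index for catalog_name:)

-- Alternative design sketch: cluster on the search column instead.
create table article_alt
 (
  article_id int identity(1,1) primary key nonclustered not null
 ,catalog_name nvarchar(80) not null
 ,filler nchar(250) not null default '#'
 )
go

create clustered index cx_catalog_name on article_alt(catalog_name)
go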

Let's now see how our stored procedure behaves with the new non covering index in place. This is the call:

exec search_articles '%abac%'

And here’s the execution plan:

[Execution plan: unchanged, still an Index Seek on ix_covering]

Surprisingly, the plan has not changed! The non covering index isn't utilized, although it would have lowered the number of reads by a factor of 7! So, the query still needs 31,120 reads, 1,905 ms total execution time, and 1,872 ms of CPU utilization. When invoking the stored procedure, the covering index is always used!

Is the optimizer wrong? I don't think so. The optimizer follows a strategy that doesn't care much about resource utilization, with three exceptions: time, time, and time. Minimizing the execution time is the only thing that counts for the optimizer. It is very aggressive when trying to reach this goal, and thus a query may eat up all CPU cycles, perform a vast number of reads, or burden the environment with even more greenhouse gas emissions and further increase global warming. None of these aspects are taken into account by the optimizer.
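(If you want to compare the read counts for yourself, you can override the optimizer's choice with a table hint. Strictly as an experiment here, not as a fix; the interview above should tell you what I think of hints as a general strategy:)

-- Force the non covering index, just to compare reads (experiment only).
select catalog_name, filler
  from article with (index (ix_noncovering))
 where catalog_name like '%abac%'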

The Ad Hoc query is, by the way, more “sensitive” here. The following query:

select catalog_name, filler
  from article
 where catalog_name like 'abac%'

will perform an Index Seek on the covering index, since this takes only 6 reads and is very fast.

So finally, let's remove the covering index to see how this affects our queries:

drop index ix_covering on article

Now the stored procedure can't perform an Index Seek on the covering index any more; it has to use the non covering index instead. We'll see some additional Key Lookups as a consequence. As long as a query returns only a handful of rows, this will still need fewer reads. With a larger number of rows in the result set, however, dropping the covering index will affect the query performance negatively. So, deciding which index(es) to create is not an easy task!
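(To see the flip side, try a much broader search pattern. With our random article names, a sketch like the following matches many thousands of rows, and without the covering index every one of them requires an extra Key Lookup:)

-- A broad pattern returns many rows; without the covering index each
-- matching row costs an additional Key Lookup into the clustered index.
select catalog_name, filler
  from article
 where catalog_name like 'a%'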

The following tables summarize all measured results (reads in pages, times in ms):

Search argument: "%abac%"

Existing Indexes                  Ad Hoc Query                    Stored Procedure
Covering / Non Covering           Reads / Exec Time / CPU Time    Reads / Exec Time / CPU Time
Yes / No                          35,313 / 1,122 / 1,997          31,120 / 1,841 / 1,810
Yes / Yes                          4,559 / 1,830 / 1,763          31,120 / 1,905 / 1,872
No / Yes                           4,576 / 1,806 / 1,747           4,576 / 2,092 / 2,074

 

Search argument: "abac%"

Existing Indexes                  Ad Hoc Query                    Stored Procedure
Covering / Non Covering           Reads / Exec Time / CPU Time    Reads / Exec Time / CPU Time
Yes / No                          5 / 1 / 0                       5 / 2 / 0
Yes / Yes                         5 / 1 / 0                       5 / 3 / 0
No / Yes                          18 / 1 / 0                      18 / 2 / 0

I've executed every experiment a few times and averaged the obtained results, to make sure side effects are minimized.

Note: If you start your own experiments by using the scripts provided here, your results may vary, since the scripts operate with random values.
