Saturday, March 27, 2010

Bad index usage within stored procedure

I came across this problem last week. Here’s a table containing contact data:

create table contact
  last_name nvarchar(200)
 ,city nvarchar(200)
 ,filler nchar(200)
contact (last_name,City)
  select LastName, City
    from AdventureWorks2008R2.Sales.vIndividualCustomer

Yes, I’m already using SQL Server 2008 R2 but I was able to reproduce the behavior described below also with SQL Server 2008. The script above will insert 18,508 rows into our table.

Let’s create a clustered index on the last_name column and a nonclustered index on the city column:

create clustered index ix_t1_last_name on contact(last_name)
nonclustered index ix_t1_city on contact(city)

Given this table, what do you think about index utilization of the following ad hoc statement:

select *
    from contact
   where last_name like '%nobody%'
     and city like 'nowhere%'

The SELECT statement returns 0 rows and the optimizer is able to find out this by using the nonclustered index on the city column. The execution plan proves it:


As the index contains two levels, two logical reads are necessary to obtain the empty result set.

Ok, now let’s create the following stored procedure:

create procedure getNames(@name nvarchar(50), @city nvarchar(50)) with recompile as
  select *
    from contact
   where city like @city
     and last_name like @name go

Inside the SP the same statement as before is executed. Also, the procedure is created with the RECOMPILE option, to ensure a fresh execution plan is generated for every execution of the SP. Let’s say, we like it to be this way, since we expect very different parameter values will be provided which in turn will create result sets of very different cardinality.

If we call the procedure like this:

exec getNames @name = '%nobody%', @city = 'nowhere%'

What will the execution plan look like? Inside the procedure the same statement is executed as the ad hoc statement above, so I’d expected to see the same execution plan as before.

But surprisingly, the execution plan reveals a clustered index seek this time which is far from optimal! Look at this:


Although the plan shows a Clustered Index Seek, this seek actually is a scan, because of the provided search argument “%nobody%”. (See here for more information.) Therefore, all leaf level pages of the clustered index have to be scanned, and this takes 1,065 reads. That’s about than 300 times more then it was with the ad hoc query!

I did some more experiments, where I left out the clustered index, e.g. and created both indexes nonclustered. Same result. If executing the procedure, always the suboptimal index is chosen, with one exception: If I create the clustered index on the city column, both queries will take advantage of the clustered index.

Strange. I wasn’t expecting this. Due to parameter sniffing, the procedure’s plan should be adjusted to the parameter values provided. This statement should be even more valid, since the procedure was created by specifying the RECOMPILE option. But parameter sniffing seems not to work in this case. As I don’t know why, and I also consider this behavior totally unexpected and kind of wrong, I’ve created a bug for this on MSFT’s connect platform. You can vote for it here.

No comments:

Post a Comment