Thursday, November 4, 2010

Multiple statistics sharing the same leading column

SQL Server will not prevent you from creating identical indexes or statistics. You may think that everything is under your control, but have you ever added a missing index that has been reported by the query optimizer, or the DTA? If so, you will certainly have created duplicate statistics. Have a look at the following example:

-- create test table with 3 columns
-- filler is only nedded to widen the row

table T1

int not null identity(1,1) primary
 ,c2 int not
 ,filler nchar(200) not null default

-- insert 500000 rows with random
-- values between 0 and 49999 for c2

insert T1(c2
  select top(500000) abs(checksum(newid())) %
from sys.fulltext_system_stopwords as
,sys.fulltext_system_stopwords as b

Before invoking the following SELECT command, allow the actual execution plan being displayed (Strg-M or selecting Query/Include Actual Execution Plan from the menu).

-- get some data
select * from T1 where c2 = 100

If you look at the execution plan, you see that the optimizer complains about a missing index on column c2. The prognosticated improvement is about 99%, so adding the index it’s certainly a good idea. Let’s do it:

-- add missing index
create nonclustered index ix_1 on t1(c2)

Perfect! The query runs much faster now and needs a lot fewer resources. But have a look at the table’s statistics:


You see three statistics, one for the primary key, a second one for our created index ix_1, and a third one that was automatically created during execution plan generation for the first SELECT statement. This is the statistics, named _WA_Sys…. If the AUTO CREATE STATISTICS option is set to ON, the optimizer will add missing statistics automatically. In our little experiment, the optimizer had to generate this column statistics on column c2 in order to make some assumptions about the number of rows that had to be processed.

And here’s the problem: When creating the index on column c2, a statistics on this column is also created, since every index has a corresponding linked statistics. That’s just the way it works. At the time the index was added, the column statistics on c2 (that _WA_Sys… statistics) already existed. If you don’t remove it manually, this statistics will remain there forever, although it is needless now. All it’s good for is to increase maintenance efforts during statistics updates. You can safely remove this statistics by executing:

drop statistics t1._WA_Sys_...

If you didn’t think about this before, there’s a chance that you’ll find some of those superfluous statistics duplicates inside your database(s). Here’s a query that finds index-related and column-statistics that match on the first column. Looking for matches on the first column is sufficient here, since the optimizer only automatically adds missing single-column statistics.

with all_stats as
  select s.object_id, as stats_name, s.stats_id, s.
        ,s.auto_created, sc.
        ,sc.column_id, as
    from sys.stats as
         inner join sys.stats_columns as
                 on s.stats_id = sc.
                and s.object_id = sc.
                and sc.stats_column_id =
inner join sys.columns as
                 on c.object_id = s.
                and c.object_id = sc.
                and c.column_id = sc.
   where objectproperty(s.object_id, 'IsUserTable') = 1
select row_number() over(partition by s1.object_id, s1.column_name order by s1.column_name) as
      ,object_name(s1.object_id) as
      ,s1.stats_name as
      ,s2.stats_name as
  from all_stats as
       inner join all_stats as
               on s1.object_id = s2.
              and s1.stats_id != s2.
              and s1.stats_column_id = s2.
              and s1.column_id = s2.
              and s1.has_filter = s2.
              and s1.auto_created != s2.auto_created

With that query at hand, you may easily find redundant statistics. Here’s a sample output:


If you’d like to find out more about SQL Server statistics, you may want to check out my series of two articles, published recently on the Simple-Talk platform.

Part 1: Queries, Damned Queries and Statistics
Part 2: SQL Server Statistics: Problems and Solutions

It’s free, so you might want to give it a try. Feel free to vote, if you like it!

Monday, November 1, 2010

No automatically created or updated statistics for read-only databases

Ok, that’s not really the latest news and also well documented. But keep in mind that database snapshots also fall into the category of read-only databases, so you won’t see automatically maintained statistics for these as well.

I’ve seen the advice of taking a snapshot from a mirrored database for reporting purposes many times. Even books online contains a chapter Database Mirroring and Database Snapshots, where the scenario is explained.

The intention is to decouple your resource intensive reporting queries from the OLTP system which is generally a good idea. But keep in mind that:

  1. Reporting queries are highly unpredictable, and
  2. Reporting queries differ from OLTP queries.

So there’s more than only a slightly a chance that query performance will suffer from missing statistics in your reporting snapshot, since the underlying OLTP database simply does not contain all statistics (or indexes) that your reporting applications could take advantage of. These missing statistics can’t be added in the snapshot, because it’s a read-only database. Additionally, you won’t experience any automatic updates of stale statistics in the snapshot. And moreover, any added or updated statistics in the source database are not transferred into the snapshot, of course. This can affect your query performance significantly!

Have a look at the following example:

use master

-- Create test database
create database ssBase
-- create test table and insert 10000 rows
create table t1
  c1 int not null primary key
 ,c2 int not null
 select top 10000
        row_number() over(order by current_timestamp)
   from sys.fulltext_system_stopwords

-- create snapshot of test database
-- NOTE: the filename has to be adjusted
-- for your own experimenting
create database ssBase_SS on
   name = ssBase
  ,filename = 'e:\SqlTestData\'
as snapshot of ssBase;

The script above creates a test database with one table and adds some rows to the table before creating a snapshot of this database. As we have never used column c2 in any query (besides the INSERT), there won’t be any statistics for column t1.c2.

Now let’s query the snapshot and have a look at the actual execution plan. Here’s the query:

-- Be sure to show the actual execution plan
-- for the following query
use ssBase_SS

* from t1
 where c2 = 10

Here’s the actual execution plan:


Clearly evident that the optimizer detects a missing statistics, although the options AUTO CREATE STATISTICS and AUTO UPDATE STATISTICS have both been set to ON. The plan reveals a noticeable difference between the actual and estimated number of rows and also a warning regarding the missing statistics.

So, keep that in mind when using snapshots for reporting applications.

If you’d like to find out more about SQL Server statistics, you may want to check out my series of two articles, published recently on the Simple-Talk platform.

Part 1: Queries, Damned Queries and Statistics
Part 2: SQL Server Statistics: Problems and Solutions

It’s free, so you might want to give it a try. Feel free to vote, if you like it!