Sunday, February 27, 2011

Calculating SQL Server Data Compression Savings

SQL Server 2008 Enterprise edition comes with an opportunity for storing table or index data in a compressed format which may save huge amount of storage space and – much more important – IO requests and buffer pool utilization. There’s two different options for data compression, namely Row and Page level compression. This blog post is not concerned with how these two work internally and will also not explain the differences between the two. If you like to know more about this, you find much of useful information on the internet - including links to further articles (e.g. here, here, and here).

Whether compression is worth or not isn’t an easy question to answer. One aspect that has to be taken into account is certainly the amount of storage that may be saved by storing a distinct table or index in any of the two compressed formats. SSMS offers a Data Compression Wizard than can provide storage-saving estimates for row or page level compression. From the context menu for a table or index just open Storage/Manage Compression… In the Combo box at the top select the compression type (Row or Page) and press the Calculate Button at the bottom. Here’s a sample of a calculated saving for an index:


Unfortunately, SSMS does not offer an option for calculating estimated savings for more than one table or index at once. If you, let’s say, would like to know the estimated storage savings of page level compression for you largest 10 tables, there’s no GUI support in SSMS that will assist you in finding an answer. This is, where the stored procedure sp_estimate_data_compression_savings comes in handy. This procedure – as you may have guessed from its name – provides estimated savings for row or page level compression for any table or index. You have to provide the table or index as a parameter to the procedure. In other words: The procedure will only calculate the estimations for one table or index at a time. If you want to retrieve the calculations of more than one table or index as a result set, there’s some more work to do, since the procedure has to be invoked multiple times. Here’s a script that calculates the estimated savings of page level compression for the database in context.

-- Determine the estimated impact of compression
-- NOTE: This script is only for SQL Server Enterprise and Developer edition.

nocount on

-- We create a temp table for the result
if (object_id('tempdb..#comp', 'U') is not
  drop table #comp
create table #comp
 ,[size_with_current_compression_setting (KB)]
 ,[size_with_requested_compression_setting (KB)] bigint

,[sample_size_with_current_compression_setting (KB)] bigint

,[sample_size_with_requested_compression_setting (KB)]

-- Calculate estimated impact of page level compression for all
-- user-tables and indexes in all schemas.
-- NOTE:
--  1) To get the estimated impact of row level compression change the last parameter
--     of sp_estimate_data_compression_savings to 'row' instead.
--  2) We don't care about partitioning here. If this is important for you,
--     you have to modify forth parameter of sp_estimate_data_compression_savings.
--     Please refer to BOL.

declare @cmd nvarchar(max
set @cmd =
select @cmd =
';insert #comp exec sp_estimate_data_compression_savings '''
+ schema_name(schema_id)+''','''
+ name + ''',null, null, ''page'''
  from sys.
 where objectproperty(object_id, 'IsUserTable') = 1
exec (@cmd)

-- Do some further calculations for a more meaningful result
compressionSavings as
  select quotename(schema_name) + '.' + quotename(object_name) as
[size_with_current_compression_setting (KB)]
[size_with_requested_compression_setting (KB)]
                when [size_with_current_compression_setting (KB)] = 0 then
else 100.0*(1.0-
[size_with_requested_compression_setting (KB)]
/[size_with_current_compression_setting (KB)]
              end as decimal(6,2)) as
[Estimated Savings (%)]
from #comp
select cs.
,isnull(, i.type_desc) as
[size_with_current_compression_setting (KB)]
[size_with_requested_compression_setting (KB)]
,cs.[Estimated Savings (%)]

   from compressionSavings as
left outer join sys.indexes as
on i.index_id = cs.
and i.object_id = object_id(cs.table_name, 'U'
  order by cs.[Estimated Savings (%)]

-- Get rid of the temp table
table #comp

The script calculates the impact of Page level compression but may easily be adapted to consider Row level compression instead. Please read the comments inside the script. Also, please notice that the script will only run on SQL Server Enterprise and Developer edition. All other editions don’t provide the opportunity for data compression.

Here’s a partial result retrieved from running the script against the AdventureWorksDW2008R2 database.


If you execute the script, please be aware that it may produce some extensive I/O. Running the script against your production database at business hours wouldn’t be a very good idea therefore.

Tuesday, February 22, 2011

SQL Server Start Time

Have you tried finding out the time, your SQL Server instance has been started? There are some sophisticated solutions, like the one from Tracy Hamlin (twitter), which takes advantage of the fact that tempdb is re-created every time, SQL Server starts. Her solution goes like this:

select create_date
from sys.databases where database_id=2

Another answer to the question, I’ve seen sometimes on the internet queries the login time for any of the system processes:

select login_time
from sys.dm_exec_sessions where session_id=1

This was my preferred way – until yesterday, when I discovered the following simple method:

select sqlserver_start_time
from sys.

Easy, isn’t it? Interestingly though, every of the above three queries yields a different result. Here’s a query with a sample output:

select (select sqlserver_start_time
from sys.dm_os_sys_info) as
from sys.databases where database_id=2) as
from sys.dm_exec_sessions where session_id=1) as sysprocess_login_time



It seems the SQL Server service must be started first. Only after the service is running, tempdb is created followed by a subsequent start of all sysprocesses. I can’t imagine that the diverse three times make any difference in practice, e.g. if you try finding out for how many hours your SQL Server instance is running. But out there may be existing applications that have to be aware of the difference.