Aristocratic Elfin Share

Friday, June 7, 2013

Consequence of Redundant Index on your table.

If have multiple queries where your searching criteria varies from col1 in first query, col1 and col2 in second query, col1 and col3 in third query, how you will be proceeding with index creation.

Let’s stimulate the whole situation by taking an example; we are creating a table employee with some columns.

set nocount on

create table dbo.Employee

(emp_id int identity primary key,

first_name varchar(10),

last_name varchar(10),

address varchar(20),

city varchar(10),

state varchar(10),

active bit)

Command(s) completed successfully.

Our table is ready, now lets insert some data into it.

declare @cnt int, @id varchar(4)

select @cnt = 1

while (@cnt <= 10000)

begin

select @id = cast(@cnt as varchar)

insert into Employee(first_name, last_name, address, city, active)

select 'f'+@id, 'l'+@id, 'a'+@id, 'c'+@id, 1

select @cnt = @cnt + 1

end

Command(s) completed successfully.

Now since we have table Employee in position lets create indexes on the basis of predicate used in the select queries.

create index ix_emp_lastname on dbo.Employee(last_name)

create index ix_emp_lastname_firstname on dbo.Employee(last_name, first_name)

create index ix_emp_lastname_active on dbo.Employee(last_name, active)

Now here we have created three indexes on Employee, first on last_name column, second on last_name and first_name, third on last_name and active column.

Lets check out the space used by the index by executing sp_spaceused

exec sp_spaceused Employee

name rows reserved data index_size unused

--------------- --------- ------- ---------- -------

Employee 10000 1376 KB 432 KB 712 KB 232 KB

So the index is using 712Kb of space. Now lets fire each query and check out the execution plan to see how and which index it is using,

select emp_id from dbo.Employee

where last_name = 'l34'

The first query used index ix_emp_lastname_active, now lets run the second query and see the execution plan

select emp_id from dbo.Employee

where last_name = 'l34' and active = 1

The second query used index ix_emp_lastname_active, now lets run the third query and see the execution plan

select emp_id from dbo.Employee

where first_name = 'f34' and last_name = 'l34'

The third query used index ix_emp_lastname_firstname, now lets run the fourth query and see the execution plan

select emp_id from dbo.Employee

where first_name = 'f34' and last_name = 'l34' and active = 1

Now these are interesting. The first and second queries used index ix_emp_lastname_active. The third query used index ix_emp_lastname_firstname. For the last query, the optimizer decided to use ix_emp_lastname_firstname as well and decided to do a lookup to get the actives value.

NOTE (Intresting Look) : If you haven't noticed yet, index ix_emp_lastname was not used to satisfy any of these queries. It's just sitting around, literally taking up space.

Now let's drop these indexes and add a single index to satisfy the same four queries

drop index dbo.Employee.ix_emp_lastname

drop index dbo.Employee.ix_emp_lastname_firstname

drop index dbo.Employee.ix_emp_lastname_activesw

Lets create a single index which will satisfy the all the query

create index ix_emp_lastname on dbo.Employee(lastname, firstname, active)

If we now re-run sp_spaceused, we'll see that the index space used by the table has been reduced by over 100%

exec sp_spaceused Employee

name rows reserved data index_size unused

-------------------- ------------------ ------ ---------- ---------

Employee 10000 846 KB 432 KB 304 KB 112 KB

Lets again execute those queries and see the new execution plan.

select emp_id from dbo.Employee where last_name = 'l34'

Now the first query used index ix_emp_lastname, now lets run the second query and see the execution plan

select emp_id from dbo.Employee where last_name = 'l34' and active = 1

The second query is also using the same index ix_emp_lastname, now lets run the third query and see the execution plan

select emp_id from dbo.Employee where first_name = 'f34' and last_name = 'l34'

Again we can see the third query is also using the same index ix_emp_lastname, now lets run the fourth query and see the execution plan

select emp_id from dbo.Employee where first_name = 'f34' and last_name = 'l34' and active = 1

The fourth query is also using the same index.

Conclusion: the single index satisfies all four queries. In addition, the last query that performed a lookup for the actives now reads it directly from the index and does not resort to searching the clustered table for the value.

If you desire to kill, kill me through code;honor me HEAVEN.

Post Reference: Vikram Aristocratic Elfin Share

Hierarchical Data Storing in Database table

It was a very good morning, my friend Snehal came up with a good discussion on storing the folder structure in database, we had a good discussion and end up with a good solution.

Problem Statement:

Suppose folder structure is der like folder n subfolder so suppose “Main” folder has sub folder “A” and “B” and “A” has “a1” and B has “b1” so how will you maintain dis data in single table and how you will get all sub folders(“A”,”B”,”a1”,”b1”) of “Main” folder .

root1 root2 root3

---------- ---------- ----------

mail A a1

mail B b1

Solution :

create table SnehalKaQuestion

(FolderId int,

FolderName varchar(10),

ParentFolder int null)

insert into SnehalKaQuestion values(1,'mail',null),(2,'A',1),(3,'a1',2),(4,'B',1),(5,'b1',4)

delete SnehalKaQuestion

select FolderName from SnehalKaQuestion

select s1.foldername as root1, s2.foldername as root2, s3.FolderName as root3

from SnehalKaQuestion s1

left join SnehalKaQuestion s2 on s2.ParentFolder =s1.FolderId

left join SnehalKaQuestion s3 on s3.ParentFolder=s2.FolderId

where s1.FolderName = 'mail'

EXPLAINATION

S1 Table

FolderId FolderName ParentFolder

----------- ---------- ------------

1 mail NULL

2 A 1

3 a1 2

4 B 1

5 b1 4

S2 Table

FolderId FolderName ParentFolder

----------- ---------- ------------

1 mail NULL

2 A 1

3 a1 2

4 B 1

5 b1 4

(5 row(s) affected)

select s2.FolderId as s2FolderID,s1.foldername as root1, s2.foldername as root2

from SnehalKaQuestion s1

left join SnehalKaQuestion s2 on s2.ParentFolder =s1.FolderId

where s1.FolderName = 'mail'

S1 left Join S2

s2FolderID root1 root2

----------- ---------- ----------

2 mail A

4 mail B

S3 Table

FolderId FolderName ParentFolder

----------- ---------- ------------

1 mail NULL

2 A 1

3 a1 2

4 B 1

5 b1 4

select s1.foldername as root1, s2.foldername as root2, s3.foldername as root3

from SnehalKaQuestion s1

left join SnehalKaQuestion s2 on s2.ParentFolder =s1.FolderId

left join SnehalKaQuestion s3 on s3.ParentFolder=s2.FolderId

where s1.FolderName = 'mail'

(S1 left Join S2) left join s3

root1 root2 root3

---------- ---------- ----------

mail A a1

mail B b1

Burn yourself through code, incredible will come up, we will label it new feature

Post Reference: Vikram Aristocratic Elfin Share

Friday, May 3, 2013

Know potential reduction in the size of the table before implementing VARDECIMAL

Before we actually go for implementing Vardecimal, we should know whether the table required Vardecimal. It is clear if a table has no decimal or numeric field, there will be no saving, but even if you have table with decimal and numeric field whether implementing Vardecimal make any sense on storage? There is no guarantee that you will be able to reduce the size of the table by enabling Vardecimal storage format.

SQL Server provides a stored procedure, to estimate the ‘reduction in row size’ with Vardecimal storage format. We will use this to know whether the table requires Vardecimal implementation.

Let stimulate a situation to know whether to implement Vardecimal by taking two table testWhetherNeedVarDecimal and testWhetherNeedVarDecimal2 and check which among these two require Vardecimal.

Let’s create first table testWhetherNeedVarDecimal with two field of numeric and varchar type.

create table testWhetherNeedVarDecimal

(col1 numeric(18,2),

col2 varchar(10) default 10)

Now we have table in place let’s populate some data in it.

declare @cnt int

set @cnt=1

while (@cnt < 3000)

begin

if (@cnt < 2000)

insert into testWhetherNeedVarDecimal(col1) values(0)

else

insert into testWhetherNeedVarDecimal(col2) values((@cnt + 10))

set @cnt +=1

end

We are ready with table1, now lets create another table testWhetherNeedVarDecimal2.

create table testWhetherNeedVarDecimal2

(col1 numeric(10,2),

col2 varchar(10) default 'asd')

Let’s populate testWhetherNeedVarDecimal2 with some data

declare @cnt int

set @cnt=1

while (@cnt < 3000)

begin

if (@cnt < 2000)

insert into testWhetherNeedVarDecimal2(col1) values(12345678.12)

set @cnt +=1

end

Now since both the table are ready, let’s try analyze which table requires Vardecimal by firing system stored procedure sp_estimated_rowsize_reduction_for_vardecimal

exec sp_estimated_rowsize_reduction_for_vardecimal 'testWhetherNeedVarDecimal'

avg_rowlen_fixed_format avg_rowlen_vardecimal_format row_count

----------------------- ---------------------------- ---------

22.67 15.67 2999

exec sp_estimated_rowsize_reduction_for_vardecimal 'testWhetherNeedVarDecimal2'

avg_rowlen_fixed_format avg_rowlen_vardecimal_format row_count

----------------------- ---------------------------- ---------

22.00 21.00 1999

So it is clear from the output that table testWhetherNeedVarDecimal requires Vardecimal, and by doing so user will be able to save (22.67-15.67) 7 byte per record which is significant enough to save.

Focus on code what really matters & letting go of the rest.

Post Reference: Vikram Aristocratic Elfin Share

Just like Varchar for Char, why don’t we have Varnumeric for Numeric…

When tables grow to millions of rows reducing the amount of storage a few bytes per row can have significant effect. The decimal and numeric data types are typically stored on disk as fixed-length data. Mostly these values are defined with high precision, but precision is not required in most of these cases. In such cases the storage space consumed is unnecessary. The situation becomes even worse when NULL or zero values are also present in numeric or decimal type column. Such issues are required to be addressed for optimization of storage space, but what options are available?

SQL Server 2005 has introduced the VARDECIMAL storage format. This new storage format, stores decimal and numeric data as variable length columns and can be enabled or disabled at the table level.

When a table is not using the VARDECIMAL format every numeric or decimal value uses a fixed size (between 5 and 17 bytes) even when the data is 0. With VARDECIMAL, storage becomes dynamic and optimized for each value.

The VARDECIMAL is implemented at table level, so we can have table first with all column defined and inserted data, there after we can find how much space it takes without actually going for VARDECIMAL and how much we can save if we go for VARDECIMAL.

Let create the scenario to test the VARDECIMAL, we are creating table with decimal and numeric datatype fields

create table testVarDecimal

(col1 decimal(18,0),

col2 numeric(18,0))

Command(s) completed successfully.

Now lets try to insert around 4000 record in by the below script.

declare @cnt int

set @cnt=1

while (@cnt < 4000)

begin

if (@cnt < 2000)

insert into testVarDecimal values(@cnt,0)

else

insert into testVarDecimal values(@cnt,(@cnt + 100))

set @cnt +=1

end

Now since we have records in place, lets analyze how much space we can reduce if we implement VARDECIMAL to this table.

exec sp_estimated_rowsize_reduction_for_vardecimal 'testVarDecimal'

avg_rowlen_fixed_format avg_rowlen_vardecimal_format row_count

----------------------- ---------------------------- ----------

25.00 16.27 3999

We can see that there may be difference of about (25.00 – 16.27) i.e.9 bytes in each row. So it seems suitable to apply vardecimal storage for this table.

To implement VARDECIMAL, we first need to enable it for the database, which can done like the way below

execute sp_db_vardecimal_storage_format 'testDemo', 'on'

Command(s) completed successfully.

Here my database is testDemo, you can put your own database name, where your table object reside.

Now let’s implement it on table by following script

exec sys.sp_tableoption 'testVarDecimal', 'VarDecimal storage format',1

So it is all over. If you want to check which all database has VarDecimal Enabled just fire the below command

exec sp_db_vardecimal_storage_format

To check which table has VARDECIMAL enabled execute the following

select name, OBJECT_ID, type_desc from sys.objects

WHERE OBJECTPROPERTY(OBJECT_ID, N'TableHasVarDecimalStorageFormat') = 1

Code not always straightforward, but rewards are worth it

Aristocratic Elfin Share

Pages

About Me

Search This Blog

Friday, June 7, 2013

Consequence of Redundant Index on your table.

Hierarchical Data Storing in Database table

Friday, May 3, 2013

Know potential reduction in the size of the table before implementing VARDECIMAL

Just like Varchar for Char, why don’t we have Varnumeric for Numeric…