About Me

My photo
Mumbai, Maharastra, India
He has more than 7.6 years of experience in the software development. He has spent most of the times in web/desktop application development. He has sound knowledge in various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Search This Blog

Sunday, July 7, 2013

Advantage of having a non clustered index same as a clustered index on a table


I found this as a question in one of the forum and thought of coming up with an article with the same. His question was like this

Problem Statement: Is there any advantage of having a non clustered index same as a clustered index on a table?.
Here is a little background to his question question: In our current data warehouse framework (sql server version - 2005), In all of the tables we have one unique non clustered index same as a clustered index. So I was wondering if there is any advantage of doing that.
Solution: The only advantage which I could figure out is that the entries on leaf pages of nonclustered index are not as wide. They only contain index columns while the clustered index' leaf pages are the actual rows of data. Therefore, if you need something like select count (column_name) or count (*) from your_table then scanning the nonclustered index will involve considerably smaller number of data pages.
To prove our statement lets create a scenario, here we are creating table DupIndexDemo.

create table DupIndexDemo 
(id int identity(1,1), 
name varchar(200), 
ordered_qty int, 
date_getdate datetime default getdate()  )

Command(s) completed successfully.
  
Let’s now create index on ID, one is cluster and another we are creating noncluster index on the same column id.

create clustered index idx_nc_id on DupIndexDemo(id) 
create nonclustered index idx_c_id on DupIndexDemo(ordered_qty)  

Command(s) completed successfully.

Now let’s insert few records in DupIndexDemo.

declare @i int =0 
while @i < 10000 
begin    
      insert into DupIndexDemo(name,ordered_qty)    
      values  ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )    
      set @i = @i + 1 
end

Let’s fire count (*) query to check, which index it is picking, later we will justify why it picked non cluster index.

select COUNT(*) from DupIndexDemo



















Here from the explain plan we can see it picked NonClustered Index idx_nc_id and by extending the nonCluster operator we can see the IO cost involved is 0.0157176 and it cost 82% of the total cost to find the count of records.

Lets now lets drop NonCluster Index and fire the count (*) query.

drop index idx_nc_id on DupIndexDemo
Command(s) completed successfully

Now lets run the count(*) query and check the cost measure.

select COUNT(*) from DupIndexDemo

















Here we can see cluster Index is taking 91% of the total cost involved in the query and the IO cost involved is 0.0475694 which is around four times compare to NonCluster Index.

Thus here we can say NonCluster index on the same column is giving performance optimized. But still I will not prefer to have cluster and NonCluster on the same column since it leads to duplicate Indexes which is considered as bad practice.
   
Whereas if you remove both the indexes then it will do table scan.

My girl friend always labels me Idiot; if programmers are idiots then Program Managers must be Idiot boxes ;)  


Saturday, July 6, 2013

Select count(*) Performance analysis with NonCluster Index


There is a common perception that count(*) is not a optimized way of getting the number of records in a table. In this article we will be examining the same and finally we will conclude with our finding.

For analysis lets take the sales.Store table of AdventureWork database, lets see first how many indexes are there in place for Sales.Store table.

select OBJECT_NAME(object_id),name,index_id,type_desc from sys.indexes where object_id= OBJECT_ID('Sales.Store')

      name                      index_id    type_desc
------------------------------------------- -------------
Store PK_Store_BusinessEntityID 1           CLUSTERED
Store AK_Store_rowguid          2           NONCLUSTERED
Store IX_Store_SalesPersonID    3           NONCLUSTERED
Store PXML_Store_Demographics   256000      XML

(4 row(s) affected)

 Here we can see there are total of four index available on Sales.Store table.

Let’s now fire DBCC IND command for each index to find the number of page these indexes are span.

--Index id 1. PK_Store_BusinessEntityID
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 1);
  Page Count : 103
 
--Index Id 2. AK_Store_rowguid
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 2);
  Page Count : 4 
 
--Index Id 3. IX_Store_SalesPersonID
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 3);
  Page Count : 4 
 
--Index Id 256000. PXML_Store_Demographics
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 256000);
  Page Count : 66 

Since IX_Store_SalesPersonID has lowest pagecount, smallest index among all, if it gets picked up for finding total row count, it would be optimized one.

Let see which index count(*) will pick for getting the total number of rows in Sales.Store table.

select COUNT(*) from Sales.Store








So here we saw for getting count(*) optimizer picked IX_Store_SalesPersonID index.

Conclusion: Thus, when using COUNT(*) verify whether correct indexes are being used and do not believe that COUNT(*) will use Clustered index.   

If you punish a programmer with code he will reach to dreamland, dare to punish


Post Reference: Vikram Aristocratic Elfin Share

Forwarding Records, Forwarding Pointers Masked Performance Killer for NonCluster Index


Tool Used : sys.dm_db_index_physical_stats DMV
Conclusion : Forwarded records are a type of fragmentation within heap tables that can cause poor SQL Server performance.

In a heap it is possible to get  forwarded records. They occur when a record in a heap expands such that it no longer fits on the page it currently resides on. In this case, the record is moved to a new page, and a small forwarding record is left in the original location. The forwarding record points to the new location of the record, which is known as a forwarded record. This is done as a performance optimization so that all the non clustered indexes on the heap do not have to be altered with the new location of the heap record.

When a larger value is updated in a variable length field in a record, SQL Server will first try to expand the row of the existing page on the chance that there’s enough room to continue to use the original page. If that fails and SQL Server can’t find an existing page with enough room for the larger value, a new page has to be created for the record. The data is moved to the new page and is assigned a new Relative Identifier (RID). The old page now contains a forwarding pointer (forwarded record) that tells SQL Server to jump to the new RID. The new record location also has a back pointer to the old record. 

Forwarded records prevent non-clustered indices on heap tables from being updated with the RID of the new row. The overhead of updating indices is worse than replacing the old records with a pointer. But the drawback to forwarded records is that their presence could result in a large amount of I/O. For each forwarded record, SQL Server has to jump to the target page, then back to the original one. This is a very inefficient way to read a row, and it also wastes space.

There can be non-clustered indexes in the Heap. So in case a record has to move to new page, at the old location of the row it will leave a forwarding pointer. This mechanism is to stop the updating of non-clustered indexes as it can still point to the old location. So the modification is less costly than if all non-clustered indexes needed to reflect the row move.

Let’s now stimulate a condition where we will be getting the forwarding records, here we are creating table Forwarding_Records_Demo

create table Forwarding_Records_Demo 
(id bigint identity(1,1), 
name varchar(200), 
ordered_qty int, 
date_getdate datetime default getdate()  ) 

Command(s) completed successfully.

Here we are creating two non cluster index on date_getdate and ordered_qty

create nonclustered index idx_date_getdate on Forwarding_Records_Demo(date_getdate) 
create nonclustered index idx_ordered_qty on Forwarding_Records_Demo(ordered_qty) 

Command(s) completed successfully.

Now lets insert some records in newly created table

declare @i int =0 
while @i < 100 
begin    
      insert into Forwarding_Records_Demo(name,ordered_qty)    
      values  ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )    
      set @i = @i + 1 
end

Let’s check the insertion

select COUNT(*) as No_of_Records from Forwarding_Records_Demo
No_of_Records
-------------
100


Let’s query the dm_db_index_physical_stats to check forwarded record count.

select
object_name(ps.object_id) as tablename,
i.name as indexname,     
ps.index_type_desc,     
ps.page_count,     
ps.avg_fragmentation_in_percent,     
ps.forwarded_record_count 
from
sys.dm_db_index_physical_stats (db_id(), null, null, null,'detailed') as ps 
inner join sys.indexes as i     
on ps.object_id = i.object_id and ps.index_id = i.index_id 
where object_name(ps.object_id) = 'Forwarding_Records_Demo'

indexname         index_type_desc page_count avg_fragmentation forwarded_
                                             in_percent        record_count
--------------------------------- ---------------------------- ------------
NULL              HEAP            1          0                 NULL
idx_date_getdate  NONCLUSTERED    1          0                 NULL
idx_ordered_qty   NONCLUSTERED    1          0                 NULL

(3 row(s) affected)

Now let’s alter the column size of name, and forces SQL Server to move those rows to another page

alter table Forwarding_Records_Demo
alter column name char(400);

Command(s) completed successfully.

Now since we have modified the column to 400 sizes, let’s re-run the same DMV query to check whether

select
object_name(ps.object_id) as tablename,
i.name as indexname,     
ps.index_type_desc,     
ps.page_count,     
ps.avg_fragmentation_in_percent,     
ps.forwarded_record_count 
from
sys.dm_db_index_physical_stats (db_id(), null, null, null,'detailed') as ps 
inner join sys.indexes as i     
on ps.object_id = i.object_id and ps.index_id = i.index_id 
where object_name(ps.object_id) = 'Forwarding_Records_Demo'

indexname         index_type_desc page_count avg_fragmentation forwarded_
                                              _in_percent      record_count
--------------------------------- ---------------------------- ------------
NULL              HEAP            6          50                85
idx_date_getdate  NONCLUSTERED    1          0                 NULL
idx_ordered_qty   NONCLUSTERED    1          0                 NULL

(3 row(s) affected)

Now we have 85 forwarded records.

Forwarding pointers are common performance problem in Heaps. SQL Server 2008 introduced ALTER TABLE ... REBUILD which removes all forwarding pointers and performs a rebuild

alter table Forwarding_Records_Demo rebuild
Command(s) completed successfully.

Now let us re-examine the forwarded recount again which will be back to zero by running the above DMV query.

indexname         index_type_desc page_count avg_fragmentation forwarded_
                                             in_percent        record_count
--------------------------------- ---------------------------- ------------
NULL              HEAP            7          0                 NULL
idx_date_getdate  NONCLUSTERED    1          0                 NULL
idx_ordered_qty   NONCLUSTERED    1          0                 NULL

(3 row(s) affected)

Now we have no forwarding records.

Every ash of mine will fabricate a novel code which will lessen up the world of programmer hitches         


Post Reference: Vikram Aristocratic Elfin Share

Thursday, July 4, 2013

Merge multiple rows into Single rows


I was asked a very good question from my friend, how can one merge multiple rows into single rows with comma separated output.

Thought entertaining, this means breaking the rule of Normalization 1NF, still if there is business need one has to compromise with it.

There I device two way to achieve the solution for it. Before we discuss solution, let’s first try to stimulate the same problem.

For our example I am taking two table say, deptTest and emptest , lets create these two tables.

create table deptTest
(dept_id int,
dept_name varchar(10))

Command(s) completed successfully.

create table empTest
(emp_id int,
name varchar(10),
dept_id int);

Command(s) completed successfully.

 Now let’s insert some records in both the table.

insert into deptTest values(1,'Software'),(2,'Operation')
(2 row(s) affected)

insert into empTest values(1,'Snehal',1),(2,'Prachi',1),(3,'Bratiti',2)
(3 row(s) affected)

Lets query both the table

select * from deptTest

dept_id     dept_name
----------- ----------
1           Software
2           Operation

(2 row(s) affected)

select * from empTest

emp_id      name       dept_id
----------- ---------- -----------
1           Snehal     1
2           Prachi     1
3           Bratiti    2

(3 row(s) affected)

Problem Statement: we need to find all employees in comma separated values for each department. Like this
name
----------------
Bratiti
Snehal,Prachi

First Method: with XML and Path
Here in the below script, it is performing co related query, for each department in the outer query, the inner query is trying to find employees for the department.

The inner query result is then converted into XML and with the help of data()we are formating values into normal text.
  
select distinct t1.dept_name, name =
replace((SELECT t2.name AS [data()] FROM empTest t2 where t2.dept_id = t1.dept_id FOR XML PATH('')), ' ', ',')
from deptTest t1

Second Method:
Here we are creating a function where we are appending each record in a particular string and then returning the string to the calling function.

create function fn_Merge(@id int)
returns varchar(20)
as
begin
      declare @name1 varchar(20)
      set @name1=''
      select @name1 = @name1 + ',' + t.name
      from empTest t where t.dept_id = @id
      return substring(@name1,2,len(@name1))
end

select dbo.fn_Merge(d.dept_id) as Name from deptTest d

Name
--------------------
Snehal,Prachi
Bratiti

(2 row(s) affected)

For a programmer, the surest lane to wisdom is the noninterventionist coding. Let him do code only.   


Post Reference: Vikram Aristocratic Elfin Share