About Me

My photo
Mumbai, Maharastra, India
He has more than 7.6 years of experience in the software development. He has spent most of the times in web/desktop application development. He has sound knowledge in various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Search This Blog

Sunday, July 14, 2013

Identify the Key Index Maintenance Tasks


There are many which can be performed as a Maintenance task for Index Maintenance, but here we will see some high end task at least which should be included in your Index Maintenance task.

  1. Index Fragmentation, Identify and remove index fragmentation
  2. Outdated Index, Identify skewed and outdated index and column statistics and ensure they are representative and up-to-date
  3. Missing Indexes, Identify and create missing indexes
  4. Remove Unused Indexes, Identify and remove unused indexes
  5. Index Maintenance Job, Creating and monitoring index maintenance jobs


Let’s take a brief look at each one of these, one at a time.

Remove Index Fragmentation

The easiest way to remove index fragmentation is to defragment all of your database’s indexes every day. If you have a small database, and you have defined maintenance times, this might be a viable option. On the other hand, many databases are relatively large and must be available 24x7 throughout the day. In these cases, it is often not practical to defragment every index every day. Instead, you want to have the ability to identify only those indexes that really need to be defragmented, and focus on those.

Skewed and Outdated Index and Column Statistics and Ensure they are and Up to Date

Index statistics are used by the query optimizer to help in determine when and which index should be used hen executing a query. If these statistics are stale, or out-of-date, then the query optimizer might not make the best choice when selecting the indexes used to perform the execution of query.

Identify and Create Missing Indexes

User can make use Index tuning wizard to know the missing Indexes. Else he can look out each query to analyze the execution plan to find the missing Index statistics, once you have indentified the missing index, your task involve in creating those indexes.  Take the advantage of tool that can help you identify many missing indexes with minimal effort? While I refer to this process as automation, it is not really 100% automated. Some of the steps of this maintenance task still should be done by hand, and DBAs also need to make a judgment about the indexes the Database Engine Tuning Wizard recommends before you blindly let it create new indexes for your databases.


Identify and Remove Unused Indexes

What’s the point of using up resources modifying indexes that never will be used? Unused indexes should be removed from your databases. SQL Server 2005 and SQL Server 2008 make it easy to identify unused indexes in a database using DMVs. Essentially, these DMVs track every time an index is used, which means you can easily determine which indexes can be safely removed. While in theory you could entirely automate this maintenance task, I prefer to do it manually because there are often some circumstances that automation can’t easily take into account.

Creating and Monitoring Index Maintenance Jobs

This is the step where you create the scripts necessary to perform automated index maintenance, and then schedule them with the SQL Server agent. In addition, you will want configure your SQL Server to notify you if any of these jobs fail, so you can ensure that index maintenance is performed as you expect.

In coming articles we will pick each Index Maintenance task and go deeper to each Indexes problem n there by resolving the issues concern with Indexes.

Why coders take pleasure in stupidity, because they are highly intelligent alien race 


Post Reference: Vikram Aristocratic Elfin Share

Saturday, July 13, 2013

If I drop a Clustered Index, what will go off, and what will happen to Non Clustered Indexes?


If you drop the cluster index on the table, the table data will not gets deleted instead the offset table of the cluster index for maintaining the order of table data will gets removed from page, however since indexes use the clustering key to look up the corresponding row of data, so when a clustered index is dropped, the nonclustered indexes must be modified to use another method to lookup the corresponding data row because the clustering key no longer exists.

No for non cluster index the only way to jump directly to a record in the table without a clustered index is to use its physical location in the database (i.e., a particular record number on a particular data page in a particular data file, known as a row identifier—RID), and this physical location must be included in the nonclustered indexes now that the table is no longer clustered. So when a clustered index is dropped, all the nonclustered indexes must be rebuilt to use RIDs to look up the corresponding row within the heap.

Enjoy madness, if you are doing code, upshot will be chill


Post Reference: Vikram Aristocratic Elfin Share

Sunday, July 7, 2013

Selecting Top N Records Group wise


Often we require to query to find top among subset, for example if you have a table called OderPlaced where you want the top two order for each product. There are couple of way we can actually do this, let’s check out

Problem Statement:  To find the top 2 orders for each productid based upon the earliest orderid.

Solution: Let’s create a OrderPlace table with two column OderId and ProductId
create table OrderPlace
(OrderId int,
ProductId int)
Command(s) completed successfully.
Lets insert few records in OrderPlace table

insert into OrderPlace (OrderId, ProductId) values (82, 707)
insert into OrderPlace (OrderId, ProductId) values (83, 707)
insert into OrderPlace (OrderId, ProductId) values (84, 707)
insert into OrderPlace (OrderId, ProductId) values (82, 708)
insert into OrderPlace (OrderId, ProductId) values (83, 708)
insert into OrderPlace (OrderId, ProductId) values (84, 708)
insert into OrderPlace (OrderId, ProductId) values (82, 711)
insert into OrderPlace (OrderId, ProductId) values (83, 711)
insert into OrderPlace (OrderId, ProductId) values (84, 711)
insert into OrderPlace (OrderId, ProductId) values (82, 712)
insert into OrderPlace (OrderId, ProductId) values (83, 712)
insert into OrderPlace (OrderId, ProductId) values (84, 712)
insert into OrderPlace (OrderId, ProductId) values (82, 714)
insert into OrderPlace (OrderId, ProductId) values (83, 714)
insert into OrderPlace (OrderId, ProductId) values (84, 714)
insert into OrderPlace (OrderId, ProductId) values (82, 715)
insert into OrderPlace (OrderId, ProductId) values (83, 715)
insert into OrderPlace (OrderId, ProductId) values (84, 715)
insert into OrderPlace (OrderId, ProductId) values (83, 716)
insert into OrderPlace (OrderId, ProductId) values (84, 716)

Now since we have records in place, lets design the query to find top 2 orders for each productid based upon the earliest ordered.

select OrderId,ProductId from
(select op.OrderId,op.ProductId, rowid = ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderId)
from OrderPlace op) t_op
where rowid < =2

OrderId     ProductId
----------- -----------
82          707
83          707
82          708
83          708
82          711
83          711
82          712
83          712
82          714
83          714
82          715
83          715
83          716
84          716

(14 row(s) affected)

Here we gave row number by grouping ProductId then in outer query we just fetch those records which has row number >=2.

Programming is a mad work performed by genius minds 

Advantage of having a non clustered index same as a clustered index on a table


I found this as a question in one of the forum and thought of coming up with an article with the same. His question was like this

Problem Statement: Is there any advantage of having a non clustered index same as a clustered index on a table?.
Here is a little background to his question question: In our current data warehouse framework (sql server version - 2005), In all of the tables we have one unique non clustered index same as a clustered index. So I was wondering if there is any advantage of doing that.
Solution: The only advantage which I could figure out is that the entries on leaf pages of nonclustered index are not as wide. They only contain index columns while the clustered index' leaf pages are the actual rows of data. Therefore, if you need something like select count (column_name) or count (*) from your_table then scanning the nonclustered index will involve considerably smaller number of data pages.
To prove our statement lets create a scenario, here we are creating table DupIndexDemo.

create table DupIndexDemo 
(id int identity(1,1), 
name varchar(200), 
ordered_qty int, 
date_getdate datetime default getdate()  )

Command(s) completed successfully.
  
Let’s now create index on ID, one is cluster and another we are creating noncluster index on the same column id.

create clustered index idx_nc_id on DupIndexDemo(id) 
create nonclustered index idx_c_id on DupIndexDemo(ordered_qty)  

Command(s) completed successfully.

Now let’s insert few records in DupIndexDemo.

declare @i int =0 
while @i < 10000 
begin    
      insert into DupIndexDemo(name,ordered_qty)    
      values  ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )    
      set @i = @i + 1 
end

Let’s fire count (*) query to check, which index it is picking, later we will justify why it picked non cluster index.

select COUNT(*) from DupIndexDemo



















Here from the explain plan we can see it picked NonClustered Index idx_nc_id and by extending the nonCluster operator we can see the IO cost involved is 0.0157176 and it cost 82% of the total cost to find the count of records.

Lets now lets drop NonCluster Index and fire the count (*) query.

drop index idx_nc_id on DupIndexDemo
Command(s) completed successfully

Now lets run the count(*) query and check the cost measure.

select COUNT(*) from DupIndexDemo

















Here we can see cluster Index is taking 91% of the total cost involved in the query and the IO cost involved is 0.0475694 which is around four times compare to NonCluster Index.

Thus here we can say NonCluster index on the same column is giving performance optimized. But still I will not prefer to have cluster and NonCluster on the same column since it leads to duplicate Indexes which is considered as bad practice.
   
Whereas if you remove both the indexes then it will do table scan.

My girl friend always labels me Idiot; if programmers are idiots then Program Managers must be Idiot boxes ;)  


Saturday, July 6, 2013

Select count(*) Performance analysis with NonCluster Index


There is a common perception that count(*) is not a optimized way of getting the number of records in a table. In this article we will be examining the same and finally we will conclude with our finding.

For analysis lets take the sales.Store table of AdventureWork database, lets see first how many indexes are there in place for Sales.Store table.

select OBJECT_NAME(object_id),name,index_id,type_desc from sys.indexes where object_id= OBJECT_ID('Sales.Store')

      name                      index_id    type_desc
------------------------------------------- -------------
Store PK_Store_BusinessEntityID 1           CLUSTERED
Store AK_Store_rowguid          2           NONCLUSTERED
Store IX_Store_SalesPersonID    3           NONCLUSTERED
Store PXML_Store_Demographics   256000      XML

(4 row(s) affected)

 Here we can see there are total of four index available on Sales.Store table.

Let’s now fire DBCC IND command for each index to find the number of page these indexes are span.

--Index id 1. PK_Store_BusinessEntityID
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 1);
  Page Count : 103
 
--Index Id 2. AK_Store_rowguid
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 2);
  Page Count : 4 
 
--Index Id 3. IX_Store_SalesPersonID
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 3);
  Page Count : 4 
 
--Index Id 256000. PXML_Store_Demographics
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 256000);
  Page Count : 66 

Since IX_Store_SalesPersonID has lowest pagecount, smallest index among all, if it gets picked up for finding total row count, it would be optimized one.

Let see which index count(*) will pick for getting the total number of rows in Sales.Store table.

select COUNT(*) from Sales.Store








So here we saw for getting count(*) optimizer picked IX_Store_SalesPersonID index.

Conclusion: Thus, when using COUNT(*) verify whether correct indexes are being used and do not believe that COUNT(*) will use Clustered index.   

If you punish a programmer with code he will reach to dreamland, dare to punish


Post Reference: Vikram Aristocratic Elfin Share

Forwarding Records, Forwarding Pointers Masked Performance Killer for NonCluster Index


Tool Used : sys.dm_db_index_physical_stats DMV
Conclusion : Forwarded records are a type of fragmentation within heap tables that can cause poor SQL Server performance.

In a heap it is possible to get  forwarded records. They occur when a record in a heap expands such that it no longer fits on the page it currently resides on. In this case, the record is moved to a new page, and a small forwarding record is left in the original location. The forwarding record points to the new location of the record, which is known as a forwarded record. This is done as a performance optimization so that all the non clustered indexes on the heap do not have to be altered with the new location of the heap record.

When a larger value is updated in a variable length field in a record, SQL Server will first try to expand the row of the existing page on the chance that there’s enough room to continue to use the original page. If that fails and SQL Server can’t find an existing page with enough room for the larger value, a new page has to be created for the record. The data is moved to the new page and is assigned a new Relative Identifier (RID). The old page now contains a forwarding pointer (forwarded record) that tells SQL Server to jump to the new RID. The new record location also has a back pointer to the old record. 

Forwarded records prevent non-clustered indices on heap tables from being updated with the RID of the new row. The overhead of updating indices is worse than replacing the old records with a pointer. But the drawback to forwarded records is that their presence could result in a large amount of I/O. For each forwarded record, SQL Server has to jump to the target page, then back to the original one. This is a very inefficient way to read a row, and it also wastes space.

There can be non-clustered indexes in the Heap. So in case a record has to move to new page, at the old location of the row it will leave a forwarding pointer. This mechanism is to stop the updating of non-clustered indexes as it can still point to the old location. So the modification is less costly than if all non-clustered indexes needed to reflect the row move.

Let’s now stimulate a condition where we will be getting the forwarding records, here we are creating table Forwarding_Records_Demo

create table Forwarding_Records_Demo 
(id bigint identity(1,1), 
name varchar(200), 
ordered_qty int, 
date_getdate datetime default getdate()  ) 

Command(s) completed successfully.

Here we are creating two non cluster index on date_getdate and ordered_qty

create nonclustered index idx_date_getdate on Forwarding_Records_Demo(date_getdate) 
create nonclustered index idx_ordered_qty on Forwarding_Records_Demo(ordered_qty) 

Command(s) completed successfully.

Now lets insert some records in newly created table

declare @i int =0 
while @i < 100 
begin    
      insert into Forwarding_Records_Demo(name,ordered_qty)    
      values  ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )    
      set @i = @i + 1 
end

Let’s check the insertion

select COUNT(*) as No_of_Records from Forwarding_Records_Demo
No_of_Records
-------------
100


Let’s query the dm_db_index_physical_stats to check forwarded record count.

select
object_name(ps.object_id) as tablename,
i.name as indexname,     
ps.index_type_desc,     
ps.page_count,     
ps.avg_fragmentation_in_percent,     
ps.forwarded_record_count 
from
sys.dm_db_index_physical_stats (db_id(), null, null, null,'detailed') as ps 
inner join sys.indexes as i     
on ps.object_id = i.object_id and ps.index_id = i.index_id 
where object_name(ps.object_id) = 'Forwarding_Records_Demo'

indexname         index_type_desc page_count avg_fragmentation forwarded_
                                             in_percent        record_count
--------------------------------- ---------------------------- ------------
NULL              HEAP            1          0                 NULL
idx_date_getdate  NONCLUSTERED    1          0                 NULL
idx_ordered_qty   NONCLUSTERED    1          0                 NULL

(3 row(s) affected)

Now let’s alter the column size of name, and forces SQL Server to move those rows to another page

alter table Forwarding_Records_Demo
alter column name char(400);

Command(s) completed successfully.

Now since we have modified the column to 400 sizes, let’s re-run the same DMV query to check whether

select
object_name(ps.object_id) as tablename,
i.name as indexname,     
ps.index_type_desc,     
ps.page_count,     
ps.avg_fragmentation_in_percent,     
ps.forwarded_record_count 
from
sys.dm_db_index_physical_stats (db_id(), null, null, null,'detailed') as ps 
inner join sys.indexes as i     
on ps.object_id = i.object_id and ps.index_id = i.index_id 
where object_name(ps.object_id) = 'Forwarding_Records_Demo'

indexname         index_type_desc page_count avg_fragmentation forwarded_
                                              _in_percent      record_count
--------------------------------- ---------------------------- ------------
NULL              HEAP            6          50                85
idx_date_getdate  NONCLUSTERED    1          0                 NULL
idx_ordered_qty   NONCLUSTERED    1          0                 NULL

(3 row(s) affected)

Now we have 85 forwarded records.

Forwarding pointers are common performance problem in Heaps. SQL Server 2008 introduced ALTER TABLE ... REBUILD which removes all forwarding pointers and performs a rebuild

alter table Forwarding_Records_Demo rebuild
Command(s) completed successfully.

Now let us re-examine the forwarded recount again which will be back to zero by running the above DMV query.

indexname         index_type_desc page_count avg_fragmentation forwarded_
                                             in_percent        record_count
--------------------------------- ---------------------------- ------------
NULL              HEAP            7          0                 NULL
idx_date_getdate  NONCLUSTERED    1          0                 NULL
idx_ordered_qty   NONCLUSTERED    1          0                 NULL

(3 row(s) affected)

Now we have no forwarding records.

Every ash of mine will fabricate a novel code which will lessen up the world of programmer hitches         


Post Reference: Vikram Aristocratic Elfin Share