Aristocratic Elfin Share: July 2013

Sunday, July 14, 2013

Identify the Key Index Maintenance Tasks

There are many which can be performed as a Maintenance task for Index Maintenance, but here we will see some high end task at least which should be included in your Index Maintenance task.

Index Fragmentation, Identify and remove index fragmentation
Outdated Index, Identify skewed and outdated index and column statistics and ensure they are representative and up-to-date
Missing Indexes, Identify and create missing indexes
Remove Unused Indexes, Identify and remove unused indexes
Index Maintenance Job, Creating and monitoring index maintenance jobs

Let’s take a brief look at each one of these, one at a time.

Remove Index Fragmentation

The easiest way to remove index fragmentation is to defragment all of your database’s indexes every day. If you have a small database, and you have defined maintenance times, this might be a viable option. On the other hand, many databases are relatively large and must be available 24x7 throughout the day. In these cases, it is often not practical to defragment every index every day. Instead, you want to have the ability to identify only those indexes that really need to be defragmented, and focus on those.

Skewed and Outdated Index and Column Statistics and Ensure they are and Up to Date

Index statistics are used by the query optimizer to help in determine when and which index should be used hen executing a query. If these statistics are stale, or out-of-date, then the query optimizer might not make the best choice when selecting the indexes used to perform the execution of query.

Identify and Create Missing Indexes

User can make use Index tuning wizard to know the missing Indexes. Else he can look out each query to analyze the execution plan to find the missing Index statistics, once you have indentified the missing index, your task involve in creating those indexes. Take the advantage of tool that can help you identify many missing indexes with minimal effort? While I refer to this process as automation, it is not really 100% automated. Some of the steps of this maintenance task still should be done by hand, and DBAs also need to make a judgment about the indexes the Database Engine Tuning Wizard recommends before you blindly let it create new indexes for your databases.

Identify and Remove Unused Indexes

What’s the point of using up resources modifying indexes that never will be used? Unused indexes should be removed from your databases. SQL Server 2005 and SQL Server 2008 make it easy to identify unused indexes in a database using DMVs. Essentially, these DMVs track every time an index is used, which means you can easily determine which indexes can be safely removed. While in theory you could entirely automate this maintenance task, I prefer to do it manually because there are often some circumstances that automation can’t easily take into account.

Creating and Monitoring Index Maintenance Jobs

This is the step where you create the scripts necessary to perform automated index maintenance, and then schedule them with the SQL Server agent. In addition, you will want configure your SQL Server to notify you if any of these jobs fail, so you can ensure that index maintenance is performed as you expect.

In coming articles we will pick each Index Maintenance task and go deeper to each Indexes problem n there by resolving the issues concern with Indexes.

Why coders take pleasure in stupidity, because they are highly intelligent alien race

Post Reference: Vikram Aristocratic Elfin Share

Saturday, July 13, 2013

If I drop a Clustered Index, what will go off, and what will happen to Non Clustered Indexes?

If you drop the cluster index on the table, the table data will not gets deleted instead the offset table of the cluster index for maintaining the order of table data will gets removed from page, however since indexes use the clustering key to look up the corresponding row of data, so when a clustered index is dropped, the nonclustered indexes must be modified to use another method to lookup the corresponding data row because the clustering key no longer exists.

No for non cluster index the only way to jump directly to a record in the table without a clustered index is to use its physical location in the database (i.e., a particular record number on a particular data page in a particular data file, known as a row identifier—RID), and this physical location must be included in the nonclustered indexes now that the table is no longer clustered. So when a clustered index is dropped, all the nonclustered indexes must be rebuilt to use RIDs to look up the corresponding row within the heap.

Enjoy madness, if you are doing code, upshot will be chill

Post Reference: Vikram Aristocratic Elfin Share

Sunday, July 7, 2013

Selecting Top N Records Group wise

Often we require to query to find top among subset, for example if you have a table called OderPlaced where you want the top two order for each product. There are couple of way we can actually do this, let’s check out

Problem Statement: To find the top 2 orders for each productid based upon the earliest orderid.

Solution: Let’s create a OrderPlace table with two column OderId and ProductId

create table OrderPlace

(OrderId int,

ProductId int)

Command(s) completed successfully.

Lets insert few records in OrderPlace table

insert into OrderPlace (OrderId, ProductId) values (82, 707)

insert into OrderPlace (OrderId, ProductId) values (83, 707)

insert into OrderPlace (OrderId, ProductId) values (84, 707)

insert into OrderPlace (OrderId, ProductId) values (82, 708)

insert into OrderPlace (OrderId, ProductId) values (83, 708)

insert into OrderPlace (OrderId, ProductId) values (84, 708)

insert into OrderPlace (OrderId, ProductId) values (82, 711)

insert into OrderPlace (OrderId, ProductId) values (83, 711)

insert into OrderPlace (OrderId, ProductId) values (84, 711)

insert into OrderPlace (OrderId, ProductId) values (82, 712)

insert into OrderPlace (OrderId, ProductId) values (83, 712)

insert into OrderPlace (OrderId, ProductId) values (84, 712)

insert into OrderPlace (OrderId, ProductId) values (82, 714)

insert into OrderPlace (OrderId, ProductId) values (83, 714)

insert into OrderPlace (OrderId, ProductId) values (84, 714)

insert into OrderPlace (OrderId, ProductId) values (82, 715)

insert into OrderPlace (OrderId, ProductId) values (83, 715)

insert into OrderPlace (OrderId, ProductId) values (84, 715)

insert into OrderPlace (OrderId, ProductId) values (83, 716)

insert into OrderPlace (OrderId, ProductId) values (84, 716)

Now since we have records in place, lets design the query to find top 2 orders for each productid based upon the earliest ordered.

select OrderId,ProductId from

(select op.OrderId,op.ProductId, rowid = ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderId)

from OrderPlace op) t_op

where rowid < =2

OrderId ProductId

----------- -----------

82 707

83 707

82 708

83 708

82 711

83 711

82 712

83 712

82 714

83 714

82 715

83 715

83 716

84 716

(14 row(s) affected)

Here we gave row number by grouping ProductId then in outer query we just fetch those records which has row number >=2.

Programming is a mad work performed by genius minds

Advantage of having a non clustered index same as a clustered index on a table

I found this as a question in one of the forum and thought of coming up with an article with the same. His question was like this

Problem Statement: Is there any advantage of having a non clustered index same as a clustered index on a table?.

Here is a little background to his question question: In our current data warehouse framework (sql server version - 2005), In all of the tables we have one unique non clustered index same as a clustered index. So I was wondering if there is any advantage of doing that.

Solution: The only advantage which I could figure out is that the entries on leaf pages of nonclustered index are not as wide. They only contain index columns while the clustered index' leaf pages are the actual rows of data. Therefore, if you need something like select count (column_name) or count (*) from your_table then scanning the nonclustered index will involve considerably smaller number of data pages.

To prove our statement lets create a scenario, here we are creating table DupIndexDemo.

create table DupIndexDemo

(id int identity(1,1),

name varchar(200),

ordered_qty int,

date_getdate datetime default getdate() )

Command(s) completed successfully.

Let’s now create index on ID, one is cluster and another we are creating noncluster index on the same column id.

create clustered index idx_nc_id on DupIndexDemo(id)

create nonclustered index idx_c_id on DupIndexDemo(ordered_qty)

Command(s) completed successfully.

Now let’s insert few records in DupIndexDemo.

declare @i int =0

while @i < 10000

begin

insert into DupIndexDemo(name,ordered_qty)

values ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )

set @i = @i + 1

end

Let’s fire count (*) query to check, which index it is picking, later we will justify why it picked non cluster index.

select COUNT(*) from DupIndexDemo

Here from the explain plan we can see it picked NonClustered Index idx_nc_id and by extending the nonCluster operator we can see the IO cost involved is 0.0157176 and it cost 82% of the total cost to find the count of records.

Lets now lets drop NonCluster Index and fire the count (*) query.

drop index idx_nc_id on DupIndexDemo

Command(s) completed successfully

Now lets run the count(*) query and check the cost measure.

select COUNT(*) from DupIndexDemo

Here we can see cluster Index is taking 91% of the total cost involved in the query and the IO cost involved is 0.0475694 which is around four times compare to NonCluster Index.

Thus here we can say NonCluster index on the same column is giving performance optimized. But still I will not prefer to have cluster and NonCluster on the same column since it leads to duplicate Indexes which is considered as bad practice.

Whereas if you remove both the indexes then it will do table scan.

My girl friend always labels me Idiot; if programmers are idiots then Program Managers must be Idiot boxes ;)

Saturday, July 6, 2013

Select count(*) Performance analysis with NonCluster Index

There is a common perception that count(*) is not a optimized way of getting the number of records in a table. In this article we will be examining the same and finally we will conclude with our finding.

For analysis lets take the sales.Store table of AdventureWork database, lets see first how many indexes are there in place for Sales.Store table.

select OBJECT_NAME(object_id),name,index_id,type_desc from sys.indexes where object_id= OBJECT_ID('Sales.Store')

name index_id type_desc

------------------------------------------- -------------

Store PK_Store_BusinessEntityID 1 CLUSTERED

Store AK_Store_rowguid 2 NONCLUSTERED

Store IX_Store_SalesPersonID 3 NONCLUSTERED

Store PXML_Store_Demographics 256000 XML

(4 row(s) affected)

Here we can see there are total of four index available on Sales.Store table.

Let’s now fire DBCC IND command for each index to find the number of page these indexes are span.

--Index id 1. PK_Store_BusinessEntityID

DBCC IND ('Adventureworks2008', '[Sales].[Store]', 1);

Page Count : 103

--Index Id 2. AK_Store_rowguid

DBCC IND ('Adventureworks2008', '[Sales].[Store]', 2);

Page Count : 4

--Index Id 3. IX_Store_SalesPersonID

DBCC IND ('Adventureworks2008', '[Sales].[Store]', 3);

Page Count : 4

--Index Id 256000. PXML_Store_Demographics

DBCC IND ('Adventureworks2008', '[Sales].[Store]', 256000);

Page Count : 66

Since IX_Store_SalesPersonID has lowest pagecount, smallest index among all, if it gets picked up for finding total row count, it would be optimized one.

Let see which index count(*) will pick for getting the total number of rows in Sales.Store table.

select COUNT(*) from Sales.Store

So here we saw for getting count(*) optimizer picked IX_Store_SalesPersonID index.

Conclusion: Thus, when using COUNT(*) verify whether correct indexes are being used and do not believe that COUNT(*) will use Clustered index.

If you punish a programmer with code he will reach to dreamland, dare to punish

Post Reference: Vikram Aristocratic Elfin Share

Forwarding Records, Forwarding Pointers Masked Performance Killer for NonCluster Index

Tool Used : sys.dm_db_index_physical_stats DMV

Conclusion : Forwarded records are a type of fragmentation within heap tables that can cause poor SQL Server performance.

In a heap it is possible to get forwarded records. They occur when a record in a heap expands such that it no longer fits on the page it currently resides on. In this case, the record is moved to a new page, and a small forwarding record is left in the original location. The forwarding record points to the new location of the record, which is known as a forwarded record. This is done as a performance optimization so that all the non clustered indexes on the heap do not have to be altered with the new location of the heap record.

When a larger value is updated in a variable length field in a record, SQL Server will first try to expand the row of the existing page on the chance that there’s enough room to continue to use the original page. If that fails and SQL Server can’t find an existing page with enough room for the larger value, a new page has to be created for the record. The data is moved to the new page and is assigned a new Relative Identifier (RID). The old page now contains a forwarding pointer (forwarded record) that tells SQL Server to jump to the new RID. The new record location also has a back pointer to the old record.

Forwarded records prevent non-clustered indices on heap tables from being updated with the RID of the new row. The overhead of updating indices is worse than replacing the old records with a pointer. But the drawback to forwarded records is that their presence could result in a large amount of I/O. For each forwarded record, SQL Server has to jump to the target page, then back to the original one. This is a very inefficient way to read a row, and it also wastes space.

There can be non-clustered indexes in the Heap. So in case a record has to move to new page, at the old location of the row it will leave a forwarding pointer. This mechanism is to stop the updating of non-clustered indexes as it can still point to the old location. So the modification is less costly than if all non-clustered indexes needed to reflect the row move.

Let’s now stimulate a condition where we will be getting the forwarding records, here we are creating table Forwarding_Records_Demo

create table Forwarding_Records_Demo

(id bigint identity(1,1),

name varchar(200),

ordered_qty int,

date_getdate datetime default getdate() )

Command(s) completed successfully.

Here we are creating two non cluster index on date_getdate and ordered_qty

create nonclustered index idx_date_getdate on Forwarding_Records_Demo(date_getdate)

create nonclustered index idx_ordered_qty on Forwarding_Records_Demo(ordered_qty)

Command(s) completed successfully.

Now lets insert some records in newly created table

declare @i int =0

while @i < 100

begin

insert into Forwarding_Records_Demo(name,ordered_qty)

values ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )

set @i = @i + 1

end

Let’s check the insertion

select COUNT(*) as No_of_Records from Forwarding_Records_Demo

No_of_Records

-------------

100

Let’s query the dm_db_index_physical_stats to check forwarded record count.

select

object_name(ps.object_id) as tablename,

i.name as indexname,

ps.index_type_desc,

ps.page_count,

ps.avg_fragmentation_in_percent,

ps.forwarded_record_count

from

sys.dm_db_index_physical_stats (db_id(), null, null, null,'detailed') as ps

inner join sys.indexes as i

on ps.object_id = i.object_id and ps.index_id = i.index_id

where object_name(ps.object_id) = 'Forwarding_Records_Demo'

indexname index_type_desc page_count avg_fragmentation forwarded_

in_percent record_count

--------------------------------- ---------------------------- ------------

NULL HEAP 1 0 NULL

idx_date_getdate NONCLUSTERED 1 0 NULL

idx_ordered_qty NONCLUSTERED 1 0 NULL

(3 row(s) affected)

Now let’s alter the column size of name, and forces SQL Server to move those rows to another page

alter table Forwarding_Records_Demo

alter column name char(400);

Command(s) completed successfully.

Now since we have modified the column to 400 sizes, let’s re-run the same DMV query to check whether

select

object_name(ps.object_id) as tablename,

i.name as indexname,

ps.index_type_desc,

ps.page_count,

ps.avg_fragmentation_in_percent,

ps.forwarded_record_count

from

sys.dm_db_index_physical_stats (db_id(), null, null, null,'detailed') as ps

inner join sys.indexes as i

on ps.object_id = i.object_id and ps.index_id = i.index_id

where object_name(ps.object_id) = 'Forwarding_Records_Demo'

indexname index_type_desc page_count avg_fragmentation forwarded_

_in_percent record_count

--------------------------------- ---------------------------- ------------

NULL HEAP 6 50 85

idx_date_getdate NONCLUSTERED 1 0 NULL

idx_ordered_qty NONCLUSTERED 1 0 NULL

(3 row(s) affected)

Now we have 85 forwarded records.

Forwarding pointers are common performance problem in Heaps. SQL Server 2008 introduced ALTER TABLE ... REBUILD which removes all forwarding pointers and performs a rebuild

alter table Forwarding_Records_Demo rebuild

Command(s) completed successfully.

Now let us re-examine the forwarded recount again which will be back to zero by running the above DMV query.

indexname index_type_desc page_count avg_fragmentation forwarded_

in_percent record_count

--------------------------------- ---------------------------- ------------

NULL HEAP 7 0 NULL

idx_date_getdate NONCLUSTERED 1 0 NULL

idx_ordered_qty NONCLUSTERED 1 0 NULL

(3 row(s) affected)

Now we have no forwarding records.

Every ash of mine will fabricate a novel code which will lessen up the world of programmer hitches

Post Reference: Vikram Aristocratic Elfin Share

Aristocratic Elfin Share

Pages

About Me

Search This Blog

Sunday, July 14, 2013

Identify the Key Index Maintenance Tasks

Saturday, July 13, 2013

If I drop a Clustered Index, what will go off, and what will happen to Non Clustered Indexes?

Sunday, July 7, 2013

Selecting Top N Records Group wise

Advantage of having a non clustered index same as a clustered index on a table

Saturday, July 6, 2013

Select count(*) Performance analysis with NonCluster Index

Forwarding Records, Forwarding Pointers Masked Performance Killer for NonCluster Index