About Me

My photo
Mumbai, Maharastra, India
He has more than 7.6 years of experience in the software development. He has spent most of the times in web/desktop application development. He has sound knowledge in various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Search This Blog

Monday, October 14, 2013

Separating First Name and Last Name from Name


There are bargains when user want to separate first and second name from Name field generally I have seen this for creating email address for in-house projects.

So if your requirement is to separate name to first name and last name then you can easily implement it with the help of charIndex and substring function of SQL.

Let’s do a practical for the same

create table #demoSeprateName
(name varchar(20),
first_name varchar(10),
last_name varchar(10))

insert into #demoSeprateName(name) values('Neha Sharma')
insert into #demoSeprateName(name) values('Richa Sharma')

select substring(name,1,charindex(' ',name)) as 'First Name',
substring(name,charindex(' ',name) + 1,len(name)) as 'Last Name'
from #demoSeprateName

First Name           Last Name
-------------------- --------------------
Neha                 Sharma
Richa                Sharma

(2 row(s) affected)

So here we have used charIndex function, it will give the position of the specified character, and substring function with which give the subsring of a given string starting from position mention to length specified.  

Let’s update the table with first_name and last_name value

update #demoSeprateName
set first_name = substring(name,1,charindex(' ',name)),
last_name = substring(name,charindex(' ',name) + 1,len(name))

name                 first_name last_name
-------------------- ---------- ----------
Neha Sharma          Neha       Sharma
Richa Sharma         Richa      Sharma

(2 row(s) affected)

Now if you want to generate email address in the format of firstName.LastName@abc.com you can do this

select ltrim(rtrim(substring(name,1,charindex(' ',name))))
+ '.' +
ltrim(rtrim(substring(name,charindex(' ',name) + 1,len(name))))
+ '@xyz.com'
as 'Email Address'
from #demoSeprateName

Email Address
-------------------------------------------------
Neha.Sharma@xyz.com

Be serious about Coding, if you want to have any amusement in coding.


Post Reference: Vikram Aristocratic Elfin Share

Wednesday, August 7, 2013

Group by with Union


There is always a question for fresh SQL Engineers, how do I add group by with Union statement, doing this throw error.

It is quite possible and very easy to implement. Lets take an example to check this

Here we are creating two table TableA and TableB with two fileld id and name, then we will do the union with group by

create table TableA
(id int,
name varchar(15))
Command(s) completed successfully.

create table TableB
(id int,
name varchar(15))
Command(s) completed successfully.

Inserting few rows into TableA

insert into TableA values(1,'Prachi')
insert into TableA values(2,'Shilpa')
insert into TableA values(3,'Bhagyashree')
  
Inserting few rows into TableB

insert into TableB values(1,'Pragyan')
insert into TableB values(3,'Subankeri')
insert into TableB values(4,'Neha')

Now I want to do union of these two tables and group the result by ID

select id,name from TableA
union all
select id,name from TableB
group by id

Msg 8120, Level 16, State 1, Line 3
Column 'TableB.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Let see what an alternative is

select * from
(select id,name from TableA
union all
select id,name from TableB)as FinalTable(id,name)
group by id,name

Conclusion:  we can use group by after putting the union query result in a select statement thereafter doing group by.

Programmer born alone as a warrior, live alone as a fighter  and die alone as a calm, sometime in the course of his programming life he create a false impression for the  moment that he is not unaided, he too has girl friend to love beyond code.  


Post Reference: Vikram Aristocratic Elfin Share

Sunday, July 14, 2013

Identify the Key Index Maintenance Tasks


There are many which can be performed as a Maintenance task for Index Maintenance, but here we will see some high end task at least which should be included in your Index Maintenance task.

  1. Index Fragmentation, Identify and remove index fragmentation
  2. Outdated Index, Identify skewed and outdated index and column statistics and ensure they are representative and up-to-date
  3. Missing Indexes, Identify and create missing indexes
  4. Remove Unused Indexes, Identify and remove unused indexes
  5. Index Maintenance Job, Creating and monitoring index maintenance jobs


Let’s take a brief look at each one of these, one at a time.

Remove Index Fragmentation

The easiest way to remove index fragmentation is to defragment all of your database’s indexes every day. If you have a small database, and you have defined maintenance times, this might be a viable option. On the other hand, many databases are relatively large and must be available 24x7 throughout the day. In these cases, it is often not practical to defragment every index every day. Instead, you want to have the ability to identify only those indexes that really need to be defragmented, and focus on those.

Skewed and Outdated Index and Column Statistics and Ensure they are and Up to Date

Index statistics are used by the query optimizer to help in determine when and which index should be used hen executing a query. If these statistics are stale, or out-of-date, then the query optimizer might not make the best choice when selecting the indexes used to perform the execution of query.

Identify and Create Missing Indexes

User can make use Index tuning wizard to know the missing Indexes. Else he can look out each query to analyze the execution plan to find the missing Index statistics, once you have indentified the missing index, your task involve in creating those indexes.  Take the advantage of tool that can help you identify many missing indexes with minimal effort? While I refer to this process as automation, it is not really 100% automated. Some of the steps of this maintenance task still should be done by hand, and DBAs also need to make a judgment about the indexes the Database Engine Tuning Wizard recommends before you blindly let it create new indexes for your databases.


Identify and Remove Unused Indexes

What’s the point of using up resources modifying indexes that never will be used? Unused indexes should be removed from your databases. SQL Server 2005 and SQL Server 2008 make it easy to identify unused indexes in a database using DMVs. Essentially, these DMVs track every time an index is used, which means you can easily determine which indexes can be safely removed. While in theory you could entirely automate this maintenance task, I prefer to do it manually because there are often some circumstances that automation can’t easily take into account.

Creating and Monitoring Index Maintenance Jobs

This is the step where you create the scripts necessary to perform automated index maintenance, and then schedule them with the SQL Server agent. In addition, you will want configure your SQL Server to notify you if any of these jobs fail, so you can ensure that index maintenance is performed as you expect.

In coming articles we will pick each Index Maintenance task and go deeper to each Indexes problem n there by resolving the issues concern with Indexes.

Why coders take pleasure in stupidity, because they are highly intelligent alien race 


Post Reference: Vikram Aristocratic Elfin Share

Saturday, July 13, 2013

If I drop a Clustered Index, what will go off, and what will happen to Non Clustered Indexes?


If you drop the cluster index on the table, the table data will not gets deleted instead the offset table of the cluster index for maintaining the order of table data will gets removed from page, however since indexes use the clustering key to look up the corresponding row of data, so when a clustered index is dropped, the nonclustered indexes must be modified to use another method to lookup the corresponding data row because the clustering key no longer exists.

No for non cluster index the only way to jump directly to a record in the table without a clustered index is to use its physical location in the database (i.e., a particular record number on a particular data page in a particular data file, known as a row identifier—RID), and this physical location must be included in the nonclustered indexes now that the table is no longer clustered. So when a clustered index is dropped, all the nonclustered indexes must be rebuilt to use RIDs to look up the corresponding row within the heap.

Enjoy madness, if you are doing code, upshot will be chill


Post Reference: Vikram Aristocratic Elfin Share

Sunday, July 7, 2013

Selecting Top N Records Group wise


Often we require to query to find top among subset, for example if you have a table called OderPlaced where you want the top two order for each product. There are couple of way we can actually do this, let’s check out

Problem Statement:  To find the top 2 orders for each productid based upon the earliest orderid.

Solution: Let’s create a OrderPlace table with two column OderId and ProductId
create table OrderPlace
(OrderId int,
ProductId int)
Command(s) completed successfully.
Lets insert few records in OrderPlace table

insert into OrderPlace (OrderId, ProductId) values (82, 707)
insert into OrderPlace (OrderId, ProductId) values (83, 707)
insert into OrderPlace (OrderId, ProductId) values (84, 707)
insert into OrderPlace (OrderId, ProductId) values (82, 708)
insert into OrderPlace (OrderId, ProductId) values (83, 708)
insert into OrderPlace (OrderId, ProductId) values (84, 708)
insert into OrderPlace (OrderId, ProductId) values (82, 711)
insert into OrderPlace (OrderId, ProductId) values (83, 711)
insert into OrderPlace (OrderId, ProductId) values (84, 711)
insert into OrderPlace (OrderId, ProductId) values (82, 712)
insert into OrderPlace (OrderId, ProductId) values (83, 712)
insert into OrderPlace (OrderId, ProductId) values (84, 712)
insert into OrderPlace (OrderId, ProductId) values (82, 714)
insert into OrderPlace (OrderId, ProductId) values (83, 714)
insert into OrderPlace (OrderId, ProductId) values (84, 714)
insert into OrderPlace (OrderId, ProductId) values (82, 715)
insert into OrderPlace (OrderId, ProductId) values (83, 715)
insert into OrderPlace (OrderId, ProductId) values (84, 715)
insert into OrderPlace (OrderId, ProductId) values (83, 716)
insert into OrderPlace (OrderId, ProductId) values (84, 716)

Now since we have records in place, lets design the query to find top 2 orders for each productid based upon the earliest ordered.

select OrderId,ProductId from
(select op.OrderId,op.ProductId, rowid = ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderId)
from OrderPlace op) t_op
where rowid < =2

OrderId     ProductId
----------- -----------
82          707
83          707
82          708
83          708
82          711
83          711
82          712
83          712
82          714
83          714
82          715
83          715
83          716
84          716

(14 row(s) affected)

Here we gave row number by grouping ProductId then in outer query we just fetch those records which has row number >=2.

Programming is a mad work performed by genius minds 

Advantage of having a non clustered index same as a clustered index on a table


I found this as a question in one of the forum and thought of coming up with an article with the same. His question was like this

Problem Statement: Is there any advantage of having a non clustered index same as a clustered index on a table?.
Here is a little background to his question question: In our current data warehouse framework (sql server version - 2005), In all of the tables we have one unique non clustered index same as a clustered index. So I was wondering if there is any advantage of doing that.
Solution: The only advantage which I could figure out is that the entries on leaf pages of nonclustered index are not as wide. They only contain index columns while the clustered index' leaf pages are the actual rows of data. Therefore, if you need something like select count (column_name) or count (*) from your_table then scanning the nonclustered index will involve considerably smaller number of data pages.
To prove our statement lets create a scenario, here we are creating table DupIndexDemo.

create table DupIndexDemo 
(id int identity(1,1), 
name varchar(200), 
ordered_qty int, 
date_getdate datetime default getdate()  )

Command(s) completed successfully.
  
Let’s now create index on ID, one is cluster and another we are creating noncluster index on the same column id.

create clustered index idx_nc_id on DupIndexDemo(id) 
create nonclustered index idx_c_id on DupIndexDemo(ordered_qty)  

Command(s) completed successfully.

Now let’s insert few records in DupIndexDemo.

declare @i int =0 
while @i < 10000 
begin    
      insert into DupIndexDemo(name,ordered_qty)    
      values  ( 'DemoNames for ' + cast(@i as varchar(10)),rand(1) * 100 )    
      set @i = @i + 1 
end

Let’s fire count (*) query to check, which index it is picking, later we will justify why it picked non cluster index.

select COUNT(*) from DupIndexDemo



















Here from the explain plan we can see it picked NonClustered Index idx_nc_id and by extending the nonCluster operator we can see the IO cost involved is 0.0157176 and it cost 82% of the total cost to find the count of records.

Lets now lets drop NonCluster Index and fire the count (*) query.

drop index idx_nc_id on DupIndexDemo
Command(s) completed successfully

Now lets run the count(*) query and check the cost measure.

select COUNT(*) from DupIndexDemo

















Here we can see cluster Index is taking 91% of the total cost involved in the query and the IO cost involved is 0.0475694 which is around four times compare to NonCluster Index.

Thus here we can say NonCluster index on the same column is giving performance optimized. But still I will not prefer to have cluster and NonCluster on the same column since it leads to duplicate Indexes which is considered as bad practice.
   
Whereas if you remove both the indexes then it will do table scan.

My girl friend always labels me Idiot; if programmers are idiots then Program Managers must be Idiot boxes ;)  


Saturday, July 6, 2013

Select count(*) Performance analysis with NonCluster Index


There is a common perception that count(*) is not a optimized way of getting the number of records in a table. In this article we will be examining the same and finally we will conclude with our finding.

For analysis lets take the sales.Store table of AdventureWork database, lets see first how many indexes are there in place for Sales.Store table.

select OBJECT_NAME(object_id),name,index_id,type_desc from sys.indexes where object_id= OBJECT_ID('Sales.Store')

      name                      index_id    type_desc
------------------------------------------- -------------
Store PK_Store_BusinessEntityID 1           CLUSTERED
Store AK_Store_rowguid          2           NONCLUSTERED
Store IX_Store_SalesPersonID    3           NONCLUSTERED
Store PXML_Store_Demographics   256000      XML

(4 row(s) affected)

 Here we can see there are total of four index available on Sales.Store table.

Let’s now fire DBCC IND command for each index to find the number of page these indexes are span.

--Index id 1. PK_Store_BusinessEntityID
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 1);
  Page Count : 103
 
--Index Id 2. AK_Store_rowguid
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 2);
  Page Count : 4 
 
--Index Id 3. IX_Store_SalesPersonID
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 3);
  Page Count : 4 
 
--Index Id 256000. PXML_Store_Demographics
DBCC IND ('Adventureworks2008', '[Sales].[Store]', 256000);
  Page Count : 66 

Since IX_Store_SalesPersonID has lowest pagecount, smallest index among all, if it gets picked up for finding total row count, it would be optimized one.

Let see which index count(*) will pick for getting the total number of rows in Sales.Store table.

select COUNT(*) from Sales.Store








So here we saw for getting count(*) optimizer picked IX_Store_SalesPersonID index.

Conclusion: Thus, when using COUNT(*) verify whether correct indexes are being used and do not believe that COUNT(*) will use Clustered index.   

If you punish a programmer with code he will reach to dreamland, dare to punish


Post Reference: Vikram Aristocratic Elfin Share