About Me

My photo
Mumbai, Maharastra, India
He has more than 7.6 years of experience in the software development. He has spent most of the times in web/desktop application development. He has sound knowledge in various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Search This Blog

Friday, June 27, 2014

RID Lookup with HEAP



RID Lookup is a physical Query Execution operator which operates on HEAP Table i.e A table without Clustered Index on it. It comes into picture when non clustered index does not contain all the data requested by a query, so to get the addition data it goes to heap (only in case when you don’t have clustered index on your table) .  RID Lookup always accompanied by Nested Loop Join operator.

Lets stimulate the RID lookup operator in Execution plan.
Here we are creating a table with col2 as datatime datatype which we will use it in where predicate.

CREATE TABLE RIDLookup_Demo_TB
(
    id   INT  IDENTITY (1, 1),
    col2 DATETIME        ,
    col3 NUMERIC (28, 10) NOT NULL
);
Command(s) completed successfully.

Our table is ready to get some data into it. Here we are trying to insert 5 Lkh data where date column i.e. Col2 has value 16June2014 and 5 record with col2 value 16June2014 values.

DECLARE @i AS INT = 0;

BEGIN TRANSACTION;
WHILE @i < 500000   
    BEGIN
        INSERT INTO RIDLookup_Demo_TB (col2, col3)
        SELECT '20140616',
               1000 * rand();
        SET @i = @i + 1;
    END

SET @i = 0;

WHILE @i < 5
    BEGIN
        INSERT INTO RIDLookup_Demo_TB (col2, col3)
        SELECT '20140617',
               1000 * rand();
        SET @i = @i + 1;
    END
COMMIT TRANSACTION;

Now since our data are inserted into table, lets query the table to see the plan

SELECT SUM(col3) FROM RIDLookup_Demo_TB WHERE col2='20140617'


Here we saw, since the table has no clustered index (Primary Key), Table Scan aka Heap Scan comes into picture, but we are concern with RID Lookup. Which is associated with Non Clustered index on table which has no Clustered index define.
it’s time to create Non Clustered Index on col2 of table.

create nonclustered index [ix_col2]
on RIDLookup_Demo_TB ([col2] asc)
Command(s) completed successfully.

Now since NonClustered Index is in place, lets fire the same query to see the RID Lookup in the Execution Plan.

SELECT SUM(col3) FROM RIDLookup_Demo_TB WHERE col2='20140617'


Work around solution
When I see RID Lookup, I always think, why the table has no clustered index on it. So the solution is create Clustered Index on the table.
If you have clustered index on table, the RID lookup will changed to Key Lookup, again when I saw Key Lookup in execution, I think why the non clustered index not covered by required column asked in select clause.

Blend my love with code, and you will find me happy! J
 

Post Reference: Vikram Aristocratic Elfin Share

Thursday, June 26, 2014

Logical Query Processing- Fuzzy Query with Order By Clause



Today it happened, I was called by my fellow programmer, he saw me a piece of T-SQL Code and asks few question on it, I will try to replicate the same code here.
There was a query somehow like this

select sub_group,sub_group,name,name from #temp
where sub_group = 'Furniture'
order by sub_group

And the query was failing, and asked me to justify the why it failing. That’s good, I find it interesting to explain the logic behind the scene through Logical Query Processing.  Yeah ;) I didn’t ask tea this time for this: P

Let’s formulate the same scenario by creating a sample table.

create table #temp
(id int identity(1,1),
sub_group varchar(10),
name varchar(10))
Command(s) completed successfully.

Lets insert few records in it.

insert into #temp
select 'Furniture','chair' union all
select 'Vechile','Maruti' union all
select 'Furniture','Desk' union all
select 'Furniture','Dine' union all
select 'Vechile','Honda' union all
select 'H.Vechile','Mahindra'

Now we can have the same Select query. Lets fire the query and see  the output.

select sub_group,sub_group,name,name from #temp
where sub_group = 'Furniture'
order by sub_group
Msg 209, Level 16, State 1, Line 32
Ambiguous column name 'sub_group'.

Now lets take the error, and dig the background of it. Now it says Ambiguous Column name.

If you see in the select list, we have called this column two times, now if we go through logical query processing, it says for this query.
1st from clause gets executed i.e #temp and record set will build up say RS1.
2nd where by clause gets executed which will filter the data and remain with the data which has ‘Furniture’ as sub_group and create a new record set say RS2
3rd Now the Select clause will gets executed with form one more column of same name as sub_group. So the new record set will have this many column “sub_group, sub_group, name, name” say this record set as RS3
4th Now Oder By clause gets the chance, it operate on RS3 record set, now if you see in query, the order by clause is on sub_group, so when it tries to order the RS3 data, it will found two column with same name sub_group, and optimizer find it difficult which column to pick up, thus it throws Ambiguous column error.

Solution to this problem is: make an alias to one of the sub_group column.

select sub_group,sub_group SB ,name,name from #temp
where sub_group = 'Furniture'
order by sub_group

Run the query and you will get your result.

sub_group  SB         name       name
---------- ---------- ---------- ----------
Furniture  Furniture  chair      chair
Furniture  Furniture  Desk       Desk
Furniture  Furniture  Dine       Dine


You just have to make time for your coding and keep it balance J
 

Post Reference: Vikram Aristocratic Elfin Share

Wednesday, June 25, 2014

A Celebrity among all Execution Plan Operators – KEYLOOKUP


I will start with this contour, this operator is crazy on changing its name, and it has changed its name three times in the last three release of SQL Server. From BookMark Lookup to Cluster Index Seek and now it is Key Lookup. I am personally comfortable with Key Lookup, which make more sense to me to the way it works.

Few facts I would like to point out here about KeyLook Up.

  • It typically comes with Non Clustered Index.
  • It fetches values via cluster index when the required values are not present in (Cover in) Non Clustered Index.
  •  It perform one of the expensive IO operation, I will take on this further in sometime  :)


I want to take it further to explain more on this Celebrity operator, when rows are requested by the query with predicate, optimizer look into non-clustered index applied on column participate in where predicate to retrieve the requested data. If the all requested data is not found in Non Clustered Index then it go to data page where the data is stored and retrieve it.

Now in above situation, if clustered Index present on table, it goes to Clustered index to get the physical location of data and this operation is called as KEYLOOKUP and if there is no clustered index present on table then there is something called RID lookup comes into picture.

Now it’s time to get yourself relaxed and have popcorn with cup of tea, ah I m addicted to Tea, you can have CokeJ

Ya, now we can take up our point for more understanding “It perform one of the expensive IO operation”
It is one of the most expensive operations because for each value in non clustered index it goes to clustered index to retrieve the asked data.

Lets stimulate a condition where we have Keylookup operator comes into picture in execution plan.

I am using AdventureWorks2008R2 database and Person.Address Table to stimulate the condition, let’s see how many different indexes are present on Person.Address table

select * from sys.indexes where object_id = OBJECT_ID('Person.Address')

Table   name                         type_desc
------- ---------------------------  ---------------
Address PK_Address_AddressID         CLUSTERED
Address AK_Address_rowguid           NONCLUSTERED
Address IX_Address_AddressLine1_    
        AddressLine2_City_           
        StateProvinceID_PostalCode   NONCLUSTERED
Address IX_Address_StateProvinceID   NONCLUSTERED

So here we can see, we have 4 indexes, one cluster and 3 non clustered index. Lets disable all non clustered indexes first.

alter index AK_Address_rowguid
on Person.Address DISABLE
go

alter index IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode
on Person.Address DISABLE
go

alter index IX_Address_StateProvinceID
on Person.Address DISABLE
go
Command(s) completed successfully.

Now lets fire the query
select AddressID, City, StateProvinceID from Person.Address where City = 'Las Vegas'



Here we are creating a non clustered index on City column which is used in where predicate. Once it s done, again we will check the query Plan.

CREATE NONCLUSTERED INDEX ix_PersonAddress_City
ON [Person].[Address] ([City])

Now lets query the same select statement to see how this newly created index perform, is it  better then the above clustered index scan method

select AddressID, City, StateProvinceID from Person.Address where City = 'Las Vegas'



To be honest, seeing Keylookup in Execution plan excites SQL guy, because he has a good chance to improve further the performance of Query by simply creating Covering Index. I will take covering index in separate article.

Here Key Lookup operator is used to read the value of AddressID, StateProvinceID from by referencing to clustered Index, since this two column are not included in Non Clustered Index.

But at the same time Key Lookup Operator will worsen your cost, if the number of rows to return is more because for each rows it has to switch to Clustered Index to get the value. So that moments scan would be more efficient strategy then Key Lookup. But yes we have the option of Covering those column in Non Clustered Index which prompt Key Lookup by creating Covering Index.
         
If I can love someone, I need to love her as I love my Programming with SQL Server! :)
 

Post Reference: Vikram Aristocratic Elfin Share