About Me

Mumbai, Maharashtra, India
He has more than 7.6 years of experience in software development, most of it spent on web and desktop application development. He has sound knowledge of various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari


Sunday, June 29, 2014

Parameter Caching: Parameter Sniffing with Adhoc Select Query


Read Previous Post: Plan Caching VI - Forced Parameterization in Adhoc Queries

Here we will see how passing a parameter in the WHERE predicate changes the execution plan of a query. Whenever you run a query, a plan is generated and cached in memory, but the cache does not store the context of the execution, i.e. a plan tailored to the parameter value that was passed.
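To see the cached plans for yourself, a query along these lines can be run after the demo statements further down have executed. It is only a sketch: it assumes you have VIEW SERVER STATE permission, and the LIKE filter is just a convenient way to find the demo table's statements (it will also match this lookup query itself).

```sql
-- List cached plans whose text mentions the demo table (illustrative filter)
select cp.usecounts,      -- how many times the cached plan was reused
       cp.objtype,        -- Adhoc, Prepared, Proc, ...
       st.text            -- the statement text the plan was compiled for
from sys.dm_exec_cached_plans as cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) as st
where st.text like '%ParameterSniffingDemo_TB%'
```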

Sometimes, when we pass a parameter in the WHERE condition, its actual value is not known at compile time, so the SQL optimizer cannot select the optimal execution plan: the query is compiled before the actual value of the parameter is known.

Let's check this with a small practical example.

create table ParameterSniffingDemo_TB
(id int identity(1,1) primary key,
col2 datetime,
col3 numeric(28,10) not null)  
Command(s) completed successfully.

Let's insert some rows into the table.

declare @i int = 0
begin tran
while @i < 50000
begin
     insert into ParameterSniffingDemo_TB(col2,col3)
     select '20140616', 1000*rand()
     set @i=@i+1
end

set @i=0
while @i < 5
begin
     insert into ParameterSniffingDemo_TB(col2,col3)
     select '20140617', 1000*rand()
     set @i=@i+1
end

commit tran 

Now that our table is populated with data, let's create a nonclustered index on col2.

create nonclustered index [ix_col2]
on [ParameterSniffingDemo_TB] ([col2] asc)
Command(s) completed successfully.

The nonclustered index is in place now. Let's query on col2 and see whether the nonclustered index participates in the execution plan.

select SUM(col3) FROM ParameterSniffingDemo_TB WHERE col2='20140617'



Here we can see that the nonclustered index participated in the execution plan.

Now let's declare a local variable and check whether the query still uses the existing nonclustered index.

declare @mydate datetime = '20140617'
select sum(col3) FROM ParameterSniffingDemo_TB WHERE col2=@mydate



Here we can see that the optimizer did not pick the nonclustered index when creating the plan for this query.

Let's compare the relative cost of the two queries by running both in one batch.
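One way to see why the optimizer behaves like this is to look at the statistics behind ix_col2. When the value is unknown at compile time, the estimate is typically derived from the density of the column rather than from the histogram. The sketch below assumes the table and index created above; with only two distinct dates in col2, the "All density" for col2 should come out around 0.5, which would put the estimate at roughly half the table and make a scan look cheaper than thousands of lookups.

```sql
-- Show the density vector the optimizer can fall back on when the value is unknown
dbcc show_statistics ('ParameterSniffingDemo_TB', ix_col2) with density_vector

-- Show the histogram, which is what a literal such as '20140617' is matched against
dbcc show_statistics ('ParameterSniffingDemo_TB', ix_col2) with histogram
```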

--First query, with a literal value
select sum(col3) FROM ParameterSniffingDemo_TB WHERE col2='20140617'

--Second query, with a local variable
declare @mydate datetime = '20140617'
select sum(col3) FROM ParameterSniffingDemo_TB WHERE col2=@mydate  --option(recompile)



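Here we can see that the query with the literal value gets the optimal plan, at just 6% of the total batch cost, whereas the query with the local variable gets a non-optimal plan.

This problem is usually discussed under the name Parameter Sniffing. Strictly speaking, the value of a local variable is not sniffed at compile time, so the optimizer has to estimate the row count without knowing it.

Workaround: the OPTION (RECOMPILE) hint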

declare @mydate datetime = '20140617'
select sum(col3) FROM ParameterSniffingDemo_TB WHERE col2=@mydate  option(recompile)
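OPTION (RECOMPILE) pays a compile on every execution. Two other approaches are sometimes used instead; the sketch below is only illustrative and assumes the same table and index as above. The parameter name @d is made up for the example.

```sql
-- Option A: tell the optimizer which value to optimize for
declare @mydate datetime = '20140617'
select sum(col3) from ParameterSniffingDemo_TB where col2 = @mydate
option (optimize for (@mydate = '20140617'))

-- Option B: pass the value as a real parameter, which is visible at compile time
exec sp_executesql
     N'select sum(col3) from ParameterSniffingDemo_TB where col2 = @d',
     N'@d datetime',
     @d = '20140617'
```

Keep in mind that with Option B the plan compiled for the first value is cached and reused, so a later call with '20140616' would run with the plan built for '20140617'.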



It’s your smile which makes me smile, let me smile and do my duty towards my code. Keep smiling, my dear! :)
 

Post Reference: Vikram Aristocratic Elfin Share

Friday, June 27, 2014

RID Lookup with HEAP



RID Lookup is a physical query execution operator which operates on a heap table, i.e. a table without a clustered index. It comes into the picture when a nonclustered index does not contain all the data requested by a query, so to get the additional data it goes to the heap (only when the table has no clustered index). RID Lookup is always accompanied by a Nested Loops join operator.

Let's simulate the RID Lookup operator in an execution plan.
Here we are creating a table with col2 as a datetime column, which we will use in the WHERE predicate.

CREATE TABLE RIDLookup_Demo_TB
(
    id   INT  IDENTITY (1, 1),
    col2 DATETIME        ,
    col3 NUMERIC (28, 10) NOT NULL
);
Command(s) completed successfully.

Our table is ready to receive some data. Here we insert 5 lakh (500,000) rows where the date column, col2, has the value 16 June 2014, and 5 rows where col2 has the value 17 June 2014.

DECLARE @i AS INT = 0;

BEGIN TRANSACTION;
WHILE @i < 500000   
    BEGIN
        INSERT INTO RIDLookup_Demo_TB (col2, col3)
        SELECT '20140616',
               1000 * rand();
        SET @i = @i + 1;
    END

SET @i = 0;

WHILE @i < 5
    BEGIN
        INSERT INTO RIDLookup_Demo_TB (col2, col3)
        SELECT '20140617',
               1000 * rand();
        SET @i = @i + 1;
    END
COMMIT TRANSACTION;
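Before looking at plans, it can be worth confirming that the table really is a heap. A small sketch, assuming the table created above: a heap shows up in sys.indexes as a single row with index_id 0 and type_desc HEAP.

```sql
-- A table without a clustered index is listed with index_id = 0 and type_desc = 'HEAP'
select i.index_id, i.name, i.type_desc
from sys.indexes as i
where i.object_id = object_id('RIDLookup_Demo_TB')
```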

Now that the data is inserted into the table, let's query the table to see the plan.

SELECT SUM(col3) FROM RIDLookup_Demo_TB WHERE col2='20140617'


Here we saw that, since the table has no clustered index (no primary key), a Table Scan, aka heap scan, comes into the picture. But we are concerned with RID Lookup, which is associated with a nonclustered index on a table that has no clustered index defined.
It’s time to create a nonclustered index on col2 of the table.

create nonclustered index [ix_col2]
on RIDLookup_Demo_TB ([col2] asc)
Command(s) completed successfully.

Now that the nonclustered index is in place, let's run the same query to see the RID Lookup in the execution plan.

SELECT SUM(col3) FROM RIDLookup_Demo_TB WHERE col2='20140617'


Workaround solution
When I see a RID Lookup, I always wonder why the table has no clustered index. So the solution is to create a clustered index on the table.
If the table has a clustered index, the RID Lookup changes to a Key Lookup. And when I see a Key Lookup in an execution plan, I wonder why the nonclustered index does not cover the columns requested in the select clause.
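Both remedies can be sketched against the demo table as follows. The index names cix_id and ix_col2_incl_col3 are made up for illustration, and in practice you would pick whichever option suits the table.

```sql
-- Option 1: add a clustered index, so the heap goes away
-- (the RID Lookup would then show up as a Key Lookup)
create clustered index [cix_id]
on RIDLookup_Demo_TB ([id] asc)

-- Option 2: make the nonclustered index cover the query,
-- so no lookup is needed to fetch col3
create nonclustered index [ix_col2_incl_col3]
on RIDLookup_Demo_TB ([col2] asc) include ([col3])
```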

Blend my love with code, and you will find me happy! :)
 


Thursday, June 26, 2014

Logical Query Processing- Fuzzy Query with Order By Clause



Today a fellow programmer called me over, showed me a piece of T-SQL code and asked a few questions about it. I will try to replicate the same code here.
The query was something like this:

select sub_group,sub_group,name,name from #temp
where sub_group = 'Furniture'
order by sub_group

The query was failing, and he asked me to explain why. That’s good, I find it interesting to explain the logic behind the scenes through Logical Query Processing. Yeah ;) I didn’t ask for tea this time :P

Let’s formulate the same scenario by creating a sample table.

create table #temp
(id int identity(1,1),
sub_group varchar(10),
name varchar(10))
Command(s) completed successfully.

Let's insert a few records into it.

insert into #temp
select 'Furniture','chair' union all
select 'Vechile','Maruti' union all
select 'Furniture','Desk' union all
select 'Furniture','Dine' union all
select 'Vechile','Honda' union all
select 'H.Vechile','Mahindra'

Now we can run the same select query. Let's run it and see the output.

select sub_group,sub_group,name,name from #temp
where sub_group = 'Furniture'
order by sub_group
Msg 209, Level 16, State 1, Line 32
Ambiguous column name 'sub_group'.

Now let's take the error and dig into the background. It says Ambiguous column name.

If you look at the select list, we have listed this column twice. Logical query processing says the following for this query.
1st: the FROM clause is executed, i.e. #temp, and a record set is built, say RS1.
2nd: the WHERE clause is executed, which filters the data down to the rows that have ‘Furniture’ as sub_group and creates a new record set, say RS2.
3rd: the SELECT clause is executed, which produces a second column with the same name sub_group. The new record set has the columns “sub_group, sub_group, name, name”; call it RS3.
4th: the ORDER BY clause gets its turn and operates on RS3. The query orders by sub_group, so when it tries to order the RS3 data it finds two columns with the same name sub_group. SQL Server cannot tell which one to pick, so it throws the Ambiguous column name error.
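The four steps can be read directly off the failing query. The annotated version below is the same statement with the logical order written as comments; it still raises Msg 209 when run. Note that the WHERE clause is not ambiguous, because it is evaluated before SELECT and sees only the columns of #temp.

```sql
select sub_group, sub_group, name, name   -- 3rd: builds RS3, which has two columns named sub_group
from #temp                                -- 1st: builds RS1 from the table
where sub_group = 'Furniture'             -- 2nd: filters RS1 into RS2 (only one sub_group exists here)
order by sub_group                        -- 4th: sorts RS3, where sub_group matches two columns
```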

The solution to this problem is to give an alias to one of the sub_group columns.

select sub_group,sub_group SB ,name,name from #temp
where sub_group = 'Furniture'
order by sub_group

Run the query and you will get your result:

sub_group  SB         name       name
---------- ---------- ---------- ----------
Furniture  Furniture  chair      chair
Furniture  Furniture  Desk       Desk
Furniture  Furniture  Dine       Dine
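If renaming a column in the output is not acceptable, two other ways of removing the ambiguity are sketched below. They are alternatives to the alias approach, not part of the original fix, and the table alias t is made up for the example.

```sql
-- Alternative 1: sort by position in the select list instead of by name
select sub_group, sub_group, name, name from #temp
where sub_group = 'Furniture'
order by 1

-- Alternative 2: qualify the column with a table alias,
-- so the name refers to the table column rather than the select list
select t.sub_group, t.sub_group, t.name, t.name from #temp as t
where t.sub_group = 'Furniture'
order by t.sub_group
```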


You just have to make time for your coding and keep it balanced :)
 
