
Friday, February 13, 2015

Myth: Group By always results in sorting the table on the Group By column


There are lots of threads on various forums discussing the automatic sorting behaviour of the Group By clause; let's go the extra mile to prove this myth incorrect.

So here is my query; I am running this dummy query against the AdventureWorks2012 database:

select pc.Name, sum(psc.ProductSubcategoryID) as 'Sum of Sub Product Id'
from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name

Name                                               Sum of Sub Product Id
-------------------------------------------------- ---------------------
Accessories                                        378
Bikes                                              6
Clothing                                           172
Components                                         147

Let's look at this from the perspective of the execution plan.



From the execution plan we can see that the Group By is implemented by the Stream Aggregate physical operator. Since a Stream Aggregate always wants its input in sorted order, a Sort operator appears in the plan, sorting the ProductCategory data by ProductCategory.Name. This is the reason we get our result in sorted order when we use the Group By clause in our query.

Now think about a situation where, instead of a Stream Aggregate, the optimizer decides to use a Hash Aggregate, which does not require its input to be in sorted order.

Let's try to bring a Hash Aggregate into the plan in place of the Stream Aggregate.

I was not able to simulate a situation where a Hash Aggregate appears in the plan naturally. So let's play a trick on the optimizer and tell it that the Stream Aggregate operator is not available, so it cannot use it when creating the plan.

You can disable the rule that turns a Group By into a Stream Aggregate like this:

DBCC TRACEON (3604);         -- redirect DBCC output to the client
DBCC RULEOFF('GbAggToStrm'); -- disable the Group By to Stream Aggregate rule
GO

But once you are done, don't forget to turn the rule for the Stream Aggregate operator back on.
DBCC TRACEON (3604);
DBCC RULEOFF('GbAggToStrm');
GO
select pc.Name, sum(psc.ProductSubcategoryID) as 'Sum of Sub Product Id'
from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name
OPTION (RECOMPILE);
GO
DBCC RULEON('GbAggToStrm');

Name                                               Sum of Sub Product Id
-------------------------------------------------- ---------------------
Bikes                                              6
Clothing                                           172
Accessories                                        378
Components                                         147

Here we can see the result did not come back sorted. Let's see how the optimizer executed this query.



We can see the Stream Aggregate operator replaced by a Hash Aggregate, which does not require its input in sorted order, and that is why our result came back unsorted.
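As a side note, there is also a documented way to ask for hash aggregation without touching undocumented optimizer rules: the HASH GROUP query hint. A minimal sketch of the same query using it:

select pc.Name, sum(psc.ProductSubcategoryID) as 'Sum of Sub Product Id'
from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name
OPTION (HASH GROUP);  -- documented hint that forces hash aggregation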

So, to be on the safe side: if you want your Group By data sorted, use an Order By clause along with the Group By. That way you can guarantee your data comes back in sorted order.
 
DBCC TRACEON (3604);
DBCC RULEOFF('GbAggToStrm');
GO
select pc.Name, sum(psc.ProductSubcategoryID) as 'Sum of Sub Product Id'
from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name
order by pc.Name
OPTION (RECOMPILE);
GO
DBCC RULEON('GbAggToStrm');

Name                                               Sum of Sub Product Id
-------------------------------------------------- ---------------------
Accessories                                        378
Bikes                                              6
Clothing                                           172
Components                                         147

Conclusion: always use Order By along with Group By if you want your data in sorted order.
 
SQL Server and a glass of tea make a perfect combination to deal with the darkness :)



Tuesday, January 20, 2015

Index vs Sort Operator: how to remove the Sort operator from the execution plan


In this article we will see how the Sort operator fares against an index scan. Let's set up the platform for this test.

Here we are creating the table Tab_EmployeeAvoidingSort, which has no index defined on any column.

set nocount on
if object_id('Tab_EmployeeAvoidingSort') is not null
drop table Tab_EmployeeAvoidingSort
go

create table Tab_EmployeeAvoidingSort
(id int identity(1,1),
name varchar(30),
salary numeric(18,2));
go

Let's insert some data into the table, say 1,000 rows.

declare @i smallint
set @i = 0
while @i < 1000
begin
insert into Tab_EmployeeAvoidingSort(name, salary)
select 'Employee', abs(convert(numeric(18,2), (checksum(newid()) / 500000.0)))
set @i = @i + 1
end
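As an aside, the same rows can be inserted with a single set-based statement, which generally runs faster than a row-by-row loop; a minimal sketch (using sys.all_columns purely as a convenient row source):

insert into Tab_EmployeeAvoidingSort(name, salary)
select top (1000)
'Employee', abs(convert(numeric(18,2), (checksum(newid()) / 500000.0)))
from sys.all_columns a cross join sys.all_columns b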

Our table is ready for the sort test. Let's turn on the statistics profile to track the IO, CPU, and total cost of the query involving the Sort operator.

SET STATISTICS PROFILE ON
Select name from Tab_EmployeeAvoidingSort order by name



Here we can see the cost of the execution involving the Sort operator came out to 0.0066.

Let's create a nonclustered index on the name column.

create nonclustered index ix_name on Tab_EmployeeAvoidingSort(name)
go
Command(s) completed successfully.

The index is all set; let's try the same query and see what cost it involves.

SET STATISTICS PROFILE ON
select name from Tab_EmployeeAvoidingSort with (index = ix_name) order by name



Oh!!! Great, the total cost of the query comes out as 0.0035, which is much less than the one involving the Sort operator.
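This works because the query selects only name, so ix_name alone covers it. If the query returned additional columns, a lookup would creep into the plan; a hedged sketch of a covering index (the index name and the salary column are used purely for illustration):

-- Key on the order-by column, extra columns in INCLUDE
-- so the index alone can satisfy the query
create nonclustered index ix_name_salary on Tab_EmployeeAvoidingSort(name)
include (salary)
go

select name, salary from Tab_EmployeeAvoidingSort order by name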
  
Conclusion: try to implement an index on the Order By column.

Get your tea and start your work :)


Tuesday, December 16, 2014

Partitioning a Large Table


Partitioning is a feature introduced to aid maintenance of larger tables and to provide fast ways to load and remove large amounts of data from a table. Partitioning can also boost query performance.

By default, a table consists of a single partition that contains all of its data. To add partitions, you need to define a partition function and a corresponding partition scheme. A partition scheme uses the partition function to map data ranges to the appropriate filegroups. If the partition function defines three boundary values, the partition scheme must map four partitions to filegroups: the three ranges delimited by the boundary values, plus a fourth, catch-all range for all data outside the predefined boundaries.
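For example, a hypothetical partition function with three boundary values yields four partitions:

-- Hypothetical illustration only: 3 boundary values => 4 partitions
CREATE PARTITION FUNCTION pfExample (int)
AS RANGE LEFT FOR VALUES (100, 200, 300);
-- Partition 1: values <= 100
-- Partition 2: values 101 to 200
-- Partition 3: values 201 to 300
-- Partition 4: values > 300 (the catch-all range)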

Let's take an example to understand partitioning. Here we are creating a table with three fields: id, name, and salary. Later we will base the partitioning on the id field.

create table tab_partitionDemo
(id int identity(1,1),
name varchar(150),
salary int)
Command(s) completed successfully.

Let's insert 10,000 dummy records into the newly created table.

declare @rwCnt int = 10000
while @rwCnt >= 1
begin
     insert into tab_partitionDemo values('abc' + cast(@rwCnt as varchar), 10*@rwCnt)
     set @rwCnt = @rwCnt - 1
end

Now that we have our table ready, let's add two filegroups. A filegroup is a logical storage unit. Every database has a primary filegroup that contains the primary data file (.mdf). Additional, user-defined filegroups can be created to contain secondary files (.ndf). We will create two filegroups here: one to store the first 5,000 records and the other to store the data from 5,000 to 10,000.

ALTER DATABASE TestDB ADD FILEGROUP partitionDemoFG01_50K 
GO 
ALTER DATABASE TestDB ADD FILEGROUP partitionDemoFG50K_10K 
GO 
Command(s) completed successfully.

Now it is time to add a file (.ndf) to each newly created filegroup; these files will hold the data.

ALTER DATABASE TestDB
  ADD FILE
  (NAME = N'data_01_10',
  FILENAME = N'D:\SQL Server\Data\data_FilePartitionDemoF01_50K.ndf',
  SIZE = 5000KB,
  MAXSIZE = 10000MB,
  FILEGROWTH = 500MB)
  TO FILEGROUP partitionDemoFG01_50K 
GO 
ALTER DATABASE TestDB
  ADD FILE
  (NAME = N'data_11_20',
  FILENAME = N'D:\SQL Server\Data\data_partitionDemoF50K_10K.ndf',
  SIZE = 5000KB,
  MAXSIZE = 10000MB,
  FILEGROWTH = 500MB)
  TO FILEGROUP partitionDemoFG50K_10K
GO 
Command(s) completed successfully.

To check the files we created, we can query sys.database_files:

SELECT name as [FileName], physical_name as [FilePath]
FROM sys.database_files where type_desc = 'ROWS'
GO
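Similarly, the filegroups themselves can be listed from sys.filegroups:

SELECT name, type_desc FROM sys.filegroups
GO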

Now it is time to create the partition function. A partition function maps the table data to partitions based on the partition column. In the code below we are creating three partitions (from two boundary values) on the int column, i.e. id.

CREATE PARTITION FUNCTION IDPartitionFunction (int)
AS
RANGE LEFT FOR VALUES (5000, 10000);
Command(s) completed successfully.
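Note that RANGE LEFT means each boundary value itself belongs to the partition on its left, so id 5000 goes to partition 1. For contrast, a hypothetical RANGE RIGHT function would put each boundary value into the partition on its right:

-- Hypothetical contrast: with RANGE RIGHT, id 5000 would land in partition 2
CREATE PARTITION FUNCTION IDPartitionFunctionRight (int)
AS RANGE RIGHT FOR VALUES (5000, 10000);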

Now we need to map each partition to a filegroup with the help of a partition scheme.

CREATE PARTITION SCHEME IDPartitionSchema
AS PARTITION IDPartitionFunction
TO (partitionDemoFG01_50K,partitionDemoFG50K_10K,[PRIMARY]);
Command(s) completed successfully.

Now we need to redistribute the table data across the partitions. For that we either create a clustered index on the partition column, or, if one is already present, drop and recreate it with the ON clause pointing at the partition scheme.

create clustered index ix_tab_partitionDemo_id
on  dbo.tab_partitionDemo(ID)
ON IDPartitionSchema(id)
Command(s) completed successfully.
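If a clustered index had already existed on the table, it could be rebuilt onto the partition scheme in one step using DROP_EXISTING; a minimal sketch:

-- Rebuild an existing clustered index directly onto the partition scheme
create clustered index ix_tab_partitionDemo_id
on dbo.tab_partitionDemo(id)
with (drop_existing = on)
on IDPartitionSchema(id)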

That is all one needs to do for partitioning. Let's check whether our data got distributed across the partitions by querying sys.partitions.

SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id=p.object_id
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id
WHERE o.name LIKE '%tab_partitionDemo%'

objectname        indexname               partition_id      partition_number  rows
tab_partitionDemo ix_tab_partitionDemo_id 72057594148093952 1                 5000
tab_partitionDemo ix_tab_partitionDemo_id 72057594148159488 2                 5000
tab_partitionDemo ix_tab_partitionDemo_id 72057594148225024 3                 0

Here we see the table's 10K records distributed across two partitions. Now let's add one more record; since id is an identity column, the new row gets id 10001, which falls beyond the 10,000 boundary.

insert into dbo.tab_partitionDemo values('Rakesh',40000)

SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id=p.object_id
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id
WHERE o.name LIKE '%tab_partitionDemo%'

objectname        indexname               partition_id      partition_number  rows
tab_partitionDemo ix_tab_partitionDemo_id 72057594148093952 1                 5000
tab_partitionDemo ix_tab_partitionDemo_id 72057594148159488 2                 5000
tab_partitionDemo ix_tab_partitionDemo_id 72057594148225024 3                 1

So we can see the newly created row went to the third, catch-all partition (mapped to PRIMARY). That is exactly what we were expecting. Cool!!!
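You can also ask directly which partition a given key value maps to, using the $PARTITION function with our partition function:

-- Returns 3: id 10001 lies beyond the 10000 boundary,
-- so it maps to the third (catch-all) partition
SELECT $PARTITION.IDPartitionFunction(10001) AS partition_number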

SQL Server, where a mountain of happiness sits :)

