About Me

My photo
Mumbai, Maharastra, India
He has more than 7.6 years of experience in the software development. He has spent most of the times in web/desktop application development. He has sound knowledge in various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Search This Blog

Saturday, January 31, 2015

Group by clause sort my record set on Group by specified column


We must have experience that whenever we apply group By clause, it automatically sort the record set on the column specified by Group By clause. Lets see whats the basic behind the sorting technique.

Lets fire a query on the AdventureWorks2014 database ProductCategory and ProductSubCategory table.

select  pc.Name, sum( psc.ProductSubcategoryID) from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name

Name                                              
-------------------------------------------------- -----------
Accessories                                        378
Bikes                                              6
Clothing                                           172
Components                                         147

Here we can see the records are appearing in sorted order. Lets see the execution plan to check what is happing at the background.


The execution plan above is clearly shown a sort operator which is why we are get our data sorted by Name column. Lets dissect the plan to see how it has executed the complete query.

The series of Logical operation that happened are
1.        First, Clustered Index Scan on ProductionSubCategory to get all the data from the table.
2.        Second, Clustered Index Scan on ProductCategory to get all the data from the table.
3.        Third, Sort operator, which was applied on ProductCategory table to get the data in sorted manner. Which internally do Order By on ProductCategory.Name column.
4.        Forth, then inner join was implemented to get the matching records from both the table.
5.        Fifth, Stream Aggregate was performed on the data which group by data on the bases of PorductCategory.Name column.
So here we can easily say why group by is producing result in sorted manner. because of sorting of ProductCategory table on ProductCategory.Name Column which is internally happening.

Reason for Sort Operator appearance, the stream aggregate operator always wants its input in sorted order before it apply its operation of aggregating.

Lets do one more practical to remove the sort operator, I am interested to remove this because it s taking the most cost among all the operator.

So internally sort operator arrange the data, the samething we can accomplish through indexing also, the benefit will be along with sorting the access to the data from ProductCategory tabke will be faster.

So why wait lets create Non Clustered Index on Name column of ProductCategory.
create unique nonclustered index [ak_productcategory_name] on [production].[productcategory]
(
       [name] asc
)
Command(s) completed successfully.

So now we have index on ProductCategory Name column. Now its time to rerun the same query to see whether we are still getting the sort operator in execution plan.

select  pc.Name, sum( psc.ProductSubcategoryID) from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name


So we are successful in removing the sort operator and we can see the Index we created on Name column is now appears in execution plan.

And since index automatically sort the data, optimizer didn’t make use of sort operator explicitly to do the sorting and thus we can see a good plan.

 SQL Server with a Tea glass makes a perfect combination to deal with darkness J


Post Reference: Vikram Aristocratic Elfin Share

No comments:

Post a Comment