We must have experience that whenever we apply group By clause, it automatically sort the record set on the column specified by Group By clause. Lets see whats the basic behind the sorting technique.
Lets fire a query on the AdventureWorks2014
database ProductCategory and ProductSubCategory table.
select pc.Name, sum( psc.ProductSubcategoryID) from Production.ProductCategory pc
inner join Production.ProductSubcategory
psc
on pc.ProductCategoryID =
psc.ProductCategoryID
group by pc.Name
Name
--------------------------------------------------
-----------
Accessories 378
Bikes 6
Clothing 172
Components 147
Here we can see the records are appearing in
sorted order. Lets see the execution plan to check what is happing at the
background.
The execution plan above is clearly shown a sort
operator which is why we are get our data sorted by Name column. Lets dissect
the plan to see how it has executed the complete query.
The series of Logical operation that happened
are
1.
First, Clustered Index Scan on ProductionSubCategory to get all the data
from the table.
2.
Second, Clustered Index Scan on ProductCategory to get all the data from
the table.
3.
Third, Sort operator, which was applied on ProductCategory table to get
the data in sorted manner. Which internally do Order By on ProductCategory.Name
column.
4.
Forth, then inner join was implemented to get the matching records from
both the table.
5.
Fifth, Stream Aggregate was performed on the data which group by data on
the bases of PorductCategory.Name column.
So here we can easily say why group by is
producing result in sorted manner. because of sorting of ProductCategory table
on ProductCategory.Name Column which is internally happening.
Reason for Sort
Operator appearance, the stream aggregate operator always wants its input in sorted order
before it apply its operation of aggregating.
Lets do one more practical to remove the sort
operator, I am interested to remove this because it s taking the most cost
among all the operator.
So internally sort operator arrange the data,
the samething we can accomplish through indexing also, the benefit will be
along with sorting the access to the data from ProductCategory tabke will be
faster.
So why wait lets create Non Clustered Index on
Name column of ProductCategory.
create unique nonclustered index [ak_productcategory_name] on [production].[productcategory]
(
[name] asc
)
Command(s) completed
successfully.
So now we have index on ProductCategory Name
column. Now its time to rerun the same query to see whether we are still
getting the sort operator in execution plan.
select pc.Name, sum( psc.ProductSubcategoryID) from Production.ProductCategory pc
inner join Production.ProductSubcategory
psc
on pc.ProductCategoryID =
psc.ProductCategoryID
group by pc.Name
So we are successful in removing the sort operator and
we can see the Index we created on Name column is now appears in execution
plan.
And since index automatically sort the data, optimizer
didn’t make use of sort operator explicitly to do the sorting and thus we can
see a good plan.
SQL Server with a Tea glass makes a perfect
combination to deal with darkness J
Post Reference: Vikram Aristocratic Elfin Share