Let’s try out which scan would be more effective with below scenario
1) When selecting all
columns without filter and order by
2) When selecting all
columns with order by.
Before that let’s understand what HEAP and CI are
Heap: Heap is table without Cluster Index, it has one row in
sys.partition with index=0, by default a heap has a single partition. SQL
Server uses IAM (Internal Allocation Map) to move through the leaf. The data
pages and the rows within them are not arranged in any order and are not linked.
The information with data page connection are recorded in IAM pages.
Clustered Index: Clustered Index
are organized in B-Tree structure with top most node called Root and bottom
most are leaf nodes, in between top and bottom are intermediates nodes. The
Leaf node consists of Index Key column value and reference to data page of
underlying table. The pages in each level are linked in doubly linked list
pattern. The pages in the data chain and the rows in them are ordered on the
value of the clustered index key. All inserts are made at the point where the
key value in the inserted row fits in the ordering sequence among existing
rows.
Now let’s create a table and insert around 4 Million records in it.
CREATE TABLE HeapVsClusteredScanTable
(
ColNum
BIGINT NOT NULL,
ColBigNum
BIGINT NOT NULL,
CharacterColumn
CHAR(50)
)
GO
INSERT INTO HeapVsClusteredScanTable
(
ColNum, ColBigNum, CharacterColumn
)
SELECT
ColNum,
ColNum
+ 5000000,
LEFT(REPLICATE((CAST(ColNum as VARCHAR(50))),50),50)
FROM
(
SELECT
ColNum
= row_number() over(order by newid() asc)
FROM master..spt_values a
CROSS APPLY master..spt_values b
WHERE a.type = 'P' AND a.number <= 2000 AND a.number > 0 AND
b.type = 'P' AND b.number <= 2000 AND b.number > 0
) mydata
We have not created any index on this table, so it is a heap. Lets query
index stat. Below it shows total pagecount 37384 and there is only one page level.
select page_count, index_depth, page_level = index_level, page_count, record_count
from sys.dm_db_index_physical_stats(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED');
GO
page_count index_depth page_level
page_count record_count
-------------------- -----------
---------- -------------------- --------------------
37384 1
0 37384 4000000
(1 row(s) affected)
Now lets query page allocation
select allocated_page_page_id,
next_page_page_id, previous_page_page_id
from sys.dm_db_data base_page_allocations(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED')
where page_type_desc is not null and page_type_desc = 'DATA_PAGE'
allocated_page_page_id next_page_page_id
previous_page_page_id
---------------------- -----------------
---------------------
400 NULL NULL
401 NULL NULL
402 NULL NULL
403 NULL NULL
404 NULL NULL
405 NULL NULL
406 NULL NULL
Above result tells in case of Heap there no information stored for next
and previous page.
Now lets see the IO measure, CPU time and Elapsed time when querying
Heap with and without ORDER BY
Without Order By
SET STATISTICS IO on
set statistics time on
go
select * from
HeapVsClusteredScanTable
(4000000
row(s) affected)
Table
'HeapVsClusteredScanTable'. Scan count 1, logical reads 37384, physical reads
0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6844 ms, elapsed time = 93444 ms.
With Order By
SET STATISTICS IO on
set statistics time on
go
select * from
HeapVsClusteredScanTable order by ColNum
(4000000
row(s) affected)
Table
'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads
36373, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table
'HeapVsClusteredScanTable'. Scan count 5, logical reads 37384, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
SQL Server Execution Times:
CPU time = 21109 ms, elapsed time = 206015 ms.
Above result shows that with “Order By” Claus the execution time and
read-ahead read increate significantly.
Introduce Clustered Index: Now Let’s create a clustered index on table
CREATE CLUSTERED INDEX cix_HeapVsClusteredScanTable
ON HeapVsClusteredScanTable(ColNum)
WITH(MAXDOP=1)
Lets query index physical stat
select page_count, index_depth, page_level = index_level, record_count
from sys.dm_db_index_physical_stats(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED');
GO
page_count index_depth page_level record_count
-------------------- -----------
---------- --------------------
37384 3 0 4000000
79 3 1 37384
1 3 2 79
(3 row(s) affected)
Above output shows that the
index has 3 level, the leaf level consist of 37384 pages whereas intermediate level
consist of 79 pages and root level consist of 1 page.
Lets now query page
allocation dmv
select allocated_page_page_id,
next_page_page_id, previous_page_page_id, page_level, page_type_desc
from sys.dm_db_database_page_allocations(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED')
where page_type_desc is not null and page_type_desc IN( 'DATA_PAGE', 'INDEX_PAGE')
ORDER BY CASE WHEN page_type_desc = 'INDEX_PAGE' THEN 0 ELSE allocated_page_page_id END ASC
allocated_page_page_id next_page_page_id
previous_page_page_id page_level page_type_desc
---------------------- -----------------
--------------------- ---------- ---------------
61128 61129 45911 1 INDEX_PAGE
61129 61130 61128 1 INDEX_PAGE
61130 61131 61129 1 INDEX_PAGE
61131 61132 61130 1 INDEX_PAGE
61132 61133 61131 1 INDEX_PAGE
61133 61134 61132 1 INDEX_PAGE
Here the above output shows
that each page in clustered index has pointer to previous and next page.
(Doubly linked list pattern)
Now lets see the IO measure, CPU time and Elapsed time when querying Clustered
Indexed table with and without ORDER BY
Without Order By
SET STATISTICS IO on
set statistics time on
go
select * from
HeapVsClusteredScanTable
(4000000
row(s) affected)
Table
'NumbersTable'. Scan count 1, logical reads 37465, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL
Server Execution Times:
CPU time = 6672 ms, elapsed time = 82045 ms.
With Order By
SET STATISTICS IO on
set statistics time on
go
select * from
HeapVsClusteredScanTable order by ColNum
(4000000 row(s) affected)
Table 'NumbersTable'. Scan count 1, logical reads 37465,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.
SQL Server
Execution Times:
CPU time = 6657 ms, elapsed time = 86031 ms.
Conclusion: Here we can see, with Cluster Index on table, the
Order By query make a significant performance compare to Heap table with order
by Clause. However general select statement without order by doesn’t make big
difference in term of elapsed and cpu time with Clustered table.
Further we can conclude that you can keep your table in heap if it is
very.
Enjy coding…SQL J