Aristocratic Elfin Share

Tuesday, July 31, 2018

Character family datatype and NULL in Union statement

Conclusion: The default datatype of NULL is INT (wait lets first prove it, we might be wrong).

Yesterday my teammate called me and showed me a strange behavior of NULL with character datatype in union clause (in Teradata database) and this strange behavior is applicable to most of databases (NOT in SQL Server J). Let’s try to stimulate the same scenario.

Here we have below Union statement

select 'a' as col1, 'b' as col2, 100 as col3, 200 as col4

union

select 'x' as col1, 'y' as col2, null, null

col1 col2 col3 col4

---- ---- ----------- -----------

a b 100 200

x y NULL NULL

(2 row(s) affected)

There are four columns in query, first two columns are of character family datatype and col3 and col4 are from Integer family datatype.

Now look at the second query

select 'x' as col1, 'y' as col2, null, null

Here, notice that first two columns are of char datatype and rest two columns are of NULL and union worked perfectly.

Lets revise the union rule:

1. Column number should match in all select query involved in union

2. The datatype of column in one select query should match with other select query participating in UNION.

Now return back to our first select statement in UNION query

select 'a' as col1, 'b' as col2, 100 as col3, 200 as col4

So by Union rule book, any query doing union with above query should have first two columns as character datatype and last two columns as INT datatype.

Now if you see our second query

select 'a' as col1, 'b' as col2, 100 as col3, 200 as col4

union

select 'x' as col1, 'y' as col2, null, null

First two columns are of character datatype which matches with the datatype of first query and unlike first two columns, last two are NULL which doesn’t match with the datatype of first query col3 and col4 datatype. Still query work fine.

There could be two reason why query worked fine:

NULL is compatable with INT datatype
The is an implicit conversion happening with NULL and INT datatype, something like this ( cast( NULL as INT)

Let dig it further by rewriting the query

select 'a' as col1, 'b' as col2, 100 as col3, 200 as col4

union

select 'x', NULL, NULL, NULL

Above query fails, that could means

Compiler is able to do implicit conversion of NULL to character database
NULL is not compatible to Character family datatype.

Now let’s rewrite the query and cast the NULL in second column of second query

select 'a' as col1, 'b' as col2, 100 as col3, 200 as col4

union

select 'x', cast (NULL as varchar(1)), NULL, NULL

col1 col2 col3 col4

---- ---- ----------- -----------

a b 100 200

x NULL NULL NULL

The query work fine.

We can inference from above set of query that NULL is compatible with INT but not with CHAR datatype.

Let find out the root clause of this.

The question must be in your mind must be: Does NULL has any DEFAULT datatype? Lets try to find out this answer.

Here I am storing result of query in temporary table #tempotable

select null as col1 into #tempotable

Lets now see the table and column property of #tempotable

TABLE_QUALIFIER TABLE_OWNER COLUMN_NAME TYPE_NAME PRECISION LENGTH

-----------------------------------------------------------------------

tempdb dbo col1 int 10 4

J here we can see Datatype of NULL is treated as INT Type with 4 byte length.

What we concluded, default datatype of NULL is INT??

Wait before concluding anything so early J

Let declare a variable of variant type and assign it with NULL

declare @var sql_variant = NULL

Now let find out the datatype of @Var variable

select sql_variant_property(@var, 'Basetype') as TypeName,

sql_variant_property(@var, 'Precision') as Precision,

Output:

Type Name Precision

------------------------------

NULL NULL

Now this shows NULL has no datatype J

What we concluded: NULL has no datatype, but when it takes part in forming result set, the compiler consider NULL as INT datatype with 4 byte length and 10 precision.

NOTE: SQL Server 2005 had this issue, in all later versions they rectified this issue and have high degree of NULL handling Enjy coding…SQL J

Post Reference: Vikram Aristocratic Elfin Share

Monday, February 20, 2017

Heap Scan (Table Scan) Vs Clustered Index Scan

Let’s try out which scan would be more effective with below scenario

1) When selecting all columns without filter and order by

2) When selecting all columns with order by.

Before that let’s understand what HEAP and CI are

Heap: Heap is table without Cluster Index, it has one row in sys.partition with index=0, by default a heap has a single partition. SQL Server uses IAM (Internal Allocation Map) to move through the leaf. The data pages and the rows within them are not arranged in any order and are not linked. The information with data page connection are recorded in IAM pages.

Clustered Index: Clustered Index are organized in B-Tree structure with top most node called Root and bottom most are leaf nodes, in between top and bottom are intermediates nodes. The Leaf node consists of Index Key column value and reference to data page of underlying table. The pages in each level are linked in doubly linked list pattern. The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows.

Now let’s create a table and insert around 4 Million records in it.

CREATE TABLE HeapVsClusteredScanTable

(

ColNum BIGINT NOT NULL,

ColBigNum BIGINT NOT NULL,

CharacterColumn CHAR(50)

)

INSERT INTO HeapVsClusteredScanTable

(

ColNum, ColBigNum, CharacterColumn

)

SELECT

ColNum,

ColNum + 5000000,

LEFT(REPLICATE((CAST(ColNum as VARCHAR(50))),50),50)

FROM

(

SELECT

ColNum = row_number() over(order by newid() asc)

FROM master..spt_values a

CROSS APPLY master..spt_values b

WHERE a.type = 'P' AND a.number <= 2000 AND a.number > 0 AND

b.type = 'P' AND b.number <= 2000 AND b.number > 0

) mydata

We have not created any index on this table, so it is a heap. Lets query index stat. Below it shows total pagecount 37384 and there is only one page level.

select page_count, index_depth, page_level = index_level, page_count, record_count

from sys.dm_db_index_physical_stats(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED');

page_count index_depth page_level page_count record_count

-------------------- ----------- ---------- -------------------- --------------------

37384 1 0 37384 4000000

(1 row(s) affected)

Now lets query page allocation

select allocated_page_page_id, next_page_page_id, previous_page_page_id

from sys.dm_db_data base_page_allocations(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED')

where page_type_desc is not null and page_type_desc = 'DATA_PAGE'

allocated_page_page_id next_page_page_id previous_page_page_id

---------------------- ----------------- ---------------------

400 NULL NULL

401 NULL NULL

402 NULL NULL

403 NULL NULL

404 NULL NULL

405 NULL NULL

406 NULL NULL

Above result tells in case of Heap there no information stored for next and previous page.

Now lets see the IO measure, CPU time and Elapsed time when querying Heap with and without ORDER BY

Without Order By

SET STATISTICS IO on

set statistics time on

select * from HeapVsClusteredScanTable

(4000000 row(s) affected)

Table 'HeapVsClusteredScanTable'. Scan count 1, logical reads 37384, physical reads 0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 6844 ms, elapsed time = 93444 ms.

With Order By

SET STATISTICS IO on

set statistics time on

select * from HeapVsClusteredScanTable order by ColNum

(4000000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 36373, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'HeapVsClusteredScanTable'. Scan count 5, logical reads 37384, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 21109 ms, elapsed time = 206015 ms.

Above result shows that with “Order By” Claus the execution time and read-ahead read increate significantly.

Introduce Clustered Index: Now Let’s create a clustered index on table

CREATE CLUSTERED INDEX cix_HeapVsClusteredScanTable

ON HeapVsClusteredScanTable(ColNum)

WITH(MAXDOP=1)

Lets query index physical stat

select page_count, index_depth, page_level = index_level, record_count

from sys.dm_db_index_physical_stats(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED');

page_count index_depth page_level record_count

-------------------- ----------- ---------- --------------------

37384 3 0 4000000

79 3 1 37384

1 3 2 79

(3 row(s) affected)

Above output shows that the index has 3 level, the leaf level consist of 37384 pages whereas intermediate level consist of 79 pages and root level consist of 1 page.

Lets now query page allocation dmv

select allocated_page_page_id, next_page_page_id, previous_page_page_id, page_level, page_type_desc

from sys.dm_db_database_page_allocations(db_id(),object_id('HeapVsClusteredScanTable'),null,null,'DETAILED')

where page_type_desc is not null and page_type_desc IN( 'DATA_PAGE', 'INDEX_PAGE')

ORDER BY CASE WHEN page_type_desc = 'INDEX_PAGE' THEN 0 ELSE allocated_page_page_id END ASC

allocated_page_page_id next_page_page_id previous_page_page_id page_level page_type_desc

---------------------- ----------------- --------------------- ---------- ---------------

61128 61129 45911 1 INDEX_PAGE

61129 61130 61128 1 INDEX_PAGE

61130 61131 61129 1 INDEX_PAGE

61131 61132 61130 1 INDEX_PAGE

61132 61133 61131 1 INDEX_PAGE

61133 61134 61132 1 INDEX_PAGE

Here the above output shows that each page in clustered index has pointer to previous and next page. (Doubly linked list pattern)

Now lets see the IO measure, CPU time and Elapsed time when querying Clustered Indexed table with and without ORDER BY

Without Order By

SET STATISTICS IO on

set statistics time on

select * from HeapVsClusteredScanTable

(4000000 row(s) affected)

Table 'NumbersTable'. Scan count 1, logical reads 37465, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 6672 ms, elapsed time = 82045 ms.

With Order By

SET STATISTICS IO on

set statistics time on

select * from HeapVsClusteredScanTable order by ColNum

(4000000 row(s) affected)

Table 'NumbersTable'. Scan count 1, logical reads 37465, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 6657 ms, elapsed time = 86031 ms.

Conclusion: Here we can see, with Cluster Index on table, the Order By query make a significant performance compare to Heap table with order by Clause. However general select statement without order by doesn’t make big difference in term of elapsed and cpu time with Clustered table.

Further we can conclude that you can keep your table in heap if it is very.

Enjy coding…SQL J

Post Reference: Vikram Aristocratic Elfin Share

Aristocratic Elfin Share

Pages

About Me

Search This Blog

Tuesday, July 31, 2018

Character family datatype and NULL in Union statement

Monday, February 20, 2017

Heap Scan (Table Scan) Vs Clustered Index Scan