Aristocratic Elfin Share: Partitioning

Showing posts with label Partitioning. Show all posts

Tuesday, December 16, 2014

Partitioning Large table

Partitioning is a feature is introduced to aid in maintenance on larger tables and to present fast ways to load and remove large amounts of data from a table. Partitioning can boost query performance.

Each table has a default partition which we call it PRIMARY partition which contain all data. To add partitions, you need to define a partition function and the corresponding partition scheme. A partition scheme uses the partition function to map data ranges to appropriate file groups. If the partition function defines three ranges, then partition scheme must map them to four file groups. Three ranges defined by the partition function and the forth, catchall range, for all data outside the boundaries of the predefined ranges.

Lets take an example to understand the partition. Here we are creating a table with three field id, name and salary. Later we will base of partition on ID field.

create table tab_partitionDemo

(id int identity(1,1),

name varchar(150),

salary int)

Command(s) completed successfully.

Lets insert 1000 dumy records in newly created table.

declare @rwCnt int = 10000

while @rwCnt >= 1

begin

insert into tab_partitionDemo values('abc' + cast(@rwCnt as varchar), 10*rwCnt)

set @rwCnt = @rwcnt -1

end

Now we have our table ready, let add two file group. A filegroup is a logical storage unit. Every database has a primary filegroup that contains the primary data file (.mdf). An additional, user-defined, filegrups can be created to contain secondary files (.ndf). We will create two file group here, one which will store first 5,000 records and the other one will store data from 5,000 to 10,000.

ALTER DATABASE TestDB ADD FILEGROUP partitionDemoFG01_50K

ALTER DATABASE TestDB ADD FILEGROUP partitionDemoFG50K_10K

Command(s) completed successfully.

Now its turn to add file(.ndf) to the newly created filegroup, which will hold data.

ALTER DATABASE TestDB

ADD FILE

(NAME = N'data_01_10',

FILENAME = N'D:\SQL Server\Data\data_FilePartitionDemoF01_50K.ndf',

SIZE = 5000KB,

MAXSIZE = 10000MB,

FILEGROWTH = 500MB)

TO FILEGROUP partitionDemoFG01_50K

ALTER DATABASE TestDB

ADD FILE

(NAME = N'data_11_20',

FILENAME = N'D:\SQL Server\Data\data_partitionDemoF50K_10K.ndf',

SIZE = 5000KB,

MAXSIZE = 10000MB,

FILEGROWTH = 500MB)

TO FILEGROUP partitionDemoFG50K_10K

Command(s) completed successfully.

To check the file created we can query sys.database_files

SELECT name as [FileName], physical_name as [FilePath]

FROM sys.database_files where type_desc = 'ROWS'

Now its time to create partition function, a partition function is one which maps the table data to the partition based upon the partition column. Here in below code we are creating 3 partition on the bases of int value field i.e ID.

CREATE PARTITION FUNCTION IDPartitionFunction (int)

RANGE LEFT FOR VALUES (5000,10000 );

Command(s) completed successfully.

Now we need to map the partition to each partition group with the help of Partition Schema.

CREATE PARTITION SCHEME IDPartitionSchema

AS PARTITION IDPartitionFunction

TO (partitionDemoFG01_50K,partitionDemoFG50K_10K,[PRIMARY]);

Command(s) completed successfully.

Now we need to redistribute the table data to various partition, for that we need to either create clustered index on selective table column or if it already present then drop and recreate with On clause on Partition column.

create clustered index ix_tab_partitionDemo_id

on dbo.tab_partitionDemo(ID)

ON IDPartitionSchema(id)

Command(s) completed successfully.

This is the only thing one need to do for partition, lets check whether our data get distributed to various partition by querying sys.index table.

SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]

FROM sys.partitions p

INNER JOIN sys.objects o ON o.object_id=p.object_id

INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id

WHERE o.name LIKE '%tab_partitionDemo%'

objectname indexname partition_id partition_number rows

tab_partitionDemo ix_tab_partitionDemo_id 72057594148093952 1 5000

tab_partitionDemo ix_tab_partitionDemo_id 72057594148159488 2 5000

tab_partitionDemo ix_tab_partitionDemo_id 72057594148225024 3 0

Here we saw 10K records of the table get distributed to two partition. Now lets add one more records with partition column value more then 10k.

insert into dbo.tab_partitionDemo values('Rakesh',40000)

SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]

FROM sys.partitions p

INNER JOIN sys.objects o ON o.object_id=p.object_id

INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id

WHERE o.name LIKE '%tab_partitionDemo%'

objectname indexname partition_id partition_number rows

tab_partitionDemo ix_tab_partitionDemo_id 72057594148093952 1 5000

tab_partitionDemo ix_tab_partitionDemo_id 72057594148159488 2 5000

tab_partitionDemo ix_tab_partitionDemo_id 72057594148225024 3 1

So we can see the newly created row go to primary partition. That’s what we were expecting. Cool!!!

SQL Server, where there is mountain of happiness sits :)

Post Reference: Vikram Aristocratic Elfin Share

Sunday, July 7, 2013

Selecting Top N Records Group wise

Often we require to query to find top among subset, for example if you have a table called OderPlaced where you want the top two order for each product. There are couple of way we can actually do this, let’s check out

Problem Statement: To find the top 2 orders for each productid based upon the earliest orderid.

Solution: Let’s create a OrderPlace table with two column OderId and ProductId

create table OrderPlace

(OrderId int,

ProductId int)

Command(s) completed successfully.

Lets insert few records in OrderPlace table

insert into OrderPlace (OrderId, ProductId) values (82, 707)

insert into OrderPlace (OrderId, ProductId) values (83, 707)

insert into OrderPlace (OrderId, ProductId) values (84, 707)

insert into OrderPlace (OrderId, ProductId) values (82, 708)

insert into OrderPlace (OrderId, ProductId) values (83, 708)

insert into OrderPlace (OrderId, ProductId) values (84, 708)

insert into OrderPlace (OrderId, ProductId) values (82, 711)

insert into OrderPlace (OrderId, ProductId) values (83, 711)

insert into OrderPlace (OrderId, ProductId) values (84, 711)

insert into OrderPlace (OrderId, ProductId) values (82, 712)

insert into OrderPlace (OrderId, ProductId) values (83, 712)

insert into OrderPlace (OrderId, ProductId) values (84, 712)

insert into OrderPlace (OrderId, ProductId) values (82, 714)

insert into OrderPlace (OrderId, ProductId) values (83, 714)

insert into OrderPlace (OrderId, ProductId) values (84, 714)

insert into OrderPlace (OrderId, ProductId) values (82, 715)

insert into OrderPlace (OrderId, ProductId) values (83, 715)

insert into OrderPlace (OrderId, ProductId) values (84, 715)

insert into OrderPlace (OrderId, ProductId) values (83, 716)

insert into OrderPlace (OrderId, ProductId) values (84, 716)

Now since we have records in place, lets design the query to find top 2 orders for each productid based upon the earliest ordered.

select OrderId,ProductId from

(select op.OrderId,op.ProductId, rowid = ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderId)

from OrderPlace op) t_op

where rowid < =2

OrderId ProductId

----------- -----------

82 707

83 707

82 708

83 708

82 711

83 711

82 712

83 712

82 714

83 714

82 715

83 715

83 716

84 716

(14 row(s) affected)

Here we gave row number by grouping ProductId then in outer query we just fetch those records which has row number >=2.

Programming is a mad work performed by genius minds

Thursday, April 25, 2013

Spread storage of data and indexes to different disks to improve I/O performance

When we are creating any database, the default FILEGROUP which gets created is PRIMARY file group, so by default Primary filegroup is the default filegroup.

Any database objects user create will stores in PRIMARY file group by default, but one can change the default filegroup. Even at the time of object creation user can specify where the object will get stored.

For our example let’s create a database “DemoFileGroup”

use master

create database DemoFileGroup

Command(s) completed successfully.

Now we are creating two filegroup and attaching it to the newly created database “DemoFileGroup” as fg1 and fg2.

alter database DemoFileGroup add FILEGROUP fg1

alter database DemoFileGroup add FILEGROUP fg2

Command(s) completed successfully.

Now let’s query sysfilegroups to find out, various filesgroups gets created in the database.

use DemoFileGroup

select groupName as FileGroupName from sysfilegroups

FileGroupName

-----------------------------

PRIMARY

fg1

fg2

(3 row(s) affected)

At this point we have three file groups primary, FG1 and FG2 in our new database. Any new table or index will be created in the primary filegroup which is the default unless we specify the filegroup to use.

If we want to create a data object in fg1 or fg2 then the only way is to specify the filegroup when creating the storage object as we are doing in the below example.

Here we are creating table table1 but we are not specifying on which filegroup it should get created, so by default it will get created in the PRIMARY file group.

create table table1

(id int identity(1,1))

Lets query sp_help with table1 to find out the where the table object “table1” gets its storage.

sp_help table1

Data_located_on_filegroup

---------------------------

PRIMARY

So by output we can see the table gets created in the default file group ie, PRIMARY filegroup.

However at the time of table creation we can specify the filegroup name as we are doing in the below example.

create table table2

(id int identity(1,1),

fname varchar(20))

on fg1

Command(s) completed successfully.

Now let’s query sp_help with table1 to find out the where the table object “table2” gets its storage.

sp_help table2

Data_located_on_filegroup

---------------------------

fg1

Here we can see the table gets created in the file group fg1 which we specified at the time of table creation.

Now that we created the table on the new filegroup we can try to insert data.

insert into table2(fname) values('Prachi')

Msg 622, Level 16, State 3, Line 1

The filegroup "fg1" has no files assigned to it. Tables, indexes,

text columns, ntext columns, and image columns cannot be populated

on this filegroup until a file is added.

Now since we have no file associated with File Group fg1 we got the above error. Even if we try change the default filegroup, we will be getting the same error.

alter database DemoFileGroup modify FILEGROUP fg1 default

Msg 5050, Level 16, State 1, Line 1

Cannot change the properties of empty filegroup 'fg1'.

The filegroup must contain at least one file.

At this moment, we are done with FileGroup creation, its now time to create file and attached it to the newly created filegroup. Lets see the below script to create and attach file to the newly created filegroup “fg1”.

-- Add data file to FG1

alter database DemoFileGroup

add file

(NAME = DemoFileGroup_fg1,FILENAME = 'D:\Sqldata\DemoFileGroup_fg1.ndf')

to FILEGROUP fg1

Command(s) completed successfully.

Now we can set the default FILEGROUP to fg1, since now it has file associated with it.

alter database DemoFileGroup modify FILEGROUP fg1 default

The filegroup property 'DEFAULT' has been set.

We can check this by going to the database properties

Now if you try to insert data to table 2 which is associated by FILEGROUP fg2, it will not throw error.

insert into table2(fname) values('Prachi')

(1 row(s) affected)

Here we can can conclude, we are now known with the filegroup and file creation. In coming post, I will explain various benefits and uses of filegroup with Horizontal Partition.

Dare to Challenge code, challenge youself

Post Reference: Vikram Aristocratic Elfin Share

Aristocratic Elfin Share

Pages

About Me

Search This Blog

Tuesday, December 16, 2014

Partitioning Large table

Sunday, July 7, 2013

Selecting Top N Records Group wise

Thursday, April 25, 2013

Spread storage of data and indexes to different disks to improve I/O performance