About Me

My photo
Mumbai, Maharastra, India
He has more than 7.6 years of experience in the software development. He has spent most of the times in web/desktop application development. He has sound knowledge in various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Search This Blog

Showing posts with label Group By. Show all posts
Showing posts with label Group By. Show all posts

Sunday, June 28, 2015

Querying employee salary along with maximum salary of his department


I found a very interesting question from one of the user of SQL group, his problem statement was

“I need to find each employee salary along with the max salary of his department with a single line query”

It is very easy to find using SQL ‘Over by with Partition By clause’

Lets demonstrate it with an example. Here I am creating Emp and Dept table and inserting few records in it.

create table Dept1
(dept_id int identity(1,1) primary key,
dname varchar(20))
Command(s) completed successfully.

create table Emp2
(emp_id int identity(1,1) primary key,
emp_name varchar(20),
dept_id int references Dept1(dept_id),
salary float)
Command(s) completed successfully.

insert into Dept1
select 'HR' union all
select 'IT' union all
select 'Account'

insert into Emp2
select 'Sudeepta', 1, 1000 union all
select 'Susmita', 1, 3000 union all
select 'Smruti', 3, 10000 union all
select 'Richa', 3, 8000 union all
select 'Sanjana', 1, 4000 union all
select 'Sweta', 2,4000 union all
select 'Deepa', 2,7000

Now since the data are there in the table, lets write query to find each employee salary along with the maximum salary for his department.

select e.emp_name, d.dname,e.salary, max(e.salary) over (partition by d.dept_id) MaxSalary from
Emp2 e inner join Dept1 d
on e.dept_id = d.dept_id

emp_name             dname                salary                 MaxSalary
-------------------- -------------------- ---------------------- ----------
Sudeepta             HR                   1000                   4000
Susmita              HR                   3000                   4000
Sanjana              HR                   4000                   4000
Sweta                IT                   4000                   7000
Deepa                IT                   7000                   7000
Smruti               Account              10000                  10000
Richa                Account              8000                   10000

(7 row(s) affected)

So here we can from the output we have salary for each employee along with the maximum salary of his department. We have simply used aggregated function max along with ‘over with partition by clause’ which is taking d.dept_id to partition the record to find the max salary of his department.

I can’t sleep …until I die…My passion SQL :)

Post Reference: Vikram Aristocratic Elfin Share

Monday, March 2, 2015

A simple but tactical enough, Power of Group By - I


Today my mate draw closer to me for a straight forward script, he had a table which consist of dates in int (yyyymmdd) format, his requisite was to get the latest date of each month.

For example,
Out of 20150123, 20150130, 20150127 the output should be 20150130.

Let’s stimulate the problem by taking an example and lets see how we will approach towards the solution

CREATE table DemoMax
(id int identity(1,1),
mydate int)

insert into DemoMax values (20150226)
insert into DemoMax values (20150131)
insert into DemoMax values (20150329)
insert into DemoMax values (20150429)
insert into DemoMax values (20150530)
insert into DemoMax values (20150626)
insert into DemoMax values (20150726)
insert into DemoMax values (20150826)
insert into DemoMax values(20150226)
insert into DemoMax values(20150227)
insert into DemoMax values(20150328)


Here we have table created with few rows in it.

Our requirement is to get the highest date for each month, for finding so, let’s bifurcate the myDate to yyyy, mm and dd in three separate columns.

SUBSTRING(cast(mydate as varchar),1,4),
SUBSTRING(cast(mydate as varchar),5,2),

Now the second step is to group by the result on the above bifurcated column and applying the max aggregate function to find the largest among all date for a specific yyyy and mm.

max( cast(SUBSTRING(cast(mydate as varchar),7,2) as int) )

Lets formulate the final query

SELECT SUBSTRING(cast(mydate AS VARCHAR), 1, 4) as YYYY
       ,SUBSTRING(cast(mydate AS VARCHAR), 5, 2) as MM
       ,max(cast(SUBSTRING(cast(mydate AS VARCHAR), 7, 2) AS INT)) as LatestDate
FROM DemoMax
GROUP BY SUBSTRING(cast(mydate AS VARCHAR), 1, 4)
       ,SUBSTRING(cast(mydate AS VARCHAR), 5, 2)

YYYY MM   LatestDate
---- ---- -----------
2015 01   31
2015 02   27
2015 03   29
2015 04   29
2015 05   30
2015 06   26
2015 07   26
2015 08   26

(8 row(s) affected)

Reformulate the query to get the outout in yyyymmdd format,

with LatestDateCTE
as
(
SELECT SUBSTRING(cast(mydate AS VARCHAR), 1, 4) as YYYY
       ,SUBSTRING(cast(mydate AS VARCHAR), 5, 2) as MM
       ,max(cast(SUBSTRING(cast(mydate AS VARCHAR), 7, 2) AS INT)) as LatestDate
FROM DemoMax
GROUP BY SUBSTRING(cast(mydate AS VARCHAR), 1, 4)
       ,SUBSTRING(cast(mydate AS VARCHAR), 5, 2)
)

select (YYYY + MM + LatestDate) as MyDate from LatestDateCTE

MyDate
-----------
201532
201529
201532
201533
201535
201532
201533
201534

(8 row(s) affected)
 
A cup of tea with SQL Server is the greatest combination :)


Post Reference: Vikram Aristocratic Elfin Share

Friday, February 13, 2015

Myth, Group by always result in sorting the table on Group by Column


There are lot threads on various forum discussing automatic sorting behaviour of Group By clause, lets put extra mile to prove this myth incorrect.

So here is my query, I am running this dummy query on AdventureWorks2012 database

select  pc.Name, sum( psc.ProductSubcategoryID) as 'Sum of Sub Product Id' from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name

Name                                               Sum of Sub Product Id
-------------------------------------------------- ---------------------
Accessories                                        378
Bikes                                              6
Clothing                                           172
Components                                         147

Let’s see from the prospective of execution plan,



From execution plan we can see the Group By physical operator is replaced by Stream Aggregate logical operator. And since Stream Operator always want its input to be in sorted order, we can see an existence of Sort operator in the execution plan which is sorting the ProductCategory data on ProductCategory.Name wise. And this is the reason why we get our result in sorted order when we use Group by clause our query.

Now just think about a situation where instead of Stream Aggregate, optimizer feels to use Hash Aggregate, which don’t require its input to be in sorted order.

Let’s try to bring Hash Operator in the plan in place of Stream Operator.

I am not able to stimulate the situation where Hash Aggregate appears in plan. So lets play a tricky game with Optimizer, and tell there is no such Stream aggregate operator present thus don’t make it use for creating plan

You can disable Stream Aggregate operator by

DBCC TRACEON (3604);
DBCC RULEOFF('GbAggToStrm');
GO

But later after your operation don’t forget to Rule on for Stream Aggregate operator.
DBCC TRACEON (3604);
DBCC RULEOFF('GbAggToStrm');
GO
select  pc.Name, sum( psc.ProductSubcategoryID) as 'Sum of Sub Product Id' from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name
OPTION (RECOMPILE);
GO
DBCC RULEON('GbAggToStrm');

Name                                               Sum of Sub Product Id
-------------------------------------------------- ---------------------
Bikes                                              6
Clothing                                           172
Accessories                                        378
Components                                         147

Here we can see the result is didn’t get sorted by optimizer, lets see how optimizer executed this query.



We can see the Stream Aggregate operator replaced by Hash Aggregate which don’t require its input to be in sorted manner and that is the reason why our result resulted in unsorted manner.

So to be in safe side, if you want your group by data in sorted manner do use order by clause along with group by, this way you can guarantee your data to be in sorted manner.
 
DBCC TRACEON (3604);
DBCC RULEOFF('GbAggToStrm');
GO
select  pc.Name, sum( psc.ProductSubcategoryID) as 'Sum of Sub Product Id' from Production.ProductCategory pc
inner join Production.ProductSubcategory psc
on pc.ProductCategoryID = psc.ProductCategoryID
group by pc.Name
order by pc.Name
OPTION (RECOMPILE);
GO
DBCC RULEON('GbAggToStrm');

Name                                               Sum of Sub Product Id
-------------------------------------------------- ---------------------
Accessories                                        378
Bikes                                              6
Clothing                                           172
Components                                         147

Conclusion: Always use order by along with Group By, if you want your data to be in sorted manner.
 
SQL Server with a Tea glass makes a perfect combination to deal with darkness :)


Post Reference: Vikram Aristocratic Elfin Share