I found this nice post from Laurent Schneider the other day and wanted to comment, but my comments were a bit more of a digression and discussion, so I’ve blogged it and put a link on the comments of the post by Laurent.
I’d always used the ROW_NUMBER method myself until I read this and then figured I’d try the KEEP DENSE_RANK method, which works, as Laurent describes. One thing that didn’t sit well with me in the post from Laurent was that he said “the second one should be more performant” – I prefer hard facts, so I decided to test it a bit and my results are below.
In the simple example that Laurent gave, it’s difficult to tell which is quickest, since the table in question only has a handful of rows and therefore any benchmarking is more susceptible to other influences, clouding the results. I figured I’d build a larger table and try it on that.
Before I did that though, I did get the plans from the two statements Laurent ran on the EMP table and both show the same resource costings:
Firstly, for the ROW_NUMBER method:
select ename , deptno , sal from (select ename , deptno , sal , row_number() over (partition by deptno order by sal desc,empno) r from emp ) where r=1; Plan hash value: 3291446077 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 644 | 4 (25)| 00:00:01 | |* 1 | VIEW | | 14 | 644 | 4 (25)| 00:00:01 | |* 2 | WINDOW SORT PUSHED RANK| | 14 | 644 | 4 (25)| 00:00:01 | | 3 | TABLE ACCESS FULL | EMP | 14 | 644 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("R"=1) 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPTNO" ORDER BY INTERNAL_FUNCTION("SAL") DESC ,"EMPNO")<=1) Note ----- - dynamic sampling used for this statement
Now, the KEEP DENSE_RANK method:
select max(ename) keep (dense_rank first order by sal desc,empno) ename , deptno , max(sal) sal from emp group by deptno; Plan hash value: 15469362 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14 | 644 | 4 (25)| 00:00:01 | | 1 | SORT GROUP BY | | 14 | 644 | 4 (25)| 00:00:01 | | 2 | TABLE ACCESS FULL| EMP | 14 | 644 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Note ----- - dynamic sampling used for this statement
So, the plans are very similar, especially in terms of having the same resource usage…which means they should be similar in terms of performance…running them, as Laurent did, shows around 1s response times, which, as I say, doesn’t conclusively prove which method is quickest.
OK, on to a bigger example then…
I basically created a similar table to EMP, called JEFF_EMP and added a few more columns (for later) and then put ten million rows in it, taking around 1.3GB on my system…plenty to churn through.
DROP TABLE jeff_emp PURGE / CREATE TABLE jeff_emp(deptno NUMBER ,ename VARCHAR2(100) ,first_name VARCHAR2(50) ,initials VARCHAR2(30) ,surname VARCHAR2(50) ,sal NUMBER ,empno NUMBER ) / INSERT INTO jeff_emp(deptno,ename,first_name,initials,surname,sal,empno) SELECT (MOD(ROWNUM,3) + 1) * 10 , 'FIRSTNAME_'||TO_CHAR(ROWNUM)||'_INITIALS_'||TO_CHAR(ROWNUM)||'_SURNAME_'||TO_CHAR(ROWNUM) , 'FIRSTNAME_'||TO_CHAR(ROWNUM) , 'INITIALS_'||TO_CHAR(ROWNUM) , 'SURNAME_'||TO_CHAR(ROWNUM) , ROWNUM * 100 , ROWNUM FROM (SELECT LEVEL l FROM dual CONNECT BY LEVEL < 10000001) ORDER BY l / COMMIT / EXEC dbms_stats.gather_table_stats(ownname => USER, tabname => 'JEFF_EMP',estimate_percent=>10);
Now, here is the plan for the ROW_NUMBER method:
--------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10M| 868M| | 204K (1)| 00:40:49 | |* 1 | VIEW | | 10M| 868M| | 204K (1)| 00:40:49 | |* 2 | WINDOW SORT PUSHED RANK| | 10M| 629M| 1533M| 204K (1)| 00:40:49 | | 3 | TABLE ACCESS FULL | JEFF_EMP | 10M| 629M| | 46605 (1)| 00:09:20 | ---------------------------------------------------------------------------------------------
…and the results:
You must have heard the saying “God is late but never too late.” Jeff and Leena had a child at 60, really a miracle, but it is true. generic line viagra You cipla generic cialis will be able to enjoy your sex encounter, Generic Ciallis is your companion. This will give cipla tadalafil 20mg both of you a boost as well as treat the premature ejaculation and erectile dysfunctions. Os it has nothing to http://robertrobb.com/2018/06/ generic viagra online with the required quantity.
ENAME DEPTNO SAL ---------------------------------------------------------------------------------------------------- ---------- ---------- FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999 10 999999900 FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000 20 1000000000 FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998 30 999999800 Elapsed: 00:00:24.47
…and the KEEP DENSE_RANK method plan:
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 198 | 47109 (2)| 00:09:26 | | 1 | SORT GROUP BY | | 3 | 198 | 47109 (2)| 00:09:26 | | 2 | TABLE ACCESS FULL| JEFF_EMP | 10M| 629M| 46605 (1)| 00:09:20 | -------------------------------------------------------------------------------
…and it’s results:
ENAME DEPTNO SAL ---------------------------------------------------------------------------------------------------- ---------- ---------- FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999 10 999999900 FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000 20 1000000000 FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998 30 999999800 Elapsed: 00:00:07.76
So, reasonably clear results, indicating that the KEEP DENSE_RANK is about a third of the time to run, compared to the ROW_NUMBER method. You can also see from the plans that the ROW_NUMBER method involves use of TEMP, whereas the KEEP DENSE_RANK doesn’t, hence the slowdown.
So, Laurent was correct in his assertion that it should be more performant…but it’s nice to see the results based on a more meaningful set of data.
Now, there was one other thing that concerned me, and that was whether if you added more columns into the SQL, would it change the performance fo either method to any significant degree, so I started using the extra name columns like this:
SELECT ename , first_name , initials , surname , deptno , sal FROM (SELECT ename , first_name , initials , surname , deptno , sal , ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC,empno) r FROM jeff_emp ) WHERE r = 1 /
…which has a plan of:
--------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10M| 1546M| | 307K (1)| 01:01:36 | |* 1 | VIEW | | 10M| 1546M| | 307K (1)| 01:01:36 | |* 2 | WINDOW SORT PUSHED RANK| | 10M| 1107M| 2606M| 307K (1)| 01:01:36 | | 3 | TABLE ACCESS FULL | JEFF_EMP | 10M| 1107M| | 46605 (1)| 00:09:20 | ---------------------------------------------------------------------------------------------
…and results:
ENAME FIRST_NAME INITIALS SURNAME DEPTNO SAL ---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ---------- FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999 FIRSTNAME_9999999 INITIALS_9999999 SURNAME_9999999 10 999999900 FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000 FIRSTNAME_10000000 INITIALS_10000000 SURNAME_10000000 20 1000000000 FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998 FIRSTNAME_9999998 INITIALS_9999998 SURNAME_9999998 30 999999800 Elapsed: 00:00:25.76
For the KEEP DENSE_RANK I get:
SELECT MAX(ename) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) ename , MAX(first_name) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) first_name , MAX(initials) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) initials , MAX(surname) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) surname , deptno , MAX(sal) sal FROM jeff_emp GROUP BY deptno /
Which has the following plan:
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 348 | 47109 (2)| 00:09:26 | | 1 | SORT GROUP BY | | 3 | 348 | 47109 (2)| 00:09:26 | | 2 | TABLE ACCESS FULL| JEFF_EMP | 10M| 1107M| 46605 (1)| 00:09:20 | -------------------------------------------------------------------------------
…and results:
ENAME FIRST_NAME INITIALS SURNAME DEPTNO SAL ---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ---------- FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999 FIRSTNAME_9999999 INITIALS_9999999 SURNAME_9999999 10 999999900 FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000 FIRSTNAME_10000000 INITIALS_10000000 SURNAME_10000000 20 1000000000 FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998 FIRSTNAME_9999998 INITIALS_9999998 SURNAME_9999998 30 999999800 Elapsed: 00:00:14.56
So, the differential in performance has reduced significantly, with the KEEP DENSE_RANK around double it’s original time, whilst the ROW_NUMBER method has only increased marginally. I’ve not tested with adding additional columns, but I’m guessing (I know…I could and should test it!) it will get worse, to the extent that, eventually, the KEEP DENSE_RANK will become the worse performer. If that’s the case, then essentially, these two methods have different scalability dynamics and one should bear this in mind when considering which to choose, depending on how many DENSE_RANK’d columns you’d need to deliver your results.
Hope this helps.
I could also note that with a lot of columns, Dense_rank is pretty ugly and error prone…
Thanks for the followup,
Laurent
Absolutely…as always, appropriate use of the technology is always the best way forward.
Good to hear from you.
Cheers
Jeff
[…] comparison between KEEP DENSE_ RANK and ROW_NUMBER Jeff Moss-KEEP DENSE_RANK versus ROW_NUMBER – further details Leave a […]