KEEP DENSE_RANK versus ROW_NUMBER – further details

I found this nice post from Laurent Schneider the other day and wanted to comment, but my comments were a bit more of a digression and discussion, so I’ve blogged it and put a link on the comments of the post by Laurent.

I’d always used the ROW_NUMBER method myself until I read this and then figured I’d try the KEEP DENSE_RANK method, which works, as Laurent describes. One thing that didn’t sit well with me in the post from Laurent was that he said “the second one should be more performant” – I prefer hard facts, so I decided to test it a bit and my results are below.

In the simple example that Laurent gave, it’s difficult to tell which is quickest, since the table in question only has a handful of rows and therefore any benchmarking is more susceptible to other influences, clouding the results. I figured I’d build a larger table and try it on that.

Before I did that though, I did get the plans from the two statements Laurent ran on the EMP table and both show the same resource costings:

Firstly, for the ROW_NUMBER method:

select ename
,      deptno
,      sal
from   (select ename
 ,      deptno
 ,      sal
 ,      row_number() over (partition by deptno order by sal desc,empno) r
 from   emp
 )
where  r=1;

Plan hash value: 3291446077                                                                                                             

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   644 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   644 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   644 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 1 - filter("R"=1)
 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "DEPTNO" ORDER BY
 INTERNAL_FUNCTION("SAL") DESC ,"EMPNO")<=1)

Note
-----
 - dynamic sampling used for this statement

Now, the KEEP DENSE_RANK method:

select max(ename) keep (dense_rank first order by sal desc,empno) ename
,      deptno
,      max(sal) sal
from   emp 
group by deptno;

Plan hash value: 15469362

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    14 |   644 |     4  (25)| 00:00:01 |
|   1 |  SORT GROUP BY     |      |    14 |   644 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   644 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

So, the plans are very similar, especially in terms of having the same resource usage…which means they should be similar in terms of performance…running them, as Laurent did, shows around 1s response times, which, as I say, doesn’t conclusively prove which method is quickest.

OK, on to a bigger example then…

I basically created a similar table to EMP, called JEFF_EMP and added a few more columns (for later) and then put ten million rows in it, taking around 1.3GB on my system…plenty to churn through.

DROP TABLE jeff_emp PURGE
/
CREATE TABLE jeff_emp(deptno     NUMBER
                     ,ename      VARCHAR2(100)
                     ,first_name VARCHAR2(50)
                     ,initials   VARCHAR2(30)
                     ,surname    VARCHAR2(50)
                     ,sal        NUMBER
                     ,empno      NUMBER
                     )
/
INSERT INTO jeff_emp(deptno,ename,first_name,initials,surname,sal,empno)
SELECT (MOD(ROWNUM,3) + 1) * 10
,      'FIRSTNAME_'||TO_CHAR(ROWNUM)||'_INITIALS_'||TO_CHAR(ROWNUM)||'_SURNAME_'||TO_CHAR(ROWNUM)
,      'FIRSTNAME_'||TO_CHAR(ROWNUM)
,      'INITIALS_'||TO_CHAR(ROWNUM)
,      'SURNAME_'||TO_CHAR(ROWNUM)
,      ROWNUM * 100
,      ROWNUM
FROM   (SELECT LEVEL l FROM dual CONNECT BY LEVEL < 10000001) ORDER BY l / COMMIT / EXEC dbms_stats.gather_table_stats(ownname => USER, tabname => 'JEFF_EMP',estimate_percent=>10);

Now, here is the plan for the ROW_NUMBER method:

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    10M|   868M|       |   204K  (1)| 00:40:49 |
|*  1 |  VIEW                    |          |    10M|   868M|       |   204K  (1)| 00:40:49 |
|*  2 |   WINDOW SORT PUSHED RANK|          |    10M|   629M|  1533M|   204K  (1)| 00:40:49 |
|   3 |    TABLE ACCESS FULL     | JEFF_EMP |    10M|   629M|       | 46605   (1)| 00:09:20 |
---------------------------------------------------------------------------------------------

…and the results:
You must have heard the saying “God is late but never too late.” Jeff and Leena had a child at 60, really a miracle, but it is true. generic line viagra You cipla generic cialis will be able to enjoy your sex encounter, Generic Ciallis is your companion. This will give cipla tadalafil 20mg both of you a boost as well as treat the premature ejaculation and erectile dysfunctions. Os it has nothing to http://robertrobb.com/2018/06/ generic viagra online with the required quantity.

ENAME                                                                                                    DEPTNO        SAL
---------------------------------------------------------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                           10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                        20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                           30  999999800

Elapsed: 00:00:24.47

…and the KEEP DENSE_RANK method plan:

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     3 |   198 | 47109   (2)| 00:09:26 |
|   1 |  SORT GROUP BY     |          |     3 |   198 | 47109   (2)| 00:09:26 |
|   2 |   TABLE ACCESS FULL| JEFF_EMP |    10M|   629M| 46605   (1)| 00:09:20 |
-------------------------------------------------------------------------------

…and it’s results:

ENAME                                                                                                    DEPTNO        SAL
---------------------------------------------------------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                           10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                        20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                           30  999999800

Elapsed: 00:00:07.76

So, reasonably clear results, indicating that the KEEP DENSE_RANK is about a third of the time to run, compared to the ROW_NUMBER method. You can also see from the plans that the ROW_NUMBER method involves use of TEMP, whereas the KEEP DENSE_RANK doesn’t, hence the slowdown.

So, Laurent was correct in his assertion that it should be more performant…but it’s nice to see the results based on a more meaningful set of data.

Now, there was one other thing that concerned me, and that was whether if you added more columns into the SQL, would it change the performance fo either method to any significant degree, so I started using the extra name columns like this:

SELECT ename
,      first_name
,      initials
,      surname
,      deptno
,      sal
FROM   (SELECT ename
        ,      first_name
        ,      initials
        ,      surname
        ,      deptno
        ,      sal
        ,      ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC,empno) r 
        FROM   jeff_emp
       )
WHERE  r = 1
/

…which has a plan of:

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |    10M|  1546M|       |   307K  (1)| 01:01:36 |
|*  1 |  VIEW                    |          |    10M|  1546M|       |   307K  (1)| 01:01:36 |
|*  2 |   WINDOW SORT PUSHED RANK|          |    10M|  1107M|  2606M|   307K  (1)| 01:01:36 |
|   3 |    TABLE ACCESS FULL     | JEFF_EMP |    10M|  1107M|       | 46605   (1)| 00:09:20 |
---------------------------------------------------------------------------------------------

…and results:

ENAME                                                                                                FIRST_NAME                                         INITIALS                       SURNAME                      DEPTNO         SAL
---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                   FIRSTNAME_9999999                                  INITIALS_9999999               SURNAME_9999999          10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                FIRSTNAME_10000000                                 INITIALS_10000000              SURNAME_10000000         20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                   FIRSTNAME_9999998                                  INITIALS_9999998               SURNAME_9999998          30  999999800

Elapsed: 00:00:25.76

For the KEEP DENSE_RANK I get:

SELECT MAX(ename) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) ename
,      MAX(first_name) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) first_name
,      MAX(initials) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) initials
,      MAX(surname) KEEP (DENSE_RANK FIRST ORDER BY sal DESC,empno) surname
,      deptno
,      MAX(sal) sal
FROM   jeff_emp 
GROUP BY deptno
/

Which has the following plan:

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     3 |   348 | 47109   (2)| 00:09:26 |
|   1 |  SORT GROUP BY     |          |     3 |   348 | 47109   (2)| 00:09:26 |
|   2 |   TABLE ACCESS FULL| JEFF_EMP |    10M|  1107M| 46605   (1)| 00:09:20 |
-------------------------------------------------------------------------------

…and results:

ENAME                                                                                                FIRST_NAME                                         INITIALS                       SURNAME                      DEPTNO         SAL
---------------------------------------------------------------------------------------------------- -------------------------------------------------- ------------------------------ -------------------------------------------------- ---------- ----------
FIRSTNAME_9999999_INITIALS_9999999_SURNAME_9999999                                                   FIRSTNAME_9999999                                  INITIALS_9999999               SURNAME_9999999          10  999999900
FIRSTNAME_10000000_INITIALS_10000000_SURNAME_10000000                                                FIRSTNAME_10000000                                 INITIALS_10000000              SURNAME_10000000         20 1000000000
FIRSTNAME_9999998_INITIALS_9999998_SURNAME_9999998                                                   FIRSTNAME_9999998                                  INITIALS_9999998               SURNAME_9999998          30  999999800

Elapsed: 00:00:14.56

So, the differential in performance has reduced significantly, with the KEEP DENSE_RANK around double it’s original time, whilst the ROW_NUMBER method has only increased marginally. I’ve not tested with adding additional columns, but I’m guessing (I know…I could and should test it!) it will get worse, to the extent that, eventually, the KEEP DENSE_RANK will become the worse performer. If that’s the case, then essentially, these two methods have different scalability dynamics and one should bear this in mind when considering which to choose, depending on how many DENSE_RANK’d columns you’d need to deliver your results.

Hope this helps.

3 thoughts on “KEEP DENSE_RANK versus ROW_NUMBER – further details”

Laurent Schneider says:

March 13, 2010 at 10:10 pm

I could also note that with a lot of columns, Dense_rank is pretty ugly and error prone…

Thanks for the followup,
Laurent

1. oramoss says:
  
  March 15, 2010 at 7:26 am
  
  Absolutely…as always, appropriate use of the technology is always the best way forward.
  
  Good to hear from you.
  Cheers
  Jeff
  
Blogroll Report 05/03/2010 – 12/03/2010 « Coskan’s Approach to Oracle says:

April 23, 2010 at 1:08 am

[…] comparison between KEEP DENSE_ RANK and ROW_NUMBER Jeff Moss-KEEP DENSE_RANK versus ROW_NUMBER – further details Leave a […]

Oramoss

By Jeff

3 thoughts on “KEEP DENSE_RANK versus ROW_NUMBER – further details”

Leave a Reply Cancel reply