Hive window functions: row_number(), dense_rank(), and rank()
Original table data: table top_tmp, with columns id, name, and score.
The query and its output are as follows:
hive> select
> id,
> name,
> score,
> row_number() over(partition by name order by score desc) as r1,
> dense_rank() over(partition by name order by score desc) as d1,
> rank() over(partition by name order by score desc) as r2
> from top_tmp;
Query ID = nwom_20181128095353_09418b0c-77dc-4332-95a4-810c49d01ac7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1536918059760_23105, Tracking URL = http://gxmaster01:8088/proxy/application_1536918059760_23105/
Kill Command = /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop/bin/hadoop job -kill job_1536918059760_23105
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2018-11-28 11:00:44,837 Stage-1 map = 0%, reduce = 0%
2018-11-28 11:00:51,086 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.24 sec
2018-11-28 11:00:55,253 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.66 sec
2018-11-28 11:01:01,497 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.8 sec
MapReduce Total cumulative CPU time: 5 seconds 800 msec
Ended Job = job_1536918059760_23105
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 5.8 sec HDFS Read: 12325 HDFS Write: 403 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 800 msec
OK
2 a 32 1 1 1
2 a 32 2 1 1
2 a 32 3 1 1
2 a 27 4 2 4
2 a 27 5 2 4
2 a 27 6 2 4
2 a 20 7 3 7
3 b 26 1 1 1
3 b 26 2 1 1
3 b 26 3 1 1
3 b 23 4 2 4
3 b 19 5 3 5
3 b 19 6 3 5
3 b 19 7 3 5
3 b 16 8 4 8
4 e 75 1 1 1
4 e 71 2 2 2
4 e 71 3 2 2
4 e 71 4 2 2
4 e 27 5 3 5
4 e 27 6 3 5
4 e 27 7 3 5
4 e 22 8 4 8
5 f 56 1 1 1
5 f 50 2 2 2
5 f 50 3 2 2
5 f 50 4 2 2
5 f 32 5 3 5
5 f 30 6 4 6
5 f 30 7 4 6
5 f 30 8 4 6
Observations:
row_number() over(partition by ... order by ...) numbers the rows in each partition sequentially, with no ties and no gaps: tied scores still get distinct numbers (1, 2, 3, ...).
dense_rank() over(partition by ... order by ...) gives tied scores the same rank, and the next rank follows consecutively with no gap — for name a, the three scores of 32 all get rank 1 and the scores of 27 get rank 2.
rank() over(partition by ... order by ...) also gives tied scores the same rank, but the next rank jumps ahead by the number of ties — for name a, the three scores of 32 all get rank 1 and the scores of 27 get rank 4.
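The same three window functions exist in standard SQL, so the behavior above can be reproduced locally without a Hive cluster. Below is a small sketch using Python's built-in sqlite3 module (window functions require an SQLite build >= 3.25), loading the rows for name 'a' from the output above:

```python
import sqlite3

# Rebuild a small slice of top_tmp (the rows for name 'a' shown above)
# in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("create table top_tmp (id int, name text, score int)")
rows = [(2, 'a', 32), (2, 'a', 32), (2, 'a', 32),
        (2, 'a', 27), (2, 'a', 27), (2, 'a', 27),
        (2, 'a', 20)]
conn.executemany("insert into top_tmp values (?, ?, ?)", rows)

# Same query as in the Hive transcript.
result = conn.execute("""
    select id, name, score,
           row_number() over (partition by name order by score desc) as r1,
           dense_rank() over (partition by name order by score desc) as d1,
           rank()       over (partition by name order by score desc) as r2
    from top_tmp
""").fetchall()

for row in result:
    print(row)
# r1 runs 1..7 with no ties; d1 is 1,1,1,2,2,2,3 (no gap after ties);
# r2 is 1,1,1,4,4,4,7 (jumps past ties), matching the Hive output.
```

This makes it easy to experiment with other orderings or partitions before running the query against a real Hive table.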