hive row_number(),
问题背景:想对字段A分组,然后根据字段B排序
最笨的方法:
SELECT b.*
FROM (
SELECT
A
, MAX(B) AS B
FROM mytable
WHERE
SOME CONDITIONS
GROUP BY A
) a
JOIN (
SELECT
A
, C
, B
FROM mytable
WHERE
SOME CONDITIONS
GROUP BY A, B, C
)b
ON a.B = b.B AND a.A = b.A
;后来学会了row_num(),是hive自带的函数,它的作用是按指定的列进行分组生成行序列。在ROW_NUMBER(a,b) 时,若两条记录的a,b列相同,则行序列+1,否则重新计数。
一开始还是写出来很笨很丑的代码:
SELECT *, rank
FROM(
SELECT
*
, row_number() over (partition by A order by B desc ) rank
FROM (
SELECT
A
, C
, B
FROM mytable
WHERE
SOME CONDITIONS
GROUP BY
A
, C
, B
) tb_1
) tb_2
WHERE rank < 2其实正确的应该是如下方法:
select * from(
SELECT
A
, C
, B
, row_number() over (partition by A order by B desc ) rank
FROM mytable
WHERE
SOME CONDITION
) a where a.rank=1
本站文章为和通数据库网友分享或者投稿,欢迎任何形式的转载,但请务必注明出处.
同时文章内容如有侵犯了您的权益,请联系QQ:970679559,我们会在尽快处理。