欢迎投稿

今日深度:

hive row_number(),

hive row_number(),


问题背景:想对字段A分组,然后根据字段B排序

最笨的方法:

SELECT b.*
FROM (
  SELECT 
    A
  , MAX(B) AS B
  FROM mytable
  WHERE 
     SOME CONDITIONS
  GROUP BY  A
) a
JOIN (
  SELECT 
      A
    , C
    , B
  FROM mytable
  WHERE 
      SOME CONDITIONS
    GROUP BY A, B, C
)b 
ON a.B = b.B AND a.A = b.A
;

后来学会了row_num(),是hive自带的函数,它的作用是按指定的列进行分组生成行序列。在ROW_NUMBER(a,b) 时,若两条记录的a,b列相同,则行序列+1,否则重新计数。

一开始还是写出来很笨很丑的代码:

SELECT *, rank
FROM(
  SELECT 
    *
  , row_number() over (partition by A order by B desc ) rank
  FROM (
    SELECT 
        A
      , C
      , B
    FROM mytable
    WHERE 
        SOME CONDITIONS
    GROUP BY 
      A
    , C
    , B
  ) tb_1
) tb_2
WHERE rank < 2

其实正确的应该是如下方法:

select * from(
 	SELECT  
              A
            , C
            , B
            , row_number() over (partition by A order by B desc ) rank
    FROM mytable
    WHERE 
        SOME CONDITION
 ) a where a.rank=1


www.htsjk.Com true http://www.htsjk.com/hive/35912.html NewsArticle hive row_number(), 问题背景:想对字段A分组,然后根据字段B排序 最笨的方法: SELECT b.*FROM ( SELECT A , MAX(B) AS B FROM mytable WHERE SOME CONDITIONS GROUP BY A) aJOIN ( SELECT A , C , B FROM mytable WHERE SOME C...
相关文章
    暂无相关文章
评论暂时关闭