Hive Syntax
1. SELECT Syntax
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
WHERE: restricts the query to records that satisfy a condition.
SELECT * FROM sales WHERE amount > 10 AND region = "US"
DISTINCT: removes duplicate rows from the result set.
SELECT DISTINCT col1, col2 FROM t1
ON: specifies the join condition for a table join (when there is no JOIN, use WHERE instead).
SELECT page_views.*
FROM page_views JOIN dim_users
ON (page_views.user_id = dim_users.id AND page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31')
HAVING: filters the grouped data after aggregation, using aggregate functions.
SELECT col1 FROM t1 GROUP BY col1 HAVING SUM(col2) > 10;
which is equivalent to:
SELECT col1 FROM (SELECT col1, SUM(col2) AS col2sum FROM t1 GROUP BY col1) t2 WHERE t2.col2sum > 10;
LIMIT: restricts the number of records returned. SORT BY only orders rows within each reducer, so to get a global top N, first force a single reducer:
SET mapred.reduce.tasks = 1
SELECT * FROM sales SORT BY amount DESC LIMIT 5
Regular expressions: column names can be selected with Java regular expression syntax (http://www.fileformat.info/tool/regex.htm is handy for testing patterns). The following selects every column of sales except ds and hr:
SELECT `(ds|hr)?+.+` FROM sales
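As a minimal sketch of what the regex form expands to, assuming (hypothetically) that sales has exactly the columns ds, hr, amount, and region, the two statements below return the same columns:
-- `(ds|hr)?+.+` matches every column name except ds and hr.
SELECT `(ds|hr)?+.+` FROM sales;
SELECT amount, region FROM sales;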
2. GROUP BY Syntax
GROUP BY groups the output of a query for aggregation. A single query may compute several aggregations at once, but no two of them may apply DISTINCT to different columns.
Correct:
INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(*), sum(DISTINCT pv_users.userid)
FROM pv_users
GROUP BY pv_users.gender;
Wrong:
INSERT OVERWRITE TABLE pv_gender_agg
SELECT pv_users.gender, count(DISTINCT pv_users.userid), count(DISTINCT pv_users.ip)
FROM pv_users
GROUP BY pv_users.gender;
Multiple DISTINCT expressions over different columns are not allowed in the same query. When a GROUP BY clause is used, the SELECT list may only contain the columns listed in the GROUP BY clause; it may, of course, also contain any number of aggregate functions (such as count).
Correct:
SELECT a, count(b) FROM t1 GROUP BY a;
Wrong:
SELECT a, b FROM t1 GROUP BY a;
Reason: if the grouping key is a, then within the group where a = 100 there may be several different values of b, and Hive refuses to guess which one is meant; the query (HQL must be exact) is rejected whenever a column appears in the SELECT clause but not in the GROUP BY clause.
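To reference b legally, either aggregate it or make it part of the grouping key. A minimal sketch of both legal forms, assuming the same hypothetical table t1 with columns a and b:
-- Aggregate b, so Hive never has to guess which value to return per group.
SELECT a, collect_set(b) FROM t1 GROUP BY a;
-- Or group by (a, b), so b is itself part of the grouping key.
SELECT a, b FROM t1 GROUP BY a, b;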
The result of a GROUP BY can also be written directly to an HDFS directory:
INSERT OVERWRITE DIRECTORY '/user/facebook/tmp/pv_age_sum'
SELECT pv_users.age, count(DISTINCT pv_users.userid)
FROM pv_users
GROUP BY pv_users.age;
Example:
To sum num per hour for each combination of the other columns, the hour is extracted from the time field with substr(from_unixtime(time,'yyyyMMddHHmmss'),9,2) (characters 9-10 of the formatted timestamp are the hour), and collect_set(time)[0] keeps one representative time value per group, since time itself is not in the GROUP BY list:
SELECT gid, sid, user, roleid, collect_set(time)[0], status, map_id, sum(num)
FROM test
GROUP BY gid, sid, user, roleid, substr(from_unixtime(time,'yyyyMMddHHmmss'),9,2), status, map_id;
collect_set:
Returns an array of the distinct elements within a group. When multi-DISTINCT is not allowed, it can be used as a substitute.
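As a minimal sketch, the pv_users query rejected above for mixing DISTINCT on userid and ip can be rewritten by counting the sizes of collect_set arrays instead (size() returns the length of an array):
-- Distinct userid count and distinct ip count per gender, without two different DISTINCT columns.
SELECT pv_users.gender, size(collect_set(pv_users.userid)), size(collect_set(pv_users.ip))
FROM pv_users
GROUP BY pv_users.gender;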