Hive Functions
Built-in operators
Official Hive documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
You can treat Hive much like MySQL: the vast majority of MySQL's functions have Hive equivalents.
Hive Custom Functions and Transform
When Hive's built-in functions cannot meet your business needs, you can write a user-defined function (UDF, user-defined function).
Custom function categories
UDF development example
1. Code
package cn.feizhou.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class ToLowerCase extends UDF {
    // evaluate() must be public; Hive calls it once per row.
    public String evaluate(String field) {
        if (field == null) {
            return null; // pass NULLs through instead of throwing
        }
        return field.toLowerCase();
    }
}
2. Package the class into a jar and upload it to the server
3. Add the jar to Hive's classpath
hive> add JAR /root/udf.jar;
Added [/root/udf.jar] to class path
Added resources: [/root/udf.jar]
4. Create a temporary function bound to the compiled Java class
hive> create temporary function toprovince as 'cn.feizhou.udf.ToLowerCase';
Test
Test data, 1.txt:
1,A
2,B
3,C
4,d
7,y
8,u
create table ttt_t3(id int,name string) row format delimited fields terminated by ',' stored as textfile;
-- load the test data (assuming 1.txt was uploaded to /root, like the other files here)
load data local inpath '/root/1.txt' into table ttt_t3;
Test result
hive> select id,name,toprovince(name) from ttt_t3;
OK
1 A a
2 B b
3 C c
4 d d
7 y y
8 u u
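The per-row logic of the UDF is easy to check outside Hive. The sketch below is a hypothetical Python analogue of the Java evaluate() method, applied to the same test rows (it only mirrors the transformation; in reality Hive invokes the Java class on each row):

```python
# Python analogue of ToLowerCase.evaluate(): lowercase one field, pass NULL through.
def evaluate(field):
    if field is None:
        return None
    return field.lower()

# Same rows as the 1.txt test data above.
rows = [(1, "A"), (2, "B"), (3, "C"), (4, "d"), (7, "y"), (8, "u")]
for rid, name in rows:
    print(rid, name, evaluate(name))
# → 1 A a ... 8 u u, matching the query output above
```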
Transform Implementation
Hive's TRANSFORM keyword lets you call your own script from within SQL.
This suits cases where you need functionality Hive lacks but don't want to write a UDF.
Example
[root@HAT01 ~]# cat 6.txt
1,11,978300760,444
2,22,978300760,444
Create the table and load the data
hive> create table ttt_t7(movieid string,rate int,timestring string,uid string)row format delimited fields terminated by ',';
hive> load data local inpath '/root/6.txt' overwrite into table ttt_t7;
hive> select * from ttt_t7;
1 11 978300760 444
2 22 978300760 444
hive>
Create the Python script
vim weekday_mapper.py
#!/bin/python
import sys
import datetime

# Read tab-separated rows from Hive on stdin, write tab-separated rows to stdout.
for line in sys.stdin:
    line = line.strip()
    movieid, rating, unixtime, userid = line.split('\t')
    # Convert the unix timestamp to an ISO weekday (1 = Monday ... 7 = Sunday).
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([movieid, rating, str(weekday), userid]))
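TRANSFORM simply streams tab-separated rows through the script's stdin and reads tab-separated rows back from its stdout. The mapper logic can therefore be tested in-process without a cluster. A minimal sketch (note that fromtimestamp() interprets the epoch value in the machine's local timezone, which is why the cluster reported weekday 1 for 978300760 while a UTC machine would see 7):

```python
import datetime
import io

def transform(stream):
    """Apply the weekday_mapper logic to tab-separated input lines."""
    out = []
    for line in stream:
        movieid, rating, unixtime, userid = line.strip().split('\t')
        # Local-time conversion, matching the script Hive runs on each node.
        weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
        out.append('\t'.join([movieid, rating, str(weekday), userid]))
    return out

# The two rows of 6.txt, as Hive would feed them to the script.
sample = io.StringIO("1\t11\t978300760\t444\n2\t22\t978300760\t444\n")
for row in transform(sample):
    print(row)
```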
Add the script as a Hive resource so it is shipped to the task nodes
hive> add FILE /root/weekday_mapper.py;
Added resources: [/root/weekday_mapper.py]
Result
hive> SELECT TRANSFORM (movieid, rate, timestring,uid) USING 'python weekday_mapper.py' FROM ttt_t7;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20190309144803_fa0b3901-21fb-404e-a95e-660362bb41a9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552027080460_0009, Tracking URL = http://HAT02:8088/proxy/application_1552027080460_0009/
Kill Command = /home/hadoop/app/hadoop-2.6.4/bin/hadoop job -kill job_1552027080460_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-09 14:49:14,142 Stage-1 map = 0%, reduce = 0%
2019-03-09 14:50:02,816 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.24 sec
MapReduce Total cumulative CPU time: 1 seconds 240 msec
Ended Job = job_1552027080460_0009
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.24 sec HDFS Read: 4508 HDFS Write: 133 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 240 msec
OK
1 11 1 444
2 22 1 444
Time taken: 120.21 seconds, Fetched: 2 row(s)
Hive Practical Case: Cascading Sum
Implementation
This can be done with a single HQL statement:
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;
There is another approach; its key idea is the join condition
where B.month = A.month-1
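The self-join above pairs each (user, month) total A with every total B of the same user whose month is less than or equal, so summing B.salary yields the running total. The same two steps, sketched in plain Python over hypothetical sample rows of t_access_times (usernames and values are made up for illustration):

```python
from collections import defaultdict

# Hypothetical t_access_times rows: (username, month, salary)
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15),
    ("A", "2015-02", 4), ("A", "2015-02", 6),
    ("B", "2015-01", 5), ("B", "2015-02", 10),
]

# Step 1: monthly totals per user (the inner GROUP BY subquery).
monthly = defaultdict(int)
for user, month, salary in rows:
    monthly[(user, month)] += salary

# Step 2: for each (user, month), sum every month <= it
# (the self-join plus WHERE B.month <= A.month).
for (user, month) in sorted(monthly):
    accumulate = sum(v for (u, m), v in monthly.items()
                     if u == user and m <= month)
    print(user, month, monthly[(user, month)], accumulate)
# → A 2015-01 20 20
#   A 2015-02 10 30
#   B 2015-01 5 5
#   B 2015-02 10 15
```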