Hive Functions

Built-in Operators

Hive official documentation:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

You can think of Hive much like MySQL: the vast majority of MySQL's functions are also available in Hive.

Hive User-Defined Functions and Transform

When Hive's built-in functions cannot meet your business needs, you can consider writing a user-defined function (UDF).

Categories of user-defined functions: UDF (one row in, one value out), UDAF (aggregate: many rows in, one value out), and UDTF (table-generating: one row in, many rows out).

UDF development example

1. Write the code

package cn.feizhou.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class ToLowerCase extends UDF {

	// The evaluate method must be public; Hive locates it by reflection.
	public String evaluate(String field) {
		if (field == null) {
			return null;
		}
		return field.toLowerCase();
	}
}

2. Package the class into a jar and upload it to the server.

3. Add the jar to Hive's classpath:

hive> add JAR /root/udf.jar;
Added [/root/udf.jar] to class path
Added resources: [/root/udf.jar]

4. Create a temporary function bound to the compiled Java class:

hive> create temporary function toprovince as 'cn.feizhou.udf.ToLowerCase';

Testing

Test data, 1.txt:

1,A
2,B
3,C
4,d
7,y
8,u

Create the table and load the test data (assuming 1.txt was uploaded to /root):

create table ttt_t3(id int,name string) row format delimited fields terminated by ',' stored as textfile;

load data local inpath '/root/1.txt' into table ttt_t3;

Test result:

hive> select id,name,toprovince(name) from ttt_t3;
OK
1    A    a
2    B    b
3    C    c
4    d    d
7    y    y
8    u    u
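The per-row behavior of the UDF can be sketched in Python. This is an illustrative stand-in, not Hive's actual execution path: Hive calls `evaluate` once for each row's `name` value, and the hypothetical `rows` list below mirrors 1.txt.

```python
# Hypothetical rows mirroring the contents of 1.txt above.
rows = [(1, "A"), (2, "B"), (3, "C"), (4, "d"), (7, "y"), (8, "u")]

def evaluate(field):
    # Mirrors ToLowerCase.evaluate: lowercase the input string.
    return field.lower()

# Hive invokes evaluate once per row of the selected column.
result = [(rid, name, evaluate(name)) for rid, name in rows]
for row in result:
    print(row)
```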

Implementing Transform

Hive's TRANSFORM keyword lets you invoke your own script from within SQL.

It suits cases where you need functionality Hive lacks but do not want to write a UDF.

Example

[root@HAT01 ~]# cat 6.txt
1,11,978300760,444
2,22,978300760,444

Create the table and load the data:

hive> create table ttt_t7(movieid string,rate int,timestring string,uid string)row format delimited fields terminated by ',';
 
hive> load data local inpath '/root/6.txt' overwrite into table ttt_t7;
 
hive> select * from ttt_t7;
 
1    11    978300760    444
2    22    978300760    444
 
hive> 

Create the Python script:

vim weekday_mapper.py

#!/usr/bin/env python
import sys
import datetime

# Hive feeds each row to stdin as tab-separated columns; whatever we
# print to stdout becomes the output row.
for line in sys.stdin:
    line = line.strip()
    movieid, rating, unixtime, userid = line.split('\t')
    # Replace the unix timestamp with the ISO weekday (1=Mon .. 7=Sun).
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([movieid, rating, str(weekday), userid]))
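The script logic can be sanity-checked locally before registering it in Hive. The sketch below (assuming Python 3) runs one tab-separated line through the same transformation; note the resulting weekday depends on the machine's local timezone, since `fromtimestamp` converts to local time.

```python
import datetime

def map_line(line):
    # Same transformation as weekday_mapper.py: replace the unix
    # timestamp column with the ISO weekday (1=Mon .. 7=Sun).
    movieid, rating, unixtime, userid = line.strip().split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    return '\t'.join([movieid, rating, str(weekday), userid])

print(map_line("1\t11\t978300760\t444"))
```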

Add the script as a Hive resource so it is shipped to the cluster nodes:

 hive> add FILE /root/weekday_mapper.py;
Added resources: [/root/weekday_mapper.py]

Result:

hive> SELECT  TRANSFORM (movieid, rate, timestring,uid) USING 'python weekday_mapper.py'  FROM ttt_t7;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20190309144803_fa0b3901-21fb-404e-a95e-660362bb41a9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552027080460_0009, Tracking URL = http://HAT02:8088/proxy/application_1552027080460_0009/
Kill Command = /home/hadoop/app/hadoop-2.6.4/bin/hadoop job  -kill job_1552027080460_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-03-09 14:49:14,142 Stage-1 map = 0%,  reduce = 0%
2019-03-09 14:50:02,816 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.24 sec
MapReduce Total cumulative CPU time: 1 seconds 240 msec
Ended Job = job_1552027080460_0009
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.24 sec   HDFS Read: 4508 HDFS Write: 133 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 240 msec
OK
1    11    1    444
2    22    1    444

Time taken: 120.21 seconds, Fetched: 2 row(s)
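The streaming contract behind TRANSFORM can be emulated locally: Hive joins the selected columns with tabs, writes one row per line to the script's stdin, and splits each stdout line on tabs back into columns. This sketch (my own illustration, not Hive code) pipes the two rows of 6.txt through an inline copy of the mapper:

```python
import subprocess
import sys
import textwrap

# Inline copy of the weekday_mapper.py logic, run as a child process.
script = textwrap.dedent("""
    import sys, datetime
    for line in sys.stdin:
        movieid, rating, unixtime, userid = line.strip().split('\\t')
        weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
        print('\\t'.join([movieid, rating, str(weekday), userid]))
""")

# Rows serialized the way Hive's TRANSFORM would: tab-separated
# columns, one row per line (data from 6.txt above).
rows = "1\t11\t978300760\t444\n2\t22\t978300760\t444\n"

out = subprocess.run([sys.executable, "-c", script],
                     input=rows, capture_output=True, text=True).stdout
print(out)
```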

 

Hive Case Study: Cascading (Cumulative) Sum

 

Implementation steps

This can be implemented with a single HQL statement:

select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from 
(select username,month,sum(salary) as salary from t_access_times group by username,month) A 
inner join 
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;

There is another approach; its key point is the join condition:

where B.month  = A.month-1
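The cumulative logic of the self-join can be sketched in Python, using hypothetical per-user monthly totals in place of the inner subquery's output:

```python
from collections import defaultdict

# Hypothetical per-user monthly totals, as the inner
# "group by username, month" subquery would produce them.
monthly = [("A", "2015-01", 33), ("A", "2015-02", 10), ("A", "2015-03", 30)]

# For each (user, month), accumulate sums that user's totals over
# every month <= the current one, which is exactly the row set that
# "where B.month <= A.month" keeps in the self-join.
running = defaultdict(int)
result = []
for user, month, salary in sorted(monthly):
    running[user] += salary
    result.append((user, month, salary, running[user]))

for row in result:
    print(row)
```

On newer Hive versions the same result can also be computed without a self-join, using a window function such as `sum(salary) over (partition by username order by month)`.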
