Hive functions

和通数据库 htsjk.Com, 2019-11-02 23:04


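Hive functions fall into two groups: built-in functions and user-defined functions.

Built-in functions:

show functions;                   -- list all functions
desc function <function_name>;    -- show how a function is used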

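Ranking functions (three kinds):
row_number(): no ties; rows with the same score get consecutive, distinct numbers (1, 2, 3)
rank(): ties share a rank, and the ranks after a tie are skipped (1, 1, 3)
dense_rank(): ties share a rank, and no ranks are skipped (1, 1, 2)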
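
The difference between the three functions is easiest to see on one list of scores that contains a tie. The following is a minimal sketch in plain Java, not Hive code; the class RankingSketch and its method names are hypothetical, and the scores are assumed to be sorted in descending order already.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hive: mimics the three ranking functions
// for scores that are already sorted in descending order.
class RankingSketch {

    // row_number(): consecutive numbers, ties are broken by position.
    static int[] rowNumber(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = i + 1;
        }
        return out;
    }

    // rank(): ties share a rank, the next distinct score jumps to its position.
    static int[] rank(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        for (int i = 0; i < out.length; i++) {
            boolean tie = i > 0 && sortedDesc[i] == sortedDesc[i - 1];
            out[i] = tie ? out[i - 1] : i + 1;
        }
        return out;
    }

    // dense_rank(): ties share a rank, the next distinct score adds exactly one.
    static int[] denseRank(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        for (int i = 0; i < out.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else if (sortedDesc[i] == sortedDesc[i - 1]) {
                out[i] = out[i - 1];
            } else {
                out[i] = out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] scores = {95, 90, 90, 85}; // made-up scores with one tie
        System.out.println(Arrays.toString(rowNumber(scores))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(rank(scores)));      // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(scores))); // [1, 2, 2, 3]
    }
}
```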

Top three scores in each class:

create table stu_score(
classId string,
userName string,
score int 
)
row format delimited 
fields terminated by ' '
;

load data local inpath '/usr/local/hivedata/score.dat' into table stu_score;

1. Partition by class and sort each class by score in descending order

row_number, rank and dense_rank side by side:
select *,
row_number() over(distribute by classId sort by score desc) rm,
rank() over(distribute by classId sort by score desc) rk,
dense_rank() over(distribute by classId sort by score desc) drk
from stu_score
;

Take the top three:

select * from 
(
select *,
row_number() over(distribute by classId sort by score desc) rm,
rank() over(distribute by classId sort by score desc) rk,
dense_rank() over(distribute by classId sort by score desc) drk
from stu_score
) tmp
where tmp.rm < 4
;

select * from
(
select *,
row_number() over(partition by classId order by score desc) rm,
rank() over(partition by classId order by score desc) rk,
dense_rank() over(partition by classId order by score desc) drk
from stu_score
) tmp
where tmp.rm < 4
;

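over: the windowing clause
partition by must be combined with order by, not with sort by (sort by goes with distribute by, as in the first query)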
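
What the top-three query computes can be sketched outside Hive as group, sort, then cut. This is plain Java for illustration only; the class TopThreeSketch, the Row type, and the sample rows are made up and do not come from score.dat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "row_number() over(partition by classId order by score desc) < 4".
class TopThreeSketch {

    // One row of stu_score.
    static final class Row {
        final String classId;
        final String userName;
        final int score;

        Row(String classId, String userName, int score) {
            this.classId = classId;
            this.userName = userName;
            this.score = score;
        }
    }

    static Map<String, List<Row>> topThree(List<Row> rows) {
        // "partition by classId": bucket the rows per class.
        Map<String, List<Row>> byClass = new TreeMap<>();
        for (Row r : rows) {
            byClass.computeIfAbsent(r.classId, k -> new ArrayList<>()).add(r);
        }
        // "order by score desc" plus "rm < 4": sort each bucket, keep the first three.
        Map<String, List<Row>> top = new TreeMap<>();
        for (Map.Entry<String, List<Row>> e : byClass.entrySet()) {
            List<Row> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingInt((Row r) -> r.score).reversed());
            int keep = Math.min(3, sorted.size());
            top.put(e.getKey(), new ArrayList<>(sorted.subList(0, keep)));
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        List<Row> rows = Arrays.asList(
            new Row("c1", "amy", 80), new Row("c1", "bob", 95),
            new Row("c1", "cat", 70), new Row("c1", "dan", 90),
            new Row("c2", "eve", 60));
        for (Map.Entry<String, List<Row>> e : topThree(rows).entrySet()) {
            for (Row r : e.getValue()) {
                System.out.println(e.getKey() + " " + r.userName + " " + r.score);
            }
        }
    }
}
```

Because the cut is made on the row number, a tie at third place keeps only one of the tied students; filtering on rk or drk instead would keep all of them.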

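User-defined functions:

Why user-defined functions exist:
Hive's built-in functions cannot cover every business requirement, so Hive exposes several extension points, such as SerDes, user-defined functions, and input/output formats.
UDF: user-defined function. One row in, one value out (the most common kind).
UDAF: user-defined aggregation function. Many rows in, one value out.
UDTF: user-defined table-generating function. One row in, many rows out.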
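
The three input/output shapes can be illustrated with ordinary Java methods. This is only a sketch of the shapes, not of the Hive interfaces; the class FunctionShapesSketch is hypothetical, and its methods are loosely named after the Hive built-ins upper, sum and explode.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the three function shapes (no Hive classes involved).
class FunctionShapesSketch {

    // UDF shape: one value in, one value out.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // UDAF shape: many values in, one value out.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // UDTF shape: one value in, many values out.
    static List<String> explode(String csv) {
        return Arrays.asList(csv.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upper("abc"));                 // ABC
        System.out.println(sum(Arrays.asList(1, 2, 3)));  // 6
        System.out.println(explode("a,b,c"));             // [a, b, c]
    }
}
```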

Ways to write a UDF:
1. Extend UDF and implement evaluate(); overloading is allowed.
2. Extend GenericUDF and override initialize(), getDisplayString() and evaluate().
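
As a sketch of the first way, the class below declares two overloaded evaluate() methods. UpperSketch and both signatures are hypothetical. In a real project the class would extend org.apache.hadoop.hive.ql.exec.UDF; the superclass is left out here so the example compiles without Hive on the classpath.

```java
import java.util.Locale;

// Hypothetical example class. A real Hive UDF would declare
// "extends org.apache.hadoop.hive.ql.exec.UDF"; that is omitted here
// so the sketch compiles without the Hive jars.
class UpperSketch {

    // One-argument form: upper-case the whole string.
    public String evaluate(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }

    // Overload: return only the first n characters, upper-cased.
    public String evaluate(String s, int n) {
        if (s == null) {
            return null;
        }
        int end = Math.max(0, Math.min(n, s.length()));
        return s.substring(0, end).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        UpperSketch udf = new UpperSketch();
        System.out.println(udf.evaluate("hive"));    // HIVE
        System.out.println(udf.evaluate("hive", 2)); // HI
    }
}
```

Returning null for null input, rather than throwing, keeps a single bad row from failing the whole query.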

Usage:
First approach (valid only for the current session):
1. Upload the finished UDF jar to the server and add it to Hive's classpath
add jar /root/xxx.jar

add jar /root/gp1813Demo-1.0-SNAPSHOT.jar;

2. Create a temporary function under a name of your choice

create temporary function myUDF as 'com.zk.xxxx';

3. Test
Create a one-row dual table to select from:

create table dual (id string);
insert into dual values(' ');

select myUDF('abc') from dual;

4. Once you are sure nothing still calls the function, you can drop it (be careful)

drop temporary function myudf;

First approach, permanent variant (creates a permanent function):
1. Upload the finished UDF jar to the server, copy it to HDFS, and add it to Hive's classpath

hdfs dfs -put /root/gp1813Demo-1.0-SNAPSHOT.jar /hiveUDF/
add jar hdfs:///hiveUDF/gp1813Demo-1.0-SNAPSHOT.jar;

2. Create the function under a name of your choice:

create function myUpperCase as 'com.qfedu.bigdata.hiveUDF.firstUDF';

Second approach (effectively a temporary function):
1. Upload the finished UDF jar to the server
2. Write an init script
vi ./hive-init

add jar /root/xxx.jar;
create temporary function myUDF as 'com.qf.xxxx';

3. Pass the init script when starting hive

hive -i ./hive-init

Third approach (temporary function):
1. Upload the finished UDF jar to the server
2. Create a file named .hiverc in the conf directory of the Hive installation
vi $HIVE_HOME/conf/.hiverc

add jar /root/xxx.jar;
create temporary function myUDF as 'com.qf.xxxx';

3. Start Hive as usual: hive

Fourth approach:
Build from source (laborious)
1) Copy the finished Java file to ~/install/hive-0.8.1/src/ql/src/java/org/apache/hadoop/hive/ql/udf/

cd  ~/install/hive-0.8.1/src/ql/src/java/org/apache/hadoop/hive/ql/udf/
ls -lhgt |head

2) Edit
~/install/hive-0.8.1/src/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java and add the import and the registerUDF call

import com.meilishuo.hive.udf.UDFIp2Long;   // add the import
registerUDF("ip2long", UDFIp2Long.class, false); // add the registration

3) In ~/install/hive-0.8.1/src, run ant -Dhadoop.version=1.0.1 package

cd ~/install/hive-0.8.1/src
ant -Dhadoop.version=1.0.1 package

4) Replace the exec jar. The newly built jar is under /hive-0.8.1/src/build/ql; copy it into place and repoint the symlink

cp hive-exec-0.8.1.jar /hadoop/hive/lib/hive-exec-0.8.1.jar.0628
rm hive-exec-0.8.1.jar
ln -s hive-exec-0.8.1.jar.0628 hive-exec-0.8.1.jar

5) Restart and test

Examples

Example 1: convert a birthday to an age
Input: string birthday 1986-07-10
Output: int age 32

import com.google.common.base.Strings;
import org.apache.hadoop.hive.ql.exec.UDF;

import java.util.Calendar;

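/**
 * Example 1: convert a birthday to an age
 * Input: string birthday 1986-07-10
 * Output: int age 32
 *
 * 1. Split the string
 * 2. Extract the year, month and day
 * 3. Get the current date
 * 4. age = current year - birth year
 * 5. If the current month is earlier than the birth month, subtract 1 from age
 * 6. If the months are equal and the current day is earlier than the birth day, subtract 1 from age
 */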
public class BirthdayToAge extends UDF {
    public int evaluate(String birthday) {
        // Validate the input
        if (Strings.isNullOrEmpty(birthday)) {
            return -1;
        }
        // Split the string
        String[] birthdays = birthday.split("-");

        // Year, month and day of the birthday
        int birthYear = Integer.parseInt(birthdays[0]);
        int birthMonth = Integer.parseInt(birthdays[1]);
        int birthDays = Integer.parseInt(birthdays[2]);

        // Current time
        Calendar calendar = Calendar.getInstance();
        // Current year, month and day (Calendar.MONTH is zero-based)
        int nowYear = calendar.get(Calendar.YEAR);
        int nowMonth = calendar.get(Calendar.MONTH) + 1;
        int nowDay = calendar.get(Calendar.DAY_OF_MONTH);

        // Compute the age
        int age = nowYear - birthYear;
        if (birthMonth > nowMonth) {
            age -= 1;
        } else if (birthMonth == nowMonth && birthDays > nowDay) {
            age -= 1;
        }
        return age;
    }

    public static void main(String[] args) {
        System.out.println(new BirthdayToAge().evaluate("1986-07-26"));
    }
}
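
The same age rule can also be written with java.time, which makes it possible to check the result against a fixed date. This is an alternative sketch, not the Calendar-based method above; the class AgeSketch, the method name age, and the explicit today parameter are assumptions added for illustration, and the input is assumed to be in yyyy-MM-dd form.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical alternative to BirthdayToAge, written without Hive classes.
class AgeSketch {

    // Returns the number of full years between birthday and today,
    // or -1 for null or empty input (same convention as the example above).
    static int age(String birthday, LocalDate today) {
        if (birthday == null || birthday.isEmpty()) {
            return -1;
        }
        LocalDate birth = LocalDate.parse(birthday); // expects yyyy-MM-dd
        return Period.between(birth, today).getYears();
    }

    public static void main(String[] args) {
        // The day before the birthday, then the birthday itself.
        System.out.println(age("1986-07-26", LocalDate.of(2019, 7, 25))); // 32
        System.out.println(age("1986-07-26", LocalDate.of(2019, 7, 26))); // 33
    }
}
```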

Example 2: look up a value by its key
e.g. sex=1&hight=180&weight=130&sal=28000
select func1("sex=1&hight=180&weight=130&sal=28000","weight") from dual;
130

The same data as JSON:
{sex:1,hight:180,weight:130,sal:28000}

import com.google.common.base.Strings;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.json.JSONException;
import org.json.JSONObject;

/**
 * Look up a value by its key
 * e.g. sex=1&hight=180&weight=130&sal=28000
 * select func1("sex=1&hight=180&weight=130&sal=28000","weight") from dual;
 * 130
 */
public class KeyToValue extends UDF {
    public String evaluate(String str, String key) throws JSONException {
        // Validate the input
        if (Strings.isNullOrEmpty(str)) {
            return null;
        }

        // Convert str to JSON format
        String s1 = str.replace("&", ",");
String s2 = s1.replace("=","
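
The listing is cut off at this point in the source. As a separate sketch of the same lookup, the code below splits the string on & and = directly instead of converting it to JSON; this is a swapped-in technique, not the author's org.json-based code, and the class KeyToValueSketch and the method lookup are hypothetical names.

```java
// Hypothetical sketch: finds the value for a key in a "k1=v1&k2=v2" string.
// It does not use org.json and does not extend the Hive UDF class.
class KeyToValueSketch {

    static String lookup(String str, String key) {
        if (str == null || str.isEmpty() || key == null) {
            return null;
        }
        for (String pair : str.split("&")) {
            int eq = pair.indexOf('=');
            // Skip fragments without a key before the '='.
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return pair.substring(eq + 1);
            }
        }
        return null; // key not present
    }

    public static void main(String[] args) {
        String input = "sex=1&hight=180&weight=130&sal=28000";
        System.out.println(lookup(input, "weight")); // 130
    }
}
```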
