Hive User-Defined Functions

和通数据库 htsjk.Com, 2020-02-06 22:52

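-> User-defined functions

1) Create a project and add the Hive dependency JARs.

2) Write the code. The class must extend UDF (org.apache.hadoop.hive.ql.exec.UDF) and define an evaluate() method.

3) Package the project: Export > JAR file.

4) Upload the JAR to a directory on the Linux host.

5) Start Hive.

6) add jar <path to jar>; //do not put quotes around the path

add jar /root/lower.jar;

7) Register the function in Hive:

create temporary function <function name> as '<fully qualified class name>';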

hive (default)> create temporary function lower as "com.alex.udf.func.lower";
OK
Time taken: 0.1 seconds
hive (default)> use mongdb;
OK
Time taken: 0.031 seconds
hive (mongdb)> select * from student;
OK
student.id student.name
4 Tonny
1 Alex
2 Amy
3 Mia
NULL NULL
Time taken: 1.647 seconds, Fetched: 5 row(s)
hive (mongdb)> select name, lower(name) as lower_name from student;
OK
name lower_name
Tonny tonny
Alex alex
Amy amy
Mia mia
NULL NULL
Time taken: 0.31 seconds, Fetched: 5 row(s)
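
A temporary function exists only for the current session. As a minimal sketch, the statements below register the same class under a different name, inspect it, and drop it again. The name my_lower is hypothetical, chosen only to keep the function distinct from Hive's built-in lower; the JAR path, class name, and student table are the ones used in the session above.

```sql
-- Make the JAR visible to the current session (path from the example above).
add jar /root/lower.jar;

-- Register the UDF class under a hypothetical name, my_lower.
create temporary function my_lower as 'com.alex.udf.func.lower';

-- Print whatever description the function provides.
describe function my_lower;

-- Call it like any built-in function.
select name, my_lower(name) as lower_name from student;

-- Drop the function once it is no longer needed in this session.
drop temporary function if exists my_lower;
```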

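-> Compression:

1) Enable compression of intermediate data

Running set with only the property name prints its current value; assigning a value changes it for the session.

set hive.exec.compress.intermediate;

set hive.exec.compress.intermediate=true;

2) Enable compression of map output

hive (default)> set hive.exec.compress.intermediate;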
hive.exec.compress.intermediate=false
hive (default)> set hive.exec.compress.intermediate=true;
hive (default)> set mapreduce.map.output.compress;
mapreduce.map.output.compress=false
hive (default)> set mapreduce.map.output.compress=true;

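3) Enable compression of reduce output

Enable compression of the final Hive query output:

set hive.exec.compress.output=true;

Enable compression of the final MapReduce job output:

set mapreduce.output.fileoutputformat.compress=true;

Set the compression codec:

set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

Use block-level compression (this setting applies to SequenceFile output):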

set mapreduce.output.fileoutputformat.compress.type=BLOCK;
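
Taken together, the settings in this section can be collected into one block at the top of a script. This is a sketch rather than a recommended configuration: the map-output codec line (mapreduce.map.output.compress.codec) is an addition that does not appear in the notes, and SnappyCodec is assumed to be available on the cluster.

```sql
-- 1) Compress intermediate data passed between the stages of a query.
set hive.exec.compress.intermediate=true;

-- 2) Compress map output before the shuffle.
set mapreduce.map.output.compress=true;
-- Assumption: use Snappy for map output as well (not part of the original notes).
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- 3) Compress the final output.
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
```

Values set this way last only for the current session; a new session starts from the cluster defaults again.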

-> Setting the storage format:

Specify the format in the stored as clause when creating the table:

create table emp_num (time int, host string)
row format delimited fields terminated by '\t'
stored as orc;

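Available formats: TextFile, SequenceFile, ORC, Parquet

An ORC file is split into stripes; each stripe contains Index Data, Row Data, and a Stripe Footer.

Compression ratio:

ORC > Parquet > TextFile

Query time: ORC is faster than TextFile (50 s versus 54 s).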
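
One way to compare formats on the same data is to keep the ORC table and create a text copy of it with a CTAS statement. The sketch below assumes the emp_num table from the example above already exists; emp_num_text is a hypothetical name. The totalSize value in the describe formatted output (when table statistics are populated) gives a rough on-disk comparison.

```sql
-- Hypothetical text-format copy of the ORC table emp_num.
create table emp_num_text
row format delimited fields terminated by '\t'
stored as textfile
as select * from emp_num;

-- Inspect storage details for each table and compare their sizes.
describe formatted emp_num;
describe formatted emp_num_text;
```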

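-> Data skew optimization:

Map-side aggregation (already enabled here):

set hive.map.aggr;
hive.map.aggr=true

a) Enable load balancing for skewed group by keys (off by default; turn it on with set hive.groupby.skewindata=true;):

set hive.groupby.skewindata;
hive.groupby.skewindata=false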
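
When hive.groupby.skewindata is true, Hive plans a group by as two MapReduce jobs: the first spreads rows randomly across reducers for partial aggregation, and the second merges the partial results by key. A minimal sketch against the student table used earlier (the query itself is illustrative, not taken from the notes):

```sql
-- Aggregate partially on the map side.
set hive.map.aggr=true;
-- Split the aggregation into two jobs so one hot key does not overload a single reducer.
set hive.groupby.skewindata=true;

select name, count(*) as cnt
from student
group by name;
```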

b) Merge small files

set hive.input.format;
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
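
CombineHiveInputFormat packs several small files into a single input split, so fewer map tasks are launched. The amount of data per split is capped by the split-size property; the 256 MB value below is an illustrative assumption, not a figure from these notes.

```sql
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Upper bound, in bytes, on the size of one combined split (256 * 1024 * 1024).
set mapreduce.input.fileinputformat.split.maxsize=268435456;
```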

JVM reuse:

Set in mapred-site.xml:

mapreduce.job.jvm.numtasks: a value between 10 and 20