Hive Functions

和通数据库 (htsjk.Com), 2019-12-06 22:49

Hive functions

Math functions
Aggregate functions

Notes

Table-generating functions
Built-in functions
Using some of the functions

lateral view
explode
parse_url_tuple
array_contains

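Math functions

Function	Description
ceil(d)	Rounds d up to the nearest integer
rand(123)	Random number in [0, 1), generated with the seed 123
exp(d)	e raised to the power d
ln(d)	Natural logarithm of d (base e)
log10(d) / log2(d) / log(m, d)	Logarithm of d to base 10 / base 2 / base m
power(d, p)	d raised to the power p
pmod(m, n)	Positive modulus of m by n, computed as ((m % n) + n) % n; returns an int for int arguments
sin(d); cos(d); tan(d)	Sine, cosine, and tangent of d (d in radians)
asin(d); acos(d); atan(d)	Inverse sine, cosine, and tangent of d (result in radians)
degrees(d)	Converts d from radians to degrees
radians(d)	Converts d from degrees to radians (the inverse of degrees)
e(); pi()	The mathematical constants e and pi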
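
A few of the functions above can be tried directly on constant values. This is a minimal sketch that assumes a Hive version (0.13 or later) that accepts a SELECT without a FROM clause; the column aliases are made up for illustration.

```sql
-- Constant expressions only, so no table is needed.
SELECT
  ceil(2.3)      AS ceil_val,    -- 3
  pmod(-7, 3)    AS pmod_val,    -- 2, whereas -7 % 3 gives -1
  power(2, 10)   AS pow_val,     -- returned as a double
  log(2, 8)      AS log_val,     -- logarithm of 8 to base 2
  degrees(pi())  AS deg_val,     -- pi radians expressed in degrees
  rand(123)      AS seeded_rand; -- seeded, so the sequence is repeatable
```

The pmod line shows why the function exists: unlike the % operator, it does not return a negative remainder for a negative dividend when n is positive.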

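Aggregate functions

Function	Description
var_pop(col)	Population variance of the numeric values in col
var_samp(col)	Sample variance of the numeric values in col
stddev_pop(col) / stddev_samp(col)	Population / sample standard deviation
covar_pop(col1, col2) / covar_samp(col1, col2)	Population / sample covariance
corr(col1, col2)	Correlation coefficient of col1 and col2
collect_set(col)	Collects the values of col into an array with duplicates removed, similar to DISTINCT
percentile(col, array(0.3, 0.5, 0.8))	Returns the 30th, 50th, and 80th percentiles of col as an array; col must be an integer type such as BIGINT; the result can be turned into rows with explode
percentile_approx(col, array(0.3, 0.5, 0.8))	Same as above, but approximate, and col may be of type DOUBLE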
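
The following sketch shows how several of these aggregates might be combined, and how the array returned by percentile can be expanded into rows. The table orders and its columns (user_id, amount, price) are hypothetical and exist only for this example.

```sql
-- Hypothetical table: orders(user_id BIGINT, amount BIGINT, price DOUBLE)
SELECT
  user_id,
  var_pop(price)                                 AS price_var,
  stddev_samp(price)                             AS price_sd,
  corr(amount, price)                            AS amount_price_corr,
  collect_set(amount)                            AS distinct_amounts,
  percentile(amount, array(0.3, 0.5, 0.8))       AS amount_pcts,  -- integer column
  percentile_approx(price, array(0.3, 0.5, 0.8)) AS price_pcts    -- double column
FROM orders
GROUP BY user_id;

-- Expand the percentile array into one row per percentile value.
SELECT pct
FROM (
  SELECT percentile(amount, array(0.3, 0.5, 0.8)) AS pcts
  FROM orders
) t
LATERAL VIEW explode(pcts) p AS pct;
```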

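Notes

Setting hive.map.aggr = true enables map-side aggregation, which usually improves aggregation performance at the cost of more memory.
Older Hive documentation states that a query may not contain more than one function(DISTINCT ...) expression, for example two count(DISTINCT col) calls; ordinary aggregate functions can be combined freely in one query.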
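
As a small illustration of these notes, the setting can be applied per session before running an aggregation. The sketch reuses the userinfo table that is created in the array_contains section of this article.

```sql
-- Enable map-side (partial) aggregation for this session.
SET hive.map.aggr=true;

-- Several ordinary aggregate functions in a single query.
SELECT sex,
       count(*) AS users,
       avg(age) AS avg_age,
       max(age) AS max_age
FROM userinfo
GROUP BY sex;
```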

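Table-generating functions

Function	Description
explode(arr)	Returns zero or more rows, one row per element of arr
explode(map)	Returns zero or more rows, one row per key-value pair of map
explode(ARRAY<T> a)	Same as explode(arr): one row per element of a; the reverse operation (rows back into an array) is done with the aggregate functions collect_list / collect_set
json_tuple(jsonstr, p1, p2, ..., pn)	Takes a JSON string and returns the values of the keys p1 ... pn
parse_url_tuple(url, p1, p2, ..., pn)	Takes a URL and returns the requested parts; part names are case-sensitive and must not contain spaces: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY_NAME>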
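
A minimal json_tuple sketch follows. The JSON literal and the aliases src, t, name, and age are made up for illustration, and the inline subquery without FROM assumes Hive 0.13 or later.

```sql
-- json_tuple is a UDTF, so it is combined with LATERAL VIEW here.
SELECT t.name, t.age
FROM (SELECT '{"name":"amy","age":20}' AS js) src
LATERAL VIEW json_tuple(src.js, 'name', 'age') t AS name, age;
-- name = amy, age = 20 (json_tuple returns every value as a string)
```

Compared with calling get_json_object once per key, json_tuple parses the JSON string once and returns all requested keys together.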

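Built-in functions

Function	Description
concat(s1, s2)	Concatenates strings
concat_ws(separator, s1, s2)	Concatenates strings with the given separator
find_in_set(s, strlist)	Returns the position of the first occurrence of s in strlist, where strlist is a comma-separated string; returns 0 if s is not found
format_number(x, d)	Formats the number x as a string of the form xx,xxx.xx with d decimal places; if d = 0, no decimal part is output
get_json_object(json_str, path)	Extracts the value at the given JSON path (for example '$.name') from json_str
in_file(s, filename)	Returns true if some complete line of the file named filename exactly matches the string s
instr(str, s)	Returns the position of the first occurrence of s in str (starting from 1); 0 if not found
locate(s, str, p)	Returns the position of the first occurrence of s in str, searching from position p
lower(s); lcase(s)	Converts the string to lowercase
lpad(s, len, pad)	Pads s on the left with the string pad until it reaches length len
ltrim(s)	Removes the spaces on the left side of s
regexp_extract(subject, regex_pattern, index)	Returns the substring of subject matched by capture group index of the regular expression regex_pattern (index 0 is the whole match)
regexp_replace(s, regex, replacement)	Replaces the parts of s that match the Java regular expression regex with replacement
repeat(s, n)	Repeats the string s n times
reverse(s)	Reverses the string s
size(map); size(array)	Returns the number of elements in the map or array
space(n)	Returns a string of n spaces
split(s, pattern)	Splits the string s around matches of the regular expression pattern
str_to_map(s, delim1, delim2)	Converts the string s into a map; s: input string; delim1: separator between key-value pairs; delim2: separator between a key and its value
from_unixtime(unixtime [, format])	Converts a Unix timestamp in seconds to a time string in the current time zone (not UTC); the output format can be set with format
unix_timestamp()	Returns the current Unix timestamp in seconds
unix_timestamp(date)	date must be in the form 'yyyy-MM-dd HH:mm:ss'; returns the Unix timestamp, or 0 if the string does not match
unix_timestamp(date, pattern)	date must match pattern; returns the Unix timestamp
to_date(timestamp)	Returns the date part (yyyy-MM-dd) of a timestamp string
year(date); month(date); day(date); hour; minute; second	Returns the corresponding part of the time string as an int
weekofyear(date)	Returns the week of the year in which date falls
datediff(enddate, startdate)	Returns the number of calendar days from startdate to enddate
date_add(startdate, days)	Returns the date days days after startdate
date_sub(startdate, days)	Returns the date days days before startdate
from_utc_timestamp(timestamp, timezone)	Interprets timestamp as UTC and converts it to the given time zone
to_utc_timestamp(timestamp, timezone)	The reverse: interprets timestamp as being in the given time zone and converts it to UTC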
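
The calls below sketch how some of the string and date functions in this table behave on literal values. All inputs are illustrative, and the SELECT without FROM assumes Hive 0.13 or later.

```sql
-- String functions
SELECT
  concat_ws('-', 'a', 'b', 'c')                     AS joined,    -- a-b-c
  find_in_set('ab', 'abc,b,ab,c,def')               AS pos_in_set, -- 3
  instr('hello', 'l')                               AS first_l,   -- 3
  lpad('7', 3, '0')                                 AS padded,    -- 007
  regexp_extract('foothebar', 'foo(.*?)(bar)', 2)   AS grp2,      -- bar
  regexp_replace('foobar', 'oo|ar', '')             AS replaced,  -- fb
  split('a,b,c', ',')                               AS parts,     -- ["a","b","c"]
  str_to_map('a:1,b:2', ',', ':')                   AS kv,        -- {"a":"1","b":"2"}
  get_json_object('{"name":"amy"}', '$.name')       AS json_name; -- amy

-- Date functions
SELECT
  to_date('2009-07-30 04:17:52')                    AS d,         -- 2009-07-30
  year('2009-07-30')                                AS y,         -- 2009
  weekofyear('1970-11-01')                          AS w,         -- 44
  datediff('2009-03-01', '2009-02-27')              AS diff_days, -- 2
  date_add('2008-12-31', 1)                         AS next_day,  -- 2009-01-01
  from_unixtime(0, 'yyyy-MM-dd HH:mm:ss')           AS epoch_str, -- depends on the time zone in use
  unix_timestamp('2009-07-30 04:17:52')             AS ts;        -- depends on the time zone in use
```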

Using some of the functions

lateral view

query

SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;

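Summary
- LATERAL VIEW is usually used together with a UDTF; it works around the restriction that a UDTF cannot appear alongside other columns in the SELECT list.
- Multiple LATERAL VIEW clauses produce a result similar to a Cartesian product.
- The OUTER keyword makes a UDTF that produces no rows output a row of NULLs instead, so that the source row is not lost.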
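
The effect of OUTER can be sketched as follows. The table pageAds(pageid STRING, adid_list ARRAY<INT>) is a hypothetical example table, not one defined in this article.

```sql
-- Hypothetical table: pageAds(pageid STRING, adid_list ARRAY<INT>)

-- Without OUTER: pages whose adid_list is empty or NULL
-- produce no rows and disappear from the result.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW explode(adid_list) adTable AS adid;

-- With OUTER: such pages are kept, with adid = NULL.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW OUTER explode(adid_list) adTable AS adid;
```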

explode

Notes
- No other expressions are allowed in SELECT: SELECT pageid, explode(adid_list) AS myCol ... is not supported (a UDTF cannot be mixed with non-UDTF columns).
- UDTFs can't be nested: SELECT explode(explode(adid_list)) AS myCol ... is not supported.
- GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported: SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported.
- A UDTF in SELECT should be given an alias with AS; older Hive versions raise an error without one.

query

SELECT explode(myCol) AS myNewCol FROM myTable;
SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;
SELECT posexplode(myCol) AS (pos, myNewCol) FROM myTable;
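
Because a bare explode in SELECT cannot be combined with other columns or with GROUP BY, the usual workaround is to move it into a LATERAL VIEW. A minimal sketch, assuming a hypothetical table pageAds(pageid STRING, adid_list ARRAY<INT>):

```sql
-- Hypothetical table: pageAds(pageid STRING, adid_list ARRAY<INT>)

-- Mix the exploded column with an ordinary column.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW explode(adid_list) adTable AS adid;

-- Group by the exploded column: number of rows each ad appears in.
SELECT adid, count(1) AS appearances
FROM pageAds
LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;
```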

parse_url_tuple

create table

create external table if not exists t_url(f1 string, f2 string) row format delimited fields TERMINATED BY ' ' location '/test/url';

query

SELECT f1, b.* FROM t_url LATERAL VIEW parse_url_tuple(f2, 'HOST', 'PATH', 'QUERY', 'QUERY:k1') b AS host, path, query, query_id;
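
To see what each part name returns without loading any data, the same call can be run on a literal URL. The URL below is only an illustrative value, and the inline subquery without FROM assumes Hive 0.13 or later.

```sql
SELECT b.*
FROM (SELECT 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1' AS url) src
LATERAL VIEW parse_url_tuple(src.url, 'HOST', 'PATH', 'QUERY', 'QUERY:k1', 'REF')
  b AS host, path, query, k1, ref;
-- host  = facebook.com
-- path  = /path1/p.php
-- query = k1=v1&k2=v2
-- k1    = v1
-- ref   = Ref1
```

The part names must be written in upper case; a lower-case name such as 'host' is not recognized and yields NULL.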

array_contains

create table

create EXTERNAL table IF NOT EXISTS userInfo (id int,sex string, age int, name string, email string,sd string, ed string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/hive/dw';

query

select * from userinfo where sex='male' and (id!=1 and id !=2 and id!=3 and id!=4 and id!=5) and age < 30;

select * from (select * from userinfo where sex='male' and !array_contains(split('1,2,3,4,5',','),cast(id as string))) tb1 where tb1.age < 30;
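
The two queries above are intended to return the same rows: the second replaces the chain of id != ... comparisons with a single array_contains test. A slightly simpler variant, sketched here as one possible alternative rather than the author's method, builds an array of ints directly so that no cast or split is needed.

```sql
-- Same filter on the userinfo table, using an int array literal.
SELECT *
FROM userinfo
WHERE sex = 'male'
  AND age < 30
  AND NOT array_contains(array(1, 2, 3, 4, 5), id);

-- Equivalent filter for a short fixed list, using NOT IN.
SELECT *
FROM userinfo
WHERE sex = 'male'
  AND age < 30
  AND id NOT IN (1, 2, 3, 4, 5);
```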

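Source statement: the content above is drawn from Programming Hive (《Hive编程指南》) and the author's own study notes and summaries. If it infringes any rights, please let the author know.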