Hive memo


1. Compress Hive query output with gzip

Before running the query, set the following parameters:

    set mapred.output.compress=true;
    set hive.exec.compress.output=true;
    set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
    INSERT OVERWRITE DIRECTORY 'hive_out' select * from tables limit 10000;
2. LEFT OUTER JOIN in Hive on Cloudera CDH3 when both tables are partitioned:

Method 1: filter each table down to its partition in a subquery, then join the filtered results.

Method 2: put the right table's partition predicate in the ON clause (putting it in the WHERE clause would discard the NULL rows that the outer join produces, effectively turning it into an inner join):

    select a.*, b.*
    from table a
    left outer join table b
      on (a.uid = b.uuid and b.dt = '2011-08-21')
    where a.dt = '2011-08-21';
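Method 1 can be sketched as follows. The table names `table_a` and `table_b` are placeholders standing in for the two partitioned tables in the example above; the column and partition names follow method 2:

```sql
-- A sketch of method 1: restrict each partitioned table in a subquery
-- first, then LEFT OUTER JOIN the already-filtered results, so the
-- outer join's NULL rows are preserved.
select a.*, b.*
from (select * from table_a where dt = '2011-08-21') a
left outer join
     (select * from table_b where dt = '2011-08-21') b
  on (a.uid = b.uuid);
```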
3. Mind data types when writing Hive SQL. When uid is a STRING column, the following two queries return different results:

    select count(distinct uid) from table
    where dt = '2011-08-28' and type = 2 and loginflag = '3'
      and (uid < '23000000' or (uid > '50000000' and uid < '1500000000'));

    select count(distinct uid) from newbehavior_table
    where dt = '2011-08-28' and type = 2
      and (uid < 23000000 or (uid < 1500000000 and uid > 50000000))
      and loginflag = '3';

The first query compares uid against string literals, the second against numeric literals, so the two predicates select different sets of rows.
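The difference comes from Hive's comparison semantics (worth verifying on your Hive version): comparing two strings is lexicographic, while comparing a string against a numeric literal implicitly converts both sides to DOUBLE. A minimal sketch:

```sql
-- String vs string: lexicographic, character by character.
-- '9' sorts after '2', so '9' < '23000000' is false here.
SELECT '9' < '23000000';

-- String vs numeric literal: both sides are implicitly cast to DOUBLE,
-- so this is a numeric comparison (9.0 < 23000000.0), which is true.
SELECT '9' < 23000000;
```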

4. Create a Hive table for Apache access logs
    add jar ../build/contrib/hive_contrib.jar;

    CREATE TABLE apachelog (
      host STRING,
      identity STRING,
      user STRING,
      time STRING,
      request STRING,
      status STRING,
      size STRING,
      referer STRING,
      agent STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
      "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
    )
    STORED AS TEXTFILE;
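Once the table is created and log data is loaded, a simple aggregation serves as a sanity check that the regex parsed the log lines correctly. This query is only an illustration, not part of the original recipe:

```sql
-- Hypothetical check: count requests per HTTP status code.
-- If the regex failed to match, the columns come back NULL,
-- which shows up here as a NULL status group.
SELECT status, count(*) AS hits
FROM apachelog
GROUP BY status;
```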
Hive function reference: http://www.karmasphere.net/Karmasphere-Analyst/hive-user-defined-functions.html
