Hive memo


1. Compress Hive query output with gzip

Before running the query, set the following parameters:

    set mapred.output.compress=true;
    set hive.exec.compress.output=true;
    set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
    INSERT OVERWRITE DIRECTORY 'hive_out' select * from tables limit 10000;
2. LEFT OUTER JOIN in Hive on Cloudera CDH3 when both tables are partitioned:

Method 1: filter each table down to its partition in a subquery, then join the filtered results.

Method 2: put the right table's partition predicate in the ON clause (putting it in the WHERE clause would discard the NULL rows that the outer join produces, effectively turning it into an inner join):

    select a.*, b.*
    from table a
    left outer join table b
      on (a.uid = b.uuid and b.dt = '2011-08-21')
    where a.dt = '2011-08-21';
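Method 1 can be sketched as follows. The table names `table_a` and `table_b` are placeholders standing in for the two partitioned tables in the example above; the column and partition names follow method 2:

```sql
-- A sketch of method 1: restrict each partitioned table in a subquery
-- first, then LEFT OUTER JOIN the already-filtered results, so the
-- outer join's NULL rows are preserved.
select a.*, b.*
from (select * from table_a where dt = '2011-08-21') a
left outer join
     (select * from table_b where dt = '2011-08-21') b
  on (a.uid = b.uuid);
```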
3. Mind data types when writing Hive SQL. When uid is a STRING column, the following two queries return different results:

    select count(distinct uid) from table
    where dt = '2011-08-28' and type = 2 and loginflag = '3'
      and (uid < '23000000' or (uid > '50000000' and uid < '1500000000'));

    select count(distinct uid) from newbehavior_table
    where dt = '2011-08-28' and type = 2
      and (uid < 23000000 or (uid < 1500000000 and uid > 50000000))
      and loginflag = '3';

The first query compares uid against string literals, the second against numeric literals, so the two predicates select different sets of rows.
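The difference comes from Hive's comparison semantics (worth verifying on your Hive version): comparing two strings is lexicographic, while comparing a string against a numeric literal implicitly converts both sides to DOUBLE. A minimal sketch:

```sql
-- String vs string: lexicographic, character by character.
-- '9' sorts after '2', so '9' < '23000000' is false here.
SELECT '9' < '23000000';

-- String vs numeric literal: both sides are implicitly cast to DOUBLE,
-- so this is a numeric comparison (9.0 < 23000000.0), which is true.
SELECT '9' < 23000000;
```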

4. Create a Hive table for Apache access logs
    add jar ../build/contrib/hive_contrib.jar;

    CREATE TABLE apachelog (
      host STRING,
      identity STRING,
      user STRING,
      time STRING,
      request STRING,
      status STRING,
      size STRING,
      referer STRING,
      agent STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
      "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
    )
    STORED AS TEXTFILE;
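Once the table is created and log data is loaded, a simple aggregation serves as a sanity check that the regex parsed the log lines correctly. This query is only an illustration, not part of the original recipe:

```sql
-- Hypothetical check: count requests per HTTP status code.
-- If the regex failed to match, the columns come back NULL,
-- which shows up here as a NULL status group.
SELECT status, count(*) AS hits
FROM apachelog
GROUP BY status;
```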
Hive function reference: http://www.karmasphere.net/Karmasphere-Analyst/hive-user-defined-functions.html
