Querying HBase with Hive
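Querying HBase is admittedly inconvenient. Apart from fetching a row by its row key, or scanning a key range between a start key and an end key, there is no more efficient way to query.
Filtering on a column value forces a full table scan. Joins or grouping are out of reach entirely (unless the row key was designed with them in mind). Queries that come easily in a traditional relational database feel hamstrung on HBase.
A MapReduce job can solve the problem, but writing one just to run a query is a steep price. This is where Hive comes in.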
- Hive's HBase integration: this operates on HBase directly and may affect the online service.
- Import the HBase table into HDFS, then define a Hive table over the exported files, for example:
-
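CREATE EXTERNAL TABLE task_history (
<column> <type>,
cid <type>,
content <type>,
ctime <type>,
gmt_create <type>,
hostName <type>,
item <type>,
mtime <type>,
otags <type>,
priority <type>,
retry <type>,
result <type>,
srcImages <type>,
src_url <type>,
status <type>,
summary <type>,
task_type <type>,
title <type>,
userId <type>,
userNick <type>,
utags <type>,
writer <type>
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '<delimiter>'
LOCATION '<hdfs path>';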
-
SELECT cid, result FROM task_history LIMIT <n>;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
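The external-table definition above can also be generated from a column list rather than typed by hand. The following is a minimal sketch only: the function name `build_external_table_ddl`, the `STRING` type for every column, the tab delimiter, and the `/tmp/task_history_export` path are all assumptions made for illustration, since the article's actual types, delimiter, and location are not shown. It only prints the statement; nothing is sent to Hive.

```shell
#!/bin/sh
# Print a CREATE EXTERNAL TABLE statement for a delimited text export.
#   $1 = table name, $2 = HDFS directory, remaining arguments = column names.
# Assumptions: every column is STRING and fields are tab-separated.
build_external_table_ddl() (
  table=$1
  location=$2
  shift 2
  cols=""
  for c in "$@"; do
    # Append "name STRING", comma-separated after the first column.
    cols="${cols:+$cols, }$c STRING"
  done
  printf "CREATE EXTERNAL TABLE IF NOT EXISTS %s (%s)\n" "$table" "$cols"
  printf "ROW FORMAT DELIMITED FIELDS TERMINATED BY '%s'\n" '\t'
  printf "LOCATION '%s';\n" "$location"
)

# /tmp/task_history_export is a placeholder path, not the article's location.
build_external_table_ddl task_history /tmp/task_history_export cid result status
```

Because the table is external, dropping it later removes only the Hive metadata and leaves the exported files in HDFS untouched, which suits data that another process writes.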
-
-
DROP TABLE task_history;
-
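CREATE EXTERNAL TABLE task_history (
<column> <type>,
cid <type>,
content <type>,
ctime <type>,
gmt_create <type>,
hostName <type>,
item <type>,
mtime <type>,
otags <type>,
priority <type>,
retry <type>,
result <type>,
srcImages <type>,
src_url <type>,
status <type>,
summary <type>,
task_type <type>,
title <type>,
userId <type>,
userNick <type>,
utags <type>,
writer <type>
)
PARTITIONED BY (dt <type>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '<delimiter>'
LOCATION '<hdfs path>';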
-
ALTER TABLE task_history ADD PARTITION (dt='<date>') LOCATION '<hdfs path>';
- Use an export tool to export the HBase table to HDFS every day, with one directory per day (named by date) as mentioned above.
-
hive -e "ALTER TABLE task_history ADD PARTITION (dt='`date -d yesterday +<format>`') LOCATION '/group/wireless-arctic/task/`date -d yesterday +<format>`';"
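The daily command above can be wrapped in a small script so that the date is a parameter, which makes it easy to backfill a missed day. This is a sketch under stated assumptions: the `%Y%m%d` directory naming, the function name `build_add_partition`, and the `IF NOT EXISTS` guard are not from the article, and `date -d yesterday` requires GNU date. The script only prints the statement; the final comment shows how it might be handed to the `hive` CLI.

```shell
#!/bin/sh
# Build the statement that registers one day's export directory as a partition.
#   $1 = partition value, which is also the directory name (e.g. 20240101).
build_add_partition() (
  dt=$1
  base=/group/wireless-arctic/task   # directory prefix taken from the article
  printf "ALTER TABLE task_history ADD IF NOT EXISTS PARTITION (dt='%s') LOCATION '%s/%s';\n" \
    "$dt" "$base" "$dt"
)

# Default to yesterday (GNU date); the %Y%m%d layout is an assumed naming scheme.
dt="${1:-$(date -d yesterday +%Y%m%d)}"
build_add_partition "$dt"

# To apply the statement instead of printing it:
#   hive -e "$(build_add_partition "$dt")"
```

With `IF NOT EXISTS`, rerunning the script for a day that is already registered becomes a no-op instead of an error, which is convenient when the job is scheduled.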
-
SELECT * FROM task_history WHERE dt='<date>' LIMIT <n>;