欢迎投稿

今日深度:

hive索引,

hive索引,



创建

hive (zmgdb)> create index index_t1 on table v_t1(name)
            > as
            > 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
            > with
            > deferred rebuild in table save_index_t1_table;

OK
Time taken: 0.524 seconds

save_index_t1_table:保存索引的表。

即创建了的索引,需要一张表去保存,一个索引一张索引保存表,保存在hadoop里。
as 指定索引器,org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler 是固定值,常用的索引器。

重建索引,新增数据要重建索引,这样在保存索引的 t1_index_table 就有索引信息了。

hive (zmgdb)> alter index index_t1 on v_t1 rebuild;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160923005139_9caf10f1-5481-4de8-b95a-889c19e45032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1474540738385_0003, Tracking URL = http://hello110:8088/proxy/application_1474540738385_0003/
Kill Command = /home/hadoop/app/hadoop-2.7.2/bin/hadoop job  -kill job_1474540738385_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-09-23 00:51:46,046 Stage-1 map = 0%,  reduce = 0%
2016-09-23 00:51:54,485 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.91 sec
2016-09-23 00:52:00,724 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.76 sec
MapReduce Total cumulative CPU time: 4 seconds 760 msec
Ended Job = job_1474540738385_0003
Loading data to table zmgdb.save_index_t1_table
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.76 sec   HDFS Read: 9845 HDFS Write: 426 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 760 msec
OK
Time taken: 22.73 seconds


索引表分析

hive (zmgdb)> select * from save_index_t1_table;
OK
save_index_t1_table.name        save_index_t1_table._bucketname save_index_t1_table._offsets
lisi    hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1     [0]
xiaohua hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1     [49]
xiaoji  hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1     [32]
ximing  hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1     [15]
xx      hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1     [67]
Time taken: 0.073 seconds, Fetched: 5 row(s)

索引里面保存了:索引键内容,内容所在文件位置,内容在文件里的偏移量。 hive select 会去找索引,例如name=lisi的值,找到该值所在的文件位置,和在文件里的偏移量,进入该文件到指定的偏移量里,找出来的就是了。 如果没有索引,会开启mr去目录下全局查找,有了索引,就像书有了目录,不用整本书找了,通过目录找,肯定更快。简单的select 查询hive不启用mapreduce,复杂的会启动。



显示表表的索引

show formatted index on t1;


删除索引

drop index  if exists t1_index on t1;


补充:

表的数据发生改变后,都要重建表的索引。





www.htsjk.Com true http://www.htsjk.com/hive/34355.html NewsArticle hive索引, 创建 hive (zmgdb) create index index_t1 on table v_t1(name)             as             'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'             with             deferred rebuild in ta...
相关文章
    暂无相关文章
评论暂时关闭