hive索引，

和通数据库htsjk.Com2019-09-07 23:27 来源:未知阅读:11192 评论 147 热度2

标签：

hive索引，

创建

hive (zmgdb)> create index index_t1 on table v_t1(name)
> as
> 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with
> deferred rebuild in table save_index_t1_table;
OK
Time taken: 0.524 seconds

save_index_t1_table：保存索引的表。

即创建了的索引，需要一张表去保存，一个索引一张索引保存表，保存在hadoop里。
as 指定索引器，org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler 是固定值，常用的索引器。

重建索引，新增数据要重建索引，这样在保存索引的 t1_index_table 就有索引信息了。

hive (zmgdb)> alter index index_t1 on v_t1 rebuild;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160923005139_9caf10f1-5481-4de8-b95a-889c19e45032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1474540738385_0003, Tracking URL = http://hello110:8088/proxy/application_1474540738385_0003/
Kill Command = /home/hadoop/app/hadoop-2.7.2/bin/hadoop job -kill job_1474540738385_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-09-23 00:51:46,046 Stage-1 map = 0%, reduce = 0%
2016-09-23 00:51:54,485 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.91 sec
2016-09-23 00:52:00,724 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.76 sec
MapReduce Total cumulative CPU time: 4 seconds 760 msec
Ended Job = job_1474540738385_0003
Loading data to table zmgdb.save_index_t1_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.76 sec HDFS Read: 9845 HDFS Write: 426 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 760 msec
OK
Time taken: 22.73 seconds

索引表分析

hive (zmgdb)> select * from save_index_t1_table;
OK
save_index_t1_table.name save_index_t1_table._bucketname save_index_t1_table._offsets
lisi hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1 [0]
xiaohua hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1 [49]
xiaoji hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1 [32]
ximing hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1 [15]
xx hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1 [67]
Time taken: 0.073 seconds, Fetched: 5 row(s)

索引里面保存了：索引键内容，内容所在文件位置，内容在文件里的偏移量。 hive select 会去找索引，例如name=lisi的值，找到该值所在的文件位置，和在文件里的偏移量，进入该文件到指定的偏移量里，找出来的就是了。如果没有索引，会开启mr去目录下全局查找，有了索引，就像书有了目录，不用整本书找了，通过目录找，肯定更快。简单的select 查询hive不启用mapreduce，复杂的会启动。

显示表表的索引

show formatted index on t1;

删除索引

drop index if exists t1_index on t1;

补充：

表的数据发生改变后，都要重建表的索引。