Hive Partitioned Tables

和通数据库 htsjk.Com, 2019-11-04 23:02

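Partitioned tables in Hive
1. A partition works like a dense index on the table's partition column.
2. Each partition of a Hive table corresponds to one subdirectory under the table's directory, and all of that partition's data is stored in that subdirectory.
Example
hive>select * from sample_data;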
1 Tom M 69 68 90
2 Marry F 90 89 78
3 Jerry M 69 93 70
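
For readers who want to reproduce the example, a table shaped like sample_data could be declared roughly as follows. This is only a sketch: the source shows the rows but not the schema, so the three score column names (score1, score2, score3) and the comma field delimiter are assumptions. The names sid, sname and gender are the ones used by the later queries.

```sql
-- Hypothetical schema for sample_data.
-- The score column names and the ',' delimiter are assumed, not taken from the source.
create table sample_data(
  sid int,
  sname string,
  gender string,
  score1 int,
  score2 int,
  score3 int
)
row format delimited fields terminated by ',';
```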

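A query against this table has to scan the whole table, even when only the rows for one gender are wanted.
The partition column must be declared when a partitioned table is created. Here we partition by gender:
hive>create table partition_table(sid int,sname string) partitioned by (gender string) row format delimited fields terminated by ',';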
hive>insert into table partition_table partition(gender='M') select * from sample_data where gender='M';
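This fails because the number of columns does not match: select * returns every column of sample_data, while partition_table has only sid and sname besides the partition column.
Selecting only the matching columns works: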
hive>insert into table partition_table partition(gender='M') select sid,sname from sample_data where gender='M';
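
The statement above fills one partition at a time, so it would have to be repeated for gender='F'. As an alternative sketch, Hive's dynamic partitioning can take the partition value from the last column of the select list and fill every partition in one statement. It assumes sample_data has a gender column, as the where clause above implies.

```sql
-- Enable dynamic partitions and allow every partition column to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column goes last in the select list; Hive routes each row
-- to the gender=... partition that matches its value.
insert into table partition_table partition(gender)
select sid, sname, gender from sample_data;
```

Use one approach or the other: running this after the static insert would append the gender='M' rows a second time.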
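On HDFS, the partition_table directory then contains two subdirectories, gender=F and gender=M (once the rows for gender='F' have been inserted the same way), and each subdirectory holds that partition's data files.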
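
The layout can be checked from the Hive CLI itself. The sketch below lists the partitions Hive knows about and then the table directory on HDFS. The path /user/hive/warehouse/partition_table is an assumption (the usual default warehouse location for a table in the default database); describe formatted shows the actual location.

```sql
-- List the partitions registered for the table.
show partitions partition_table;

-- Print table metadata, including its Location on HDFS.
describe formatted partition_table;

-- List the table directory. The path is assumed; replace it with the Location shown above.
dfs -ls /user/hive/warehouse/partition_table;
```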

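3. Partitioning reduces the number of records a query has to scan and so improves scan efficiency. The explain statement shows how a SQL statement will be executed, so the two plans can be compared; the plan the optimizer produces for the partitioned table is the better one. First, the unpartitioned table: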
hive>explain select * from sample_data where gender='M';
Next, the partitioned table:
hive>explain select * from partition_table where gender='M';
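
To see which inputs a query would read without working through the full plan, Hive also offers explain dependency, which reports the input tables and partitions as JSON. A minimal sketch using the same query:

```sql
-- Reports the tables and partitions the query depends on.
explain dependency select * from partition_table where gender='M';
```

With the gender='M' filter, only the gender=M partition is expected to appear among the input partitions.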

Read the execution plan from bottom to top and from right to left.