Hive 分区,
分区表:也是对应于HDFS上的文件夹
1)静态分区
单级分区:CREATE TABLE ruoze_order_partition (
order_number string,
event_time string
) PARTITIONED BY (event_month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
多级分区:CREATE TABLE ruoze_order_mulit_partition (
order_number string,
event_time string
) PARTITIONED BY (event_month string, step string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
① LOAD DATA LOCAL INPATH "/home/hadoop/data/order_created.txt"
OVERWRITE INTO TABLE ruoze_order_partition
PARTITION (event_month='2014-05');
② INSERT OVERWRITE TABLE ruoze_order_partition PARTITION(event_month='2014-07')
select * from ruoze_order_partition_basic;
③ LOAD DATA LOCAL INPATH "/home/hadoop/data/order_created.txt"
OVERWRITE INTO TABLE ruoze_order_mulit_partition
PARTITION (event_month='2014-08', step='1'); //一定要与创建时的PARTITIONED BY对应上
刷新元数据:
① MSCK REPAIR TABLE ruoze_order_partition; //该方式刷新整个表,太暴力,生产不可取
② ALTER TABLE table_name ADD PARTITION (event_month='2014-06'); //生产常用
说明:
1. 分区列是伪列,不是真实存在的。
2. 手工在HDFS增加分区,放置数据,在hive是查询不出来的,需要刷新元数据
2)动态分区
动态分区:CREATE TABLE ruoze_emp_dynamic_partition (
empno int,
ename string,
job string,
mgr int,
hiredate string,
salary double,
comm double
) PARTITIONED BY (deptno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
加载: ① insert into table ruoze_emp_partition partition(deptno=10)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=10;
② insert into table ruoze_emp_partition partition(deptno=20)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=20;
③ insert into table ruoze_emp_partition partition(deptno=30)
select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=30;
动态加载:
① insert into table ruoze_emp_dynamic_partition partition(deptno)
select empno, ename , job, mgr, hiredate, salary, comm, deptno from emp; //最后一定是分区字段
set hive.exec.dynamic.partition.mode=nonstrict;