欢迎投稿

今日深度:

Hive 分区,

Hive 分区,


分区表:也是对应于HDFS上的文件夹

1)静态分区

    单级分区:CREATE TABLE ruoze_order_partition (
                        order_number string,
                        event_time string
                     ) PARTITIONED BY (event_month string)

                       ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";

    多级分区:CREATE TABLE ruoze_order_mulit_partition (
                        order_number string,
                        event_time string
                    ) PARTITIONED BY (event_month string, step string)
                      ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";

    加载:

        ① LOAD DATA LOCAL INPATH "/home/hadoop/data/order_created.txt" 
             OVERWRITE INTO TABLE ruoze_order_partition

             PARTITION (event_month='2014-05');

        ② INSERT OVERWRITE TABLE ruoze_order_partition PARTITION(event_month='2014-07')

             select * from ruoze_order_partition_basic;       

        ③ LOAD DATA LOCAL INPATH "/home/hadoop/data/order_created.txt" 
             OVERWRITE INTO TABLE ruoze_order_mulit_partition
             PARTITION (event_month='2014-08', step='1');    //一定要与创建时的PARTITIONED BY对应上

    刷新元数据:
        ① MSCK REPAIR TABLE ruoze_order_partition;    //该方式刷新整个表,太暴力,生产不可取
        ② ALTER TABLE table_name ADD PARTITION (event_month='2014-06');    //生产常用

    说明:

        1. 分区列是伪列,不是真实存在的。
        2. 手工在HDFS增加分区,放置数据,在hive是查询不出来的,需要刷新元数据  


2)动态分区

    动态分区:CREATE TABLE ruoze_emp_dynamic_partition (
                        empno int,
                        ename string,
                        job string,
                        mgr int,
                        hiredate string,
                        salary double,
                        comm double
                    ) PARTITIONED BY (deptno int)

                      ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";

    加载:

        ① insert into table ruoze_emp_partition partition(deptno=10)
                select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=10;
        ② insert into table ruoze_emp_partition partition(deptno=20)
                select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=20;
        ③ insert into table ruoze_emp_partition partition(deptno=30)

                select empno,ename ,job ,mgr ,hiredate ,salary ,comm from emp where deptno=30;

    动态加载:       
        ① insert into table ruoze_emp_dynamic_partition partition(deptno)
                select empno, ename , job, mgr, hiredate, salary, comm, deptno from emp;    //最后一定是分区字段

    关闭静态分区:       
        set hive.exec.dynamic.partition.mode=nonstrict;

www.htsjk.Com true http://www.htsjk.com/hive/40575.html NewsArticle Hive 分区, 分区表:也是对应于HDFS上的文件夹 1)静态分区     单级分区:CREATE TABLE ruoze_order_partition (                         order_number string,                   ...
相关文章
    暂无相关文章
评论暂时关闭