
Hive Partitions



Create a partitioned table; the partition column dt is declared separately from the regular columns:

create table day_table (id int, content string) partitioned by (dt string);

Initialize the metastore schema if needed (the -dbType value is lowercase):

schematool -dbType mysql -initSchema


Dynamic partitions

Enable dynamic partitioning; nonstrict mode allows every partition column to be derived from the query (strict mode requires at least one static partition):

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;




 

With dynamic partitioning enabled, the last column of the SELECT supplies the value for the dt partition:

INSERT OVERWRITE TABLE target PARTITION (dt)
SELECT id, user_id, app_id, time, ip, substr(time, 0, 10) FROM origin;
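A quick sketch, in Python for illustration only, of what `substr(time, 0, 10)` yields for a typical timestamp string (Hive's substr treats a start index of 0 like 1, so it returns the first 10 characters, i.e. the date portion that becomes the dt partition value):

```python
def dt_partition(time_str: str) -> str:
    """Equivalent of Hive's substr(time, 0, 10) for a
    'YYYY-MM-DD HH:MM:SS' timestamp: the first 10 characters,
    which is the date used as the dynamic partition value."""
    return time_str[:10]

print(dt_partition("2010-11-10 13:45:02"))  # 2010-11-10
```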

Old data can first be loaded into a Hive table, then partitioned with an INSERT ... SELECT like the one above.

 

Once the old data has been partitioned, use static partitions:

 

create table partuserview(userid string,hourid string,cityid string,fronturl string,contentype int)  partitioned by (dt string) row format delimited fields terminated by '\t';

load data local inpath '/home/xincl/hive-0.6.0/data/userview.txt' into table partuserview partition(dt='20101110');

 

Statement to create a bucketed table:

create table butuserview(userid string, hourid string, cityid string, fronturl string, contentype int) clustered by (hourid) into 12 buckets row format delimited fields terminated by '\t';

The statement above buckets by hourid into 12 buckets: each row is assigned to bucket (hashcode of hourid) MOD (number of buckets).
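A minimal sketch of that bucket assignment, assuming the classic Hive scheme for string columns (Java's String.hashCode, masked non-negative, mod the bucket count); written in Python for illustration:

```python
def java_string_hashcode(s: str) -> int:
    """Reimplementation of Java's String.hashCode() with 32-bit
    signed overflow, which Hive uses to hash string columns."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h  # to signed 32-bit

def bucket_for(hourid: str, num_buckets: int = 12) -> int:
    # Hive masks the hash non-negative, then takes MOD bucket count
    return (java_string_hashcode(hourid) & 0x7FFFFFFF) % num_buckets

print(bucket_for("07"))  # 7
```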

 


Sqoop import into a Hive partition:


  sqoop import --connect jdbc:postgresql://ip/db_name --username user_name \
    --table table_name --hive-import -m 5 --hive-table hive_table_name \
    --hive-partition-key partition_name --hive-partition-value partition_value

(The --hive-partition-key / --hive-partition-value pair is optional; omit it for an unpartitioned import.)

 

 

Change a column's type:

 ALTER TABLE MUSER_BASICINFO_CPA CHANGE USERINDEX USERINDEX bigint;

