Hive Basic Operations

htsjk.com, 2019-11-01 23:10


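Creating databases and tables

    -- Create a database and switch to it
    create database mydb;
    use mydb;

    -- Create a table
    create table stu(
    id int,
    name string,
    sex string,
    address string,
    poneNum string
    )row format delimited fields terminated by '\t';
    -- If no delimiter is specified, Hive uses its default field delimiter '\001'.
    -- The delimiter in the table definition must match the delimiter used in the data file.

The LIKE keyword copies the structure of an existing table, without its data:

    create table emp_like like emp;

The AS keyword creates a table from the result of a query, for example to extract part of a table's data:

    create table dept_like as <select statement>;

```
-- Example:
create table dept_like as select * from dept limit 2;
```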
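
To see the difference between the two forms, the following sketch inspects the tables created above. It assumes emp and dept already exist and contain data. The table name dept_tab is hypothetical and only illustrates that a CREATE TABLE ... AS statement can carry its own row format, since the new table does not inherit the source table's delimiter.

```sql
-- LIKE copies only the schema, so the new table should be empty.
select count(*) from emp_like;

-- AS copies the query result, so dept_like holds at most 2 rows.
select * from dept_like;

-- Compare the column definitions of the copies with their sources.
desc emp_like;
desc dept_like;

-- dept_tab is a hypothetical name: a CTAS table with an explicit delimiter.
create table dept_tab
row format delimited fields terminated by '\t'
as select * from dept limit 2;
```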
Importing and exporting data

Inserting data


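    load data local inpath '/opt/hivedata/stu.txt' into table stu;
    -- Running the statement again appends the data to the table.

    -- With LOCAL, the path refers to the local (Linux) file system.
    -- Without LOCAL, the path refers to the HDFS file system.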
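
The variant without LOCAL can be sketched as follows, assuming the statements are run from the Hive CLI, where dfs passes a command to the HDFS shell. The HDFS path /tmp/stu.txt is a hypothetical staging location, not something taken from the original setup.

```sql
-- Copy the local file to HDFS first (/tmp/stu.txt is a hypothetical path).
dfs -put /opt/hivedata/stu.txt /tmp/stu.txt;

-- No LOCAL: the path is resolved on HDFS.
-- The file is moved (not copied) into the table directory.
load data inpath '/tmp/stu.txt' into table stu;

-- The staging path should no longer exist after the load.
dfs -ls /tmp/stu.txt;
```

Because the file is moved rather than copied, keep a separate copy if the source file is needed elsewhere on HDFS.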

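Overwriting existing data

    load data local inpath '/opt/hivedata/stu.txt' overwrite into table stu;

Table data is stored on HDFS:

- Creating a database creates a directory named after it (mydb.db for the database above) under the warehouse directory, /user/hive/warehouse by default.
- Creating a table creates a subdirectory with the table's name under the database directory.
- Loading data places a file with the same name as the source file in the table directory.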
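
The layout can be inspected from the Hive CLI. This is a minimal sketch that assumes the default warehouse location /user/hive/warehouse; if hive.metastore.warehouse.dir is set differently, use the location reported by the first statement instead.

```sql
-- Shows the database's HDFS location.
describe database mydb;

-- One subdirectory per table (path assumes the default warehouse directory).
dfs -ls /user/hive/warehouse/mydb.db;

-- The loaded file, e.g. stu.txt, sits inside the table directory.
dfs -ls /user/hive/warehouse/mydb.db/stu;
```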
Table types

Viewing detailed table information

    desc <table_name>;
    desc formatted <table_name>;

Creating an external table:
create EXTERNAL table emp_ext(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)row format delimited fields terminated by '\t';
Loading data:
load data local inpath '/opt/hivedata/emp.txt' into table emp_ext;

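Differences between managed tables and external tables

- Dropping a managed table deletes both its metadata and its data.
- Dropping an external table deletes only the metadata; the data files remain on HDFS.

External tables are generally preferred because they protect the data from accidental deletion.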
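
The drop behavior can be checked with a short sketch. It assumes emp_ext was created in the database mydb under the default warehouse directory; the HDFS path in the last statement is therefore an assumption and may differ on a given cluster.

```sql
-- desc formatted reports the table type:
-- MANAGED_TABLE for managed tables, EXTERNAL_TABLE for external ones.
desc formatted emp_ext;

-- Dropping an external table removes only its metadata.
drop table emp_ext;

-- The data files should still be listed (assumed default location).
dfs -ls /user/hive/warehouse/mydb.db/emp_ext;
```

Re-creating the external table with the same definition makes the surviving files queryable again.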

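Important Hive runtime parameters

Hive partitioning

Partitioned tables

Partitioned tables are commonly used to analyze user behavior logs: every visit to a website and every click on it produces a log record, and the logs accumulate by day and by hour: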

    20171014
    20171015
    20171016    (one file per hour)
        2017101610.log
        2017101611.log
        2017101612.log
        2017101613.log
        2017101613...

Partitioned table syntax

Static partitioning (single-level partition)
- ```
create table emp_part(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)partitioned by(date string)
row format delimited fields terminated by '\t';
```
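  - Inserting data:

        -- Syntax: load data [local] inpath '<path>' [overwrite] into table <table_name> partition (date='20171016');
        load data local inpath '/opt/hivedata/emp.txt' into table emp_part partition (date='20171016');

  - Notes:
    - To create a partitioned table, append PARTITIONED BY (partition_column string, ...) to an ordinary CREATE TABLE statement.
    - When loading data, always specify the target partition; otherwise the statement fails.
    - On HDFS, each partition is stored as a subdirectory of the table directory (for example date=20171016); multi-level partitions produce nested directories.
    - DATE is a reserved keyword in Hive 1.2.0 and later; if these statements are rejected, quote the column name with backticks.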
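
After the load, the partition can be confirmed as sketched below. The HDFS path assumes the table lives in the database mydb under the default warehouse directory. The partition column is written in backticks because DATE is a reserved keyword in newer Hive versions; the backticks are harmless on older ones.

```sql
-- Lists one line per partition, in the form date=20171016.
show partitions emp_part;

-- Each partition is a subdirectory of the table directory (assumed default location).
dfs -ls /user/hive/warehouse/mydb.db/emp_part;

-- Only the files under date=20171016 are read for this query.
select count(*) from emp_part where `date`='20171016';
```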

Two-level partitioning (date and hour)

The partition values are still given explicitly in every LOAD statement here, so this is static partitioning with two partition columns rather than dynamic partitioning.

    create table emp_part2(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
    )partitioned by(date string,hour string)
    row format delimited fields terminated by '\t';

Inserting data:

    -- Syntax: load data [local] inpath '<path>' [overwrite] into table <table_name> partition (date='20171016',hour='18');
    load data local inpath '/opt/hivedata/emp.txt' into table emp_part2 partition (date='20171016',hour='18');
    load data local inpath '/opt/hivedata/emp.txt' into table emp_part2 partition (date='20171016',hour='19');
    load data local inpath '/opt/hivedata/emp.txt' into table emp_part2 partition (date='20171016',hour='20');

Querying data:

    select ename from emp_part where date='20171016';
    select ename from emp_part2 where date='20171016' and hour='19';

To query a partitioned table, add the partition condition to the WHERE clause; Hive then reads only the matching partition directories.
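
A few follow-up queries on the two-level table, as a sketch that assumes the three loads above have been run. The backticks around date are there because DATE is a reserved keyword in newer Hive versions.

```sql
-- All partitions, in the form date=20171016/hour=18.
show partitions emp_part2;

-- Only the hour partitions under one date.
show partitions emp_part2 partition (`date`='20171016');

-- Partition columns behave like ordinary columns in queries.
select hour, count(*) as cnt
from emp_part2
where `date`='20171016'
group by hour;
```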

Partitioned table example (two-level partitioning)

Table: track_log
Data files: 2015082818, 2015082819
create table track_log(
id                                 string,
url                                string,
referer                            string,
keyword                            string,
type                               string,
guid                               string,
pageId                             string,
moduleId                           string,
linkId                             string,
attachedInfo                       string,
sessionId                          string,
trackerU                           string,
trackerType                        string,
ip                                 string,
trackerSrc                         string,
cookie                             string,
orderCode                          string,
trackTime                          string,
endUserId                          string,
firstLink                          string,
sessionViewNo                      string,
productId                          string,
curMerchantId                      string,
provinceId                         string,
cityId                             string,
fee                                string,
edmActivity                        string,
edmEmail                           string,
edmJobId                           string,
ieVersion                          string,
platform                           string,
internalKeyword                    string,
resultSum                          string,
currentPage                        string,
linkPosition                       string,
buttonPosition                     string
)PARTITIONED BY (date string,hour string)
row format delimited fields terminated by '\t';

Loading data:

        load data local inpath '/opt/hivedata/2015082818' into table track_log partition (date='20150828',hour='18');
        load data local inpath '/opt/hivedata/2015082819' into table track_log partition (date='20150828',hour='19');

Note on dynamic partitioning:

    -- Set these before using dynamic partitioning
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
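
What the two settings allow can be sketched with the emp_part2 table from earlier. The target table emp_dyn is a hypothetical name introduced for illustration. In nonstrict mode every partition column may be dynamic; in strict mode at least one partition column must be given a fixed value. The backticks around date are there because DATE is a reserved keyword in newer Hive versions.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- emp_dyn is a hypothetical target with the same schema and partition columns.
create table emp_dyn like emp_part2;

-- Fully dynamic: partition values are taken from the last columns of the
-- select list, in the same order as in the partition clause.
insert overwrite table emp_dyn partition (`date`, hour)
select empno, ename, job, mgr, hiredate, sal, comm, deptno, `date`, hour
from emp_part2;

-- Mixed: date is static, hour is dynamic. This form also works in strict mode.
insert overwrite table emp_dyn partition (`date`='20171016', hour)
select empno, ename, job, mgr, hiredate, sal, comm, deptno, hour
from emp_part2
where `date`='20171016';
```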

    LOL.LOG
        2017101418.LOG
        2017101419.LOG

Exporting data to the local file system

    insert overwrite local directory '/opt/hivedata/LOL.log'
    row format delimited fields terminated by '\t'
    select * from track_log;
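
Two variations on the export, as a sketch: both target directories are hypothetical paths chosen for illustration. Note that INSERT OVERWRITE DIRECTORY replaces whatever the target directory already contains, so a dedicated directory is the safer choice. The backticks around date are there because DATE is a reserved keyword in newer Hive versions.

```sql
-- Export a single partition to a dedicated local directory (hypothetical path).
insert overwrite local directory '/opt/hivedata/export/track_log_2015082819'
row format delimited fields terminated by '\t'
select * from track_log where `date`='20150828' and hour='19';

-- Without LOCAL the target is a directory on HDFS (hypothetical path).
insert overwrite directory '/tmp/track_log_export'
row format delimited fields terminated by '\t'
select id, url, ip from track_log where `date`='20150828';
```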

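Creating a partitioned table and filling it with dynamic partitions

    create table LOL_log like track_log;

    -- Copy every partition of track_log; the partition values come from the last two columns of the select.
    insert overwrite table LOL_log partition (date,hour) select * from track_log;

    -- Copy a single partition only.
    insert overwrite table LOL_log partition (date,hour) select * from track_log where date='20150828' and hour='19';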
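
To verify the result of the dynamic-partition inserts, the partitions and row counts of the new table can be compared with those of track_log. This is a minimal sketch; the backticks around date are there because DATE is a reserved keyword in newer Hive versions.

```sql
-- One line per partition, in the form date=20150828/hour=18.
show partitions LOL_log;

-- Row count per partition in the copy.
select `date`, hour, count(*) as cnt
from LOL_log
group by `date`, hour;

-- The same counts in the source table, for comparison.
select `date`, hour, count(*) as cnt
from track_log
group by `date`, hour;
```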