Hive statements

和通数据库 (htsjk.Com), 2020-02-03 22:09. Source: unknown.


Hive database and table operations

1. Database operations

1.1.1 Creating a database
Syntax:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
Ways to create a database:
1. Create a plain database
create database dbname;
2. Create a database only if it does not already exist
create database if not exists dbname;
3. Create a database with a comment
create database if not exists dbname comment 'create my db named dbname';
4. Create a database with properties
create database if not exists dbname with dbproperties ('a'='aaa','b'='bbb');
create database if not exists myhive with dbproperties ('a'='aaa','b'='bbb');
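The optional clauses in the syntax above can be combined in one statement, in the order the grammar lists them. The following is a minimal sketch; the database name demo_db, the HDFS path, and the property keys are made up for illustration and do not come from the original text.

```sql
-- Hypothetical database name, HDFS path and properties, for illustration only.
create database if not exists demo_db
comment 'demo database with an explicit location'
location '/user/hive/demo_db'
with dbproperties ('owner'='hadoop', 'purpose'='demo');

-- Check the comment, location and properties that were stored.
desc database extended demo_db;
```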
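1.1.2 Viewing databases
1. List all databases
show databases;
2. Show a database's detailed properties
Syntax: desc database [extended] dbname;
Example: desc database extended myhive;
3. Show which database is currently in use
select current_database();
4. Show the full statement used to create a database
show create database mydb;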
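1.1.3 Dropping a database
Drop a database:
drop database dbname;
drop database if exists dbname;
By default, Hive refuses to drop a database that still contains tables. There are two ways around this:
1. Manually drop every table in the database, then drop the database
2. Use the cascade keyword
drop database if exists dbname cascade;
restrict is the default behavior, so the following two statements are equivalent:
drop database if exists myhive;
drop database if exists myhive restrict;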
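The difference between restrict and cascade is easiest to see on a database that is not empty. A small sketch follows, assuming a throwaway database demo_db and table t1 (both names are hypothetical):

```sql
-- Hypothetical names; run only against a scratch environment.
create database if not exists demo_db;
create table demo_db.t1 (id int);

-- With the default (restrict) this statement is rejected,
-- because demo_db still contains the table t1:
-- drop database demo_db;

-- cascade drops the contained tables first, then the database itself.
drop database if exists demo_db cascade;
```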
1.1.4 Switching databases
Switch the current database:
Syntax: use database_name;
Example: use myhive;

Table operations

a. Create an internal (managed) table
create table mytable (id int, name string)
row format delimited fields terminated by ',' stored as textfile;
b. Create an external table
create external table mytable2 (id int, name string) row format delimited fields
terminated by ',' location '/user/hive/warehouse/mytable2';
e. Drop a table
Command: drop table if exists mytable;
f. Truncate a table
truncate table student;
truncate table student_ptn partition(city='beijing');
c. Partitioned table operations
create table mytable3(id int, name string) partitioned by(sex string)
row format delimited fields terminated by ',' stored as textfile;
List the table's partitions: show partitions mytable3;
Load data into a partition
Load data into the boy partition: load data local inpath '/root/hivedata/mingxing.txt' overwrite into
table mytable3 partition(sex='boy');
Query partition data: filter on the partition in the where clause, using partition_column = partition_value
select * from mytable3 where sex='boy';
Example of dropping a partition:
ALTER TABLE student_p DROP if exists partition(part='aa');
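The partition commands above can be strung together into one short session on mytable3. This is only a sketch: the girl partition is an assumed second value, added empty with alter table ... add partition so that no extra data file has to be invented.

```sql
-- Register a second, empty partition (the value 'girl' is an assumption).
alter table mytable3 add if not exists partition (sex='girl');

-- Both partitions should now be listed.
show partitions mytable3;

-- Filtering on the partition column lets Hive read only that
-- partition's directory instead of scanning the whole table.
select id, name from mytable3 where sex='boy';

-- Remove the partition again; for a managed table its data is deleted too.
alter table mytable3 drop if exists partition (sex='girl');
```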
d. Create a bucketed table
create table stu_buck(Sno int,Sname string,Sex string,Sage int,Sdept string)
clustered by(Sno) sorted by(Sno DESC) into 4 buckets
row format delimited fields terminated by ',';
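A bucketed table is normally populated with insert ... select rather than load data, so that Hive can hash each row into its bucket. The sketch below assumes a staging table named student with the same five columns; that table is not defined in the original text. The set line is, to my knowledge, only needed on Hive releases before 2.0 and should be left out on later ones.

```sql
-- Assumption: a staging table `student` with the same columns already exists.
-- Only for Hive versions before 2.0; omit this line on newer releases.
set hive.enforce.bucketing = true;

-- Rows are distributed into the 4 buckets by hashing Sno.
insert overwrite table stu_buck
select Sno, Sname, Sex, Sage, Sdept from student;

-- Sample a single bucket (the first of the 4) instead of the whole table.
select * from stu_buck tablesample(bucket 1 out of 4 on Sno);
```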
e. Inserting data
1. Insert a single row:
INSERT INTO TABLE table_name VALUES(XX,YY,ZZ);
2. Load the result of a query into a table:
The OVERWRITE keyword replaces the existing contents of the table (or partition); INTO appends to them.
INSERT (OVERWRITE|INTO) TABLE table_name [PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement1 FROM from_statement
3. Multi-insert
FROM from_statement
INSERT OVERWRITE TABLE table_name1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement1
[INSERT OVERWRITE TABLE table_name2 [PARTITION (partcol1=val1, partcol2=val2 ...)]
select_statement2] ...
Stay hungry Stay foolish – http://blog.csdn.net/zhongqi2513
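A concrete multi-insert may make the template easier to read. In this sketch the source table student(id, name, age, city) and the two target tables are hypothetical; the targets are assumed to exist already with the same column layout.

```sql
-- Hypothetical tables: student is the source; student_adult and
-- student_minor are existing tables with the same four columns.
from student
insert overwrite table student_adult
  select id, name, age, city where age >= 18
insert overwrite table student_minor
  select id, name, age, city where age < 18;
```

The point of this form is that the source table is scanned once and each row is routed to whichever insert clause it satisfies, instead of running one full query per target table.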
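f. Exporting data
1. Export data to the local file system (the path names a directory, even though it ends in .txt here):
insert overwrite local directory '/home/hadoop/student.txt' select * from studentss;
Note: data written to the file system is serialized as text, with columns separated by ^A (\x01) and rows terminated by \n. The separator is hard to spot when viewing the file with more; use sed -e 's/\x01/\t/g' filename to make it visible.
2. Export data to HDFS
insert overwrite directory '/student' select * from studentss where age >= 20;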
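Instead of post-processing the ^A separators with sed, the export statement itself can name a delimiter (supported since Hive 0.11). A minimal sketch, with a made-up output directory:

```sql
-- Hypothetical output directory; any existing files in it are replaced.
insert overwrite local directory '/home/hadoop/student_tsv'
row format delimited fields terminated by '\t'
select * from studentss;
```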
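g. Joining tables
a) inner join: returns the rows that satisfy the join condition on both sides
select * from tablea a inner join tableb b on a.id=b.id;
b) left join (equivalent to left outer join)
1. The left table drives the match: every left row is kept, and right rows appear only where they match
2. Columns with no match are null
3. Every left row appears in the result; the row count equals that of the left table when each left row matches at most one right row
HQL: select * from tablea a left join tableb b on a.id=b.id;
c) right join (equivalent to right outer join)
1. The right table drives the match: every right row is kept, and left rows appear only where they match
2. Columns with no match are null
3. Every right row appears in the result; the row count equals that of the right table when each right row matches at most one left row
HQL: select * from tablea a right join tableb b on a.id=b.id;
e) left semi join: older Hive versions do not support in/exists subqueries (Hive 1.2.1 supports in), so this join is used in their place; it is also an efficient implementation of in/exists
select * from tablea a left semi join tableb b on a.id=b.id;
f) full outer join
select * from tablea a full outer join tableb b on a.id=b.id;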
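To compare the join types side by side, it helps to run them on two tiny inputs. The sketch below builds tablea and tableb inline as CTEs with invented rows, so nothing has to be created first; it assumes a reasonably recent Hive that accepts select without from and union all inside a CTE.

```sql
-- Invented sample rows: ids 1,2,3 on the left and 2,3,4 on the right.
with tablea as (
  select 1 as id, 'a1' as name union all
  select 2, 'a2' union all
  select 3, 'a3'
),
tableb as (
  select 2 as id, 'b2' as name union all
  select 3, 'b3' union all
  select 4, 'b4'
)
select a.id as a_id, a.name as a_name, b.id as b_id, b.name as b_name
from tablea a left join tableb b on a.id = b.id;

-- Rows returned per join type (row order is not guaranteed):
--   inner join      : ids 2, 3
--   left join       : ids 1, 2, 3    (b columns are null for id 1)
--   right join      : ids 2, 3, 4    (a columns are null for id 4)
--   full outer join : ids 1, 2, 3, 4
--   left semi join  : ids 2, 3, and the select list may reference
--                     only columns of tablea
```

Swapping left join for one of the other keywords in the final query reproduces each line of the summary; for left semi join the b columns must be removed from the select list.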
Hive built-in functions