Hive

和通数据库 (htsjk.com), 2019-07-21 22:48


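1 An offline (batch) analysis tool built on Hadoop

2 Provides a SQL-like language, HQL

3 Reduces the effort of writing MapReduce programs on Hadoop

4 Under the hood, translates the submitted HQL into MapReduce jobs, so queries run slowly (high latency)

5 A tool for operating a data warehouse

6 Analyzes structured data

7 Tables and the columns in them are Hive's metadata; the data Hive processes is stored in HDFS

8 By default, Hive's metadata is stored in the Derby database bundled with Hive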
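
Point 4 can be observed with Hive's EXPLAIN statement, which prints the execution plan instead of running the query. The following is a minimal sketch, assuming a table stu(id int, name string) like the one used later in these notes, and assuming the MapReduce engine is available (newer installations may default to another engine such as Tez):

```sql
-- Assumption: table stu(id int, name string) already exists.
-- Select the MapReduce engine so the plan is expressed as map and reduce stages.
set hive.execution.engine=mr;

-- Print the plan only. The query itself is not executed.
explain
select name, count(*) as cnt
from stu
group by name;
```

The printed plan lists the stages and their dependencies; for an aggregation like this one it typically contains a MapReduce stage with a map-side and a reduce-side operator tree.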

Basic commands:

1 show databases;

2 create database ** ;

3 use ** ; show tables ; create table ** (id int ,name string);

4 insert into ** values(1,'liu');

Commands:

5 load data local inpath '/home/s/s.txt' into table stu ;

Loads the data in a local file into the specified table

6 create table stu1(id int,name string) row format delimited fields terminated by ' ';

Creates table stu1 and sets a space as the field delimiter

7 insert overwrite local directory '/home/stu' row format delimited fields terminated by ' ' select * from stu;

Writes the rows queried from table stu to the local directory /home/stu

8 insert overwrite directory '/stu' row format delimited fields terminated by ' ' select * from stu;

Writes the rows queried from table stu to the HDFS directory /stu

9 Create an external table

In the Hive shell, run: create external table stu (id int,name string) row format delimited fields terminated by ' ' location '/<directory path>';
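
Commands 5 to 8 can be combined into one short session. This is only a sketch: the database name demo_db is hypothetical, and it assumes that /home/s/s.txt exists on the machine running the Hive client and holds space-separated lines such as 1 liu.

```sql
-- Hypothetical database name, used only for this sketch.
create database if not exists demo_db;
use demo_db;

-- Managed table whose data files use a single space between fields.
create table if not exists stu1 (id int, name string)
row format delimited fields terminated by ' ';

-- LOCAL means the path is on the client machine and the file is copied into the table.
load data local inpath '/home/s/s.txt' into table stu1;

-- Check what was loaded.
select id, name from stu1 limit 10;

-- Export to HDFS. OVERWRITE replaces whatever is already in the target directory.
insert overwrite directory '/stu'
row format delimited fields terminated by ' '
select id, name from stu1;
```

Because insert overwrite replaces the existing contents of the target directory, it is safer to point it at a directory used only for this export.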

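Metadata

1 Table names, the columns in each table and their types, which database each table belongs to, and similar table information are called Hive's metadata

2 By default, Hive's metadata is not stored in HDFS; it is stored in the Derby relational database bundled with Hive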
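
The metadata described above can be inspected from the Hive shell. A minimal sketch, assuming the table stu from the earlier commands exists in the current database:

```sql
-- Assumption: table stu exists in the current database.
-- Column names and types.
describe stu;

-- Full metadata: owning database, HDFS location, table type
-- (MANAGED_TABLE or EXTERNAL_TABLE), SerDe and delimiters.
describe formatted stu;

-- The DDL Hive would use to recreate the table.
show create table stu;
```

These statements are answered from the metastore, not from the data files in HDFS.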

Hive data types:

1 array

Table creation statement:

create external table ex(vals array<int>) row format delimited fields terminated by '\t' collection items terminated by ',' location '/ex';

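2 map Note: the example uses \t as the column delimiter; what matters is that the column, collection-item, and map-key delimiters are all different characters

Table creation statement: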

create external table ex (vals map<string,string>) row format delimited fields terminated by '\t' map keys terminated by ' ' location '/ex';
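
Once such tables exist, array elements are read by zero-based index and map values by key. The sketch below uses the hypothetical table names ex_array and ex_map (and matching locations) so that both tables can coexist. It also adds a collection items delimiter to the map table so that one line can hold several entries. It assumes a data line such as 1,2,3 for the array table and name liu,city bj for the map table.

```sql
-- Hypothetical table names and locations, chosen for this sketch only.
create external table if not exists ex_array (vals array<int>)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
location '/ex_array';

-- Zero-based element access and element count.
select vals[0] as first_item,
       size(vals) as item_count
from ex_array;

create external table if not exists ex_map (vals map<string,string>)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ' '
location '/ex_map';

-- Lookup by key (NULL when the key is absent) and the list of keys.
select vals['name'] as name_value,
       map_keys(vals) as all_keys
from ex_map;
```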

Causes of data skew in Hive:

1 group by

2 distinct, e.g. count(distinct xx)

3 join

Handling data skew in group by

Tuning parameter: set hive.groupby.skewindata=true; balances the load across reducers when the data is skewed
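
With this parameter enabled, Hive plans the aggregation as two jobs: the first spreads rows randomly over the reducers and aggregates partially, and the second merges the partial results by the real key. The same idea can be written by hand. The following is a sketch under stated assumptions: it uses the stu table from earlier, assumes a few name values dominate, and picks a salt range of 10 purely for illustration.

```sql
-- Assumption: stu(id int, name string), where a few name values are very frequent.
-- The salt range of 10 is an arbitrary illustrative choice.
select name, sum(cnt) as cnt
from (
  -- Stage 1: a hot name is split across up to 10 groups.
  select name, salt, count(*) as cnt
  from (
    select name, cast(floor(rand() * 10) as int) as salt
    from stu
  ) salted
  group by name, salt
) stage1
-- Stage 2: merge the partial counts per name.
group by name;
```

The manual form is mostly useful for understanding what the parameter does, or when only one query in a script needs the extra job.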

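Hive optimization: mainly HQL optimization

1 map side join: mapJoin targets a join between one fairly small table and one very large table; the small table is loaded into memory on every map task, so the join completes on the map side without a shuffle

When writing a join in Hive, put the small table first (on the left)

2 join statement optimization: a subquery can be placed first (on the left side) of the join

3 group by optimization: if the group by stage is skewed, set hive.groupby.skewindata to true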

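4 count distinct optimization

Before: select count(distinct id) from tablename;

After: select count(*) from (select distinct id from tablename) tmp;

A global count(distinct id) is typically computed by a single reducer, while the rewritten form deduplicates in a distributed job first and only then counts. If id can be NULL, use count(id) in the outer query, because count(*) also counts the NULL row that select distinct keeps.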
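
The map side join from item 1 can be sketched as follows. The tables big_fact and small_dim and their columns are hypothetical; the settings shown are standard Hive configuration properties, and the threshold is the commonly documented default rather than a tuned recommendation.

```sql
-- Hypothetical tables: big_fact(id int, amount double) is very large,
-- small_dim(id int, label string) fits in memory.

-- Let Hive convert eligible joins to map joins automatically.
set hive.auto.convert.join=true;
-- Size threshold in bytes below which a table counts as small.
set hive.mapjoin.smalltable.filesize=25000000;

-- Small table first, large table last.
select d.label, sum(f.amount) as total
from small_dim d
join big_fact f on d.id = f.id
group by d.label;

-- Older style: request the map join explicitly with a hint.
-- Recent Hive versions ignore this hint unless hive.ignore.mapjoin.hint is set to false.
select /*+ MAPJOIN(d) */ d.label, f.amount
from small_dim d
join big_fact f on d.id = f.id;
```

Running explain on the first query is a quick way to check whether the plan actually contains a map join operator.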