[Hive] Programming Hive



Chapter 1 Fundamentals

Hive

The MapReduce computing framework

Chapter 2 Basic Operations

Installing Hive

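$ cd /your/hive/directory/
$ tar -xzf hive-0.9.0.tar.gz
$ echo "export HIVE_HOME=$PWD/hive-0.9.0" | sudo tee /etc/profile.d/hive.sh > /dev/null
$ echo 'export PATH=$PATH:$HIVE_HOME/bin' | sudo tee -a /etc/profile.d/hive.sh > /dev/null
$ . /etc/profile

The shell performs a redirect such as "sudo echo ... > file" without root privileges, so the file is written through sudo tee instead. The second line uses single quotes so that $PATH and $HIVE_HOME are expanded when the profile is sourced rather than when the file is written. tee -a appends the line instead of overwriting the first one.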

Configuring Hive

Starting Hive

$ cd $HIVE_HOME
$ bin/hive
...
hive> CREATE TABLE x (a INT);
OK
Time taken: ...
hive> SELECT *
    > FROM x;
OK
Time taken: ...
hive> DROP TABLE x;
OK
Time taken: ...
hive> exit;
$

The command-line interface

Chapter 3 Data Types and File Formats

Collection data types

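Type     Literal syntax example                   Access
STRUCT   struct('John', 'Doe')                    column.first
MAP      map('first', 'John', 'last', 'Doe')      column['last']
ARRAY    array('John', 'Doe')                     column[1]

STRUCT fields are accessed by name, which assumes here that the column is declared as STRUCT<first:STRING, last:STRING>. ARRAY indexes start at 0, so column[1] returns 'Doe'.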

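Text file encoding of data

Table declaration with the delimiters specified explicitly (the values shown are Hive's defaults):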

CREATE TABLE employees (
    name STRING,
    salary FLOAT,
    subordinates ARRAY<STRING>,
    deductions MAP<STRING, FLOAT>,
    address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
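With these delimiters, each employees record is one line of text:

- Fields are separated by ^A (\001).
- Collection items (array elements, map entries, struct fields) are separated by ^B (\002).
- Each map key is separated from its value by ^C (\003).

The minimal POSIX sh sketch below assembles one such line with printf and splits it back apart with awk. All data values in it are made up for illustration.

```shell
#!/bin/sh
# Sketch: one employees record in Hive's default text encoding.
# All data values are hypothetical.
set -eu

A=$(printf '\001')  # ^A: separates fields
B=$(printf '\002')  # ^B: separates collection items (array elements, map entries, struct fields)
C=$(printf '\003')  # ^C: separates a map key from its value

# Columns: name, salary, subordinates (ARRAY), deductions (MAP), address (STRUCT)
record="John Doe${A}100000.0${A}Mary Smith${B}Todd Jones${A}Federal Taxes${C}.2${B}State Taxes${C}.05${B}Insurance${C}.1${A}1 Michigan Ave.${B}Chicago${B}IL${B}60600"

# Split on ^A and print one column per line, showing ^B and ^C as visible text.
printf '%s\n' "$record" | awk -F "$A" '{
  for (i = 1; i <= NF; i++) {
    col = $i
    gsub(/\002/, "<^B>", col)
    gsub(/\003/, "<^C>", col)
    print i ": " col
  }
}'
```

The script prints five numbered columns. The map column, for example, appears as "Federal Taxes<^C>.2<^B>State Taxes<^C>.05<^B>Insurance<^C>.1".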

Chapter 4 HiveQL: Data Definition

4.1 Databases in Hive

Creating a database

hive> CREATE DATABASE financials
    > LOCATION '/my/preferred/directory';

hive> CREATE DATABASE financials
    > COMMENT 'Holds all financial tables';

hive> DESCRIBE DATABASE financials;

hive> CREATE DATABASE financials
    > WITH DBPROPERTIES ('creator' = 'me', 'date' = '2016-06-22');
hive> DESCRIBE DATABASE EXTENDED financials;

Dropping a database

hive> DROP DATABASE IF EXISTS financials CASCADE;
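
CASCADE drops the tables in the database first. With RESTRICT, the default, DROP DATABASE fails if the database still contains tables.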

4.2 Altering a database

hive> ALTER DATABASE financials
    > SET DBPROPERTIES ('edited-by' = 'dba');

4.3 Creating tables

Creating a table

CREATE TABLE mydb.employees (
    name STRING COMMENT 'Employee name',
    salary FLOAT COMMENT 'Employee salary',
    subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
    deductions MAP<STRING, FLOAT> COMMENT 'Keys are deduction names, values are percentages',
    address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address'
)
COMMENT 'Description of the table'
LOCATION '/user/hive/warehouse/mydb.db/employees'
TBLPROPERTIES ('creator' = 'me', 'created_at' = '2016-06-22', ...);

Describing a table

hive> DESCRIBE EXTENDED mydb.employees;

hive> DESCRIBE FORMATTED mydb.employees;

In practice, FORMATTED is used more often than EXTENDED because its output is laid out more readably.

External tables

CREATE EXTERNAL TABLE ...

The output of DESCRIBE EXTENDED tablename shows whether a table is managed or external:
... tableType:MANAGED_TABLE)
... tableType:EXTERNAL_TABLE)

Partitioned, managed tables

Creating a partitioned table

CREATE TABLE employees (
    name STRING,
    salary FLOAT,
    subordinates ARRAY<STRING>,
    deductions MAP<STRING, FLOAT>,
    address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);

Partition filter:

SELECT * FROM employees
WHERE country = 'US' AND state = 'IL';
hive> SHOW PARTITIONS employees PARTITION(country='US');
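Hive stores each partition as a nested subdirectory named key=value under the table's directory. That is why a filter on country and state can be answered by reading a single directory. It is also the form in which SHOW PARTITIONS prints partition names, such as country=US/state=IL.

The POSIX sh sketch below builds such a path. It assumes the default warehouse directory /user/hive/warehouse and a table in the default database. partition_dir is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: the directory Hive uses for one partition of a partitioned table.
# Assumes the default warehouse directory /user/hive/warehouse and a table
# in the default database; partition_dir is a hypothetical helper.
set -eu

partition_dir() {
  path=$1
  shift
  # One nested directory per partition column, named key=value, in the
  # order the columns are listed in PARTITIONED BY.
  for kv in "$@"; do
    path="$path/$kv"
  done
  printf '%s\n' "$path"
}

partition_dir /user/hive/warehouse/employees country=US state=IL
partition_dir /user/hive/warehouse/employees country=US state=AK
```

The first call prints /user/hive/warehouse/employees/country=US/state=IL. Since country is declared before state, every state directory sits inside its country directory.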

Customizing table storage formats

4.6 Altering tables

Renaming a table

ALTER TABLE log_messages RENAME TO logmsgs;

Adding, modifying, and dropping table partitions

ALTER TABLE log_messages ADD IF NOT EXISTS
PARTITION (year=2000, month=1, day=1) LOCATION '/logs/2000/1/1'
PARTITION (year=2000, month=1, day=2) LOCATION '/logs/2000/1/2'
PARTITION (year=2000, month=1, day=3) LOCATION '/logs/2000/1/3';

ALTER TABLE log_messages PARTITION (year=2000, month=1, day=1)
SET LOCATION 's3n://outbucket/logs/2000/01/01';

ALTER TABLE log_messages DROP IF EXISTS PARTITION (year=2000, month=1, day=1);
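The ADD statement above lists each partition by hand. When log directories follow a predictable layout, the statement can be generated instead.

The POSIX sh sketch below assumes the /logs/<year>/<month>/<day> layout used in the example and a range of days within one month. add_partitions_sql is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: generate one ALTER TABLE ... ADD statement for a run of days.
# Assumes log files live under /logs/<year>/<month>/<day>, as in the example
# above; add_partitions_sql is a hypothetical helper.
set -eu

add_partitions_sql() {
  table=$1 year=$2 month=$3 day=$4 last_day=$5
  printf 'ALTER TABLE %s ADD IF NOT EXISTS\n' "$table"
  # Emit one PARTITION clause (with its LOCATION) per day in the range.
  while [ "$day" -le "$last_day" ]; do
    printf "PARTITION (year=%d, month=%d, day=%d) LOCATION '/logs/%d/%d/%d'\n" \
      "$year" "$month" "$day" "$year" "$month" "$day"
    day=$((day + 1))
  done
  printf ';\n'
}

add_partitions_sql log_messages 2000 1 1 3
```

The call shown reproduces the three partitions from the example. The generated text can be saved to a file and run with hive -f, or passed to hive -e.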

Changing columns

ALTER TABLE log_messages
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp'
AFTER severity;

Adding columns

ALTER TABLE log_messages ADD COLUMNS (
    app_name STRING COMMENT 'Application name',
    session_id BIGINT COMMENT 'The current session id'
);

Deleting or replacing columns

ALTER TABLE log_messages REPLACE COLUMNS (
    hours_mins_secs INT COMMENT 'hours, minutes, seconds from timestamp',
    severity STRING COMMENT 'The message severity',
    message STRING COMMENT 'The rest of the message'
);

Altering table properties

ALTER TABLE log_messages SET TBLPROPERTIES (
    'notes' = 'The process id is no longer captured; this column is always NULL'
);

Altering storage properties

ALTER TABLE log_messages
PARTITION(year=2000, month=1, day=1)
SET FILEFORMAT SEQUENCEFILE;
