The Big Data Era -- Hive in Practice: Hive Commands

和通数据库 (htsjk.Com), 2019-08-27 01:09, source: unknown


Please credit the source when reposting: The Big Data Era -- Hive in Practice: Hive Commands

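Over the next few days I will first review Hive's basic commands, then do some development and design work built around a concrete scenario. This article is a refresher for myself, and I hope readers can pick up enough Hive basics from it to get started.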

Setting variables

set foot = "wy";

set foot;
foot="wy"

This is an ordinary variable: set without a namespace prefix, it is stored as a Hive configuration property.

Next, define a Hive user-defined variable (hivevar) when launching the CLI:

./hive --define foot=name

Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
hive> set hivevar:foot;
hivevar:foot=name

Now put the hivevar just defined, that is, the user-defined variable, to use:

create table wy (id bigint, ${hivevar:foot} string);
OK
Time taken: 2.282 seconds

This creates a table whose second column is named by the user-defined variable.
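Because ${hivevar:foot} is replaced with its value before the statement is parsed, Hive ends up running an ordinary create table with a column called name. A minimal Python sketch of that textual substitution follows. The function substitute and the dicts standing in for the two namespaces are hypothetical, and only the ${hivevar:...} and ${hiveconf:...} forms used in this article are modeled.

```python
import re

# Matches ${hivevar:name} and ${hiveconf:name}. Only these two prefixed
# forms (the ones used in this article) are modeled; Hive itself also
# understands other forms such as ${system:...} and ${env:...}.
VAR_REF = re.compile(r"\$\{(hivevar|hiveconf):([^}]+)\}")


def substitute(statement, hivevar, hiveconf):
    """Return statement with variable references replaced by their values.

    hivevar and hiveconf are plain dicts standing in for the two
    namespaces. In this sketch a reference to an undefined variable is
    left unchanged.
    """
    namespaces = {"hivevar": hivevar, "hiveconf": hiveconf}

    def replace(match):
        namespace, name = match.group(1), match.group(2)
        value = namespaces[namespace].get(name)
        return match.group(0) if value is None else value

    return VAR_REF.sub(replace, statement)


if __name__ == "__main__":
    # Rough equivalent of launching with: ./hive --define foot=name
    hivevar = {"foot": "name"}
    sql = "create table wy (id bigint, ${hivevar:foot} string);"
    print(substitute(sql, hivevar, {}))
```

The namespaces stay separate in this model: a ${hiveconf:foot} reference would not pick up the hivevar value.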

Viewing the table schema

describe wy;
OK
id                      bigint
name                    string
Time taken: 0.349 seconds, Fetched: 2 row(s)

Dropping the table

drop table wy;
OK
Time taken: 0.797 seconds

Hive configuration properties

hive --hiveconf hive.cli.print.current.db=true;

Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
hive (default)>

The prompt now shows the name of the current database (default), enabled by the --hiveconf option.

hive (default)> set hiveconf:hive.cli.print.current.db = false;
hive>

This sets the hiveconf property that prints the current database back to false. Personally, I find the prompt more useful with it turned on.

Here a configuration variable, id, is set to 1 and then referenced in a query. A user-defined variable like the one above works just as well here.

hive (default)> set hiveconf:id = 1;
hive (default)> select * from wt where id = ${hiveconf:id};
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472122943986_0001, Tracking URL = http://hadoopwy1:8088/proxy/application_1472122943986_0001/
Kill Command = /usr/local/hadoop2/bin/hadoop job  -kill job_1472122943986_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-25 05:23:22,904 Stage-1 map = 0%,  reduce = 0%
2016-08-25 05:23:56,987 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.07 sec
MapReduce Total cumulative CPU time: 2 seconds 70 msec
Ended Job = job_1472122943986_0001
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.07 sec   HDFS Read: 227 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 70 msec
OK
1       wy
Time taken: 71.776 seconds, Fetched: 1 row(s)

Doing the same with a user-defined variable

hive (default)> set hivevar:num = 1;
hive (default)> select * from wt where id = ${hivevar:num};
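Both queries resolve to the same text; what differs is the namespace the value lives in. The small Python model below (the helper run_set is hypothetical, not Hive code) records where a set statement stores its value: a hivevar: prefix goes to the user-variable namespace, while a hiveconf: prefix or no prefix at all goes to the configuration. The bare set foot = "wy" from the start of this article falls into that last case.

```python
def run_set(statement, hivevar, hiveconf):
    """Apply 'set key = value;' or 'set key;' to two dicts.

    hivevar and hiveconf are plain dicts standing in for the two
    namespaces. Returns the line to print for 'set key;', or None for an
    assignment.
    """
    body = statement.strip().rstrip(";")
    if not body.lower().startswith("set "):
        raise ValueError("not a set statement: " + statement)
    key, has_value, value = body[4:].partition("=")
    key = key.strip()

    # Pick the namespace from the prefix; no prefix means configuration.
    if key.startswith("hivevar:"):
        store, name = hivevar, key[len("hivevar:"):]
    elif key.startswith("hiveconf:"):
        store, name = hiveconf, key[len("hiveconf:"):]
    else:
        store, name = hiveconf, key

    if has_value:
        store[name] = value.strip()  # spaces around '=' are dropped
        return None
    if name not in store:
        return f"{key} is undefined"  # this sketch's own wording
    return f"{key}={store[name]}"


if __name__ == "__main__":
    hivevar, hiveconf = {}, {}
    run_set('set foot = "wy";', hivevar, hiveconf)
    run_set("set hivevar:num = 1;", hivevar, hiveconf)
    print(run_set("set foot;", hivevar, hiveconf))
    print(run_set("set hivevar:num;", hivevar, hiveconf))
```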

Running statements from a file

hive -f ../../dataFileTemp/testQuery.hql

Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472122943986_0003, Tracking URL = http://hadoopwy1:8088/proxy/application_1472122943986_0003/
Kill Command = /usr/local/hadoop2/bin/hadoop job  -kill job_1472122943986_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-25 05:40:58,112 Stage-1 map = 0%,  reduce = 0%
2016-08-25 05:41:16,951 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.78 sec
MapReduce Total cumulative CPU time: 1 seconds 780 msec
Ended Job = job_1472122943986_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.78 sec   HDFS Read: 227 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 780 msec
OK
1       wy
Time taken: 42.905 seconds, Fetched: 1 row(s)
[root@hadoopwy2 bin]#

That is one way. Alternatively, the file can be run from inside the Hive CLI with the source command:

hive> source /usr/local/dataFileTemp/testQuery.hql;
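Whether the file is passed with hive -f or read with source, the statements it contains are executed one after another. The first step, breaking the file into statements, can be sketched in Python roughly as follows. split_statements is a hypothetical helper, and its rules (drop lines starting with --, split on semicolons outside quotes) are a simplification, not a copy of the Hive CLI's parser, whose details differ between versions.

```python
def split_statements(script):
    """Split HiveQL script text into individual statements.

    Simplified model: lines starting with '--' are dropped as comments,
    and the remaining text is split on semicolons that are not inside
    single- or double-quoted strings. Escape sequences are ignored.
    """
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith("--")]
    text = "\n".join(lines)

    statements, current, quote = [], [], None
    for ch in text:
        if quote:
            # Inside a string literal: only the matching quote ends it.
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)

    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    script = "-- sample\nselect * from wt where id = 1;\nselect 'a;b' from wt;\n"
    for stmt in split_statements(script):
        print(stmt)
```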

Running HDFS commands from Hive

hive> dfs -ls /;
Found 3 items
drwxr-xr-x   - root supergroup          0 2016-03-19 06:49 /hbase
drwx------   - root supergroup          0 2016-08-24 05:17 /tmp
drwxr-xr-x   - root supergroup          0 2016-08-24 05:04 /user

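Hive data types and file formats

I covered Hive's primitive data types in an earlier post, but they are worth going over once more.

Primitive types

Integers: TINYINT (1 byte), SMALLINT (2 bytes), INT (4 bytes), BIGINT (8 bytes).

BOOLEAN: a Boolean value, either true or false.

Floating point: FLOAT (single-precision) and DOUBLE (double-precision).

STRING: a sequence of characters; string literals can use single or double quotes.

TIMESTAMP: values can be given as integers, floating-point numbers, or strings.

BINARY: a byte array.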
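All four integer types are signed, so each byte width fixes a range of representable values. A short Python sketch that derives those ranges from the sizes listed above using two's-complement arithmetic (the helper name signed_range is hypothetical):

```python
# Byte widths of Hive's integer types, as listed above.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a signed two's-complement integer of num_bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    for type_name, width in INTEGER_WIDTHS.items():
        low, high = signed_range(width)
        print(f"{type_name:<8} {width} byte(s): {low} .. {high}")
```

For example, a TINYINT holds -128 through 127, which is why small codes or flags fit in it while row counts call for BIGINT.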

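Collection data types

STRUCT: similar to a struct in C. For example, a column of type STRUCT<first:STRING, last:STRING> can hold the data STRUCT('wy', 'wyy'), and an element is accessed with dot notation, e.g. column.first.

MAP: a collection of key-value pairs; an element is accessed with column['key']. For example, a MAP<STRING, STRING> column can hold map('first', 'wy', 'last', 'wyy').

ARRAY: an ordered collection of elements that all have the same type, e.g. ['wy', 'wyy'] in an ARRAY<STRING> column. Indices start at 0, so the second element is accessed as column[1]. Literal form: array('wy', 'wyy').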
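For readers coming from Python, the three collection types correspond fairly directly to familiar structures. The sketch below is only an analogy (the class FullName and the variable names are made up for illustration) and shows the same three access patterns: field access, key lookup, and zero-based indexing.

```python
from typing import NamedTuple


class FullName(NamedTuple):
    """Stand-in for a STRUCT<first:STRING, last:STRING> value."""
    first: str
    last: str


# Python analogues of the three Hive literals shown above.
name_struct = FullName("wy", "wyy")          # STRUCT('wy', 'wyy')
name_map = {"first": "wy", "last": "wyy"}    # map('first', 'wy', 'last', 'wyy')
name_array = ["wy", "wyy"]                   # array('wy', 'wyy')

if __name__ == "__main__":
    print(name_struct.first)   # like column.first
    print(name_map["first"])   # like column['first']
    print(name_array[1])       # like column[1]: indices start at 0
```

One behavioral difference to keep in mind: Hive returns NULL for a missing map key or an out-of-range array index, while Python raises KeyError or IndexError; dict.get is the closer analogue for maps.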
