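[HBase] HBase Environment Setup and Basic Usage
1. HBase Architecture
2. HBase Features
HBase is the Hadoop database, used to store and retrieve data. Compared with an RDBMS, HBase can store massive amounts of data (hundreds of millions of records or more) and supports near-real-time retrieval, with lookups completing within seconds. Because HBase is built on HDFS, it inherits HDFS's strengths: data is kept in multiple replicas, so it is well protected, and ordinary commodity PCs or servers are sufficient, whereas RDBMS servers tend to be expensive.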
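3. HBase Table Design
HBase is a column-oriented database (more precisely, it groups columns into column families) and a NoSQL database (NoSQL = Not Only SQL). Each column can store multiple versions of a value, and every row in a table has a unique identifier, the rowkey, which serves as that row's primary key.
Each value is addressed as: rowkey + columnfamily + column + timestamp : value => cell. The contents of a cell are stored as byte arrays; the utility class Bytes (org.apache.hadoop.hbase.util.Bytes) converts between byte arrays and other types.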
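The cell structure above can be sketched as a toy Python model. This is not HBase client code: the helper names (str_to_bytes, int_to_bytes, bytes_to_int, Cell, VersionedColumn) are hypothetical, and the encodings are assumed to match what Bytes.toBytes produces for a String (UTF-8) and an int (4-byte big-endian). The sketch also shows why the VERSIONS setting matters: a column keeps only its newest values, up to the configured limit.

```python
import struct
from dataclasses import dataclass


# Hypothetical helpers: they mimic, as an assumption, how HBase's
# org.apache.hadoop.hbase.util.Bytes encodes a String (UTF-8) and an
# int (4-byte big-endian). They are illustrations, not the HBase API.
def str_to_bytes(s: str) -> bytes:
    return s.encode("utf-8")


def int_to_bytes(i: int) -> bytes:
    return struct.pack(">i", i)


def bytes_to_int(b: bytes) -> int:
    return struct.unpack(">i", b)[0]


@dataclass(frozen=True)
class Cell:
    """One cell: rowkey + column family + column + timestamp -> value, all bytes."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


class VersionedColumn:
    """Toy store for one (rowkey, family, column): keeps the newest max_versions values."""

    def __init__(self, max_versions: int = 1):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp: int, value: bytes) -> None:
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]  # drop versions beyond the limit

    def latest(self) -> bytes:
        return self.versions[0][1]


# The shell command put 'user','10001','info:age','25' stores the string "25".
age = Cell(str_to_bytes("10001"), str_to_bytes("info"), str_to_bytes("age"),
           1532231729180, str_to_bytes("25"))
print(age.value)          # b'25'
print(int_to_bytes(25))   # b'\x00\x00\x00\x19'
```

Note the difference between the two encodings: the shell writes every value as string bytes, so a Java client reading info:age must use Bytes.toString rather than Bytes.toInt.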
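4. Installing HBase
(1) Go to the /opt/software/ directory and upload the HBase tarball to the virtual machine.
(2) Give the HBase tarball execute permission (optional: tar only needs read permission to extract it):
[beifeng@hadoop-senior software]$ chmod u+x hbase-0.98.6-hadoop2-bin.tar.gz
(3) Extract the HBase tarball:
[beifeng@hadoop-senior software]$ tar -zxf hbase-0.98.6-hadoop2-bin.tar.gz -C /opt/modules/
(4) Go to the /opt/modules/hadoop-2.5.0 directory and start the NameNode and DataNode.
(5) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-site.xml: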
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-senior.ibeifeng.com:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-senior.ibeifeng.com</value>
</property>
</configuration>
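A typo in hbase-site.xml usually surfaces only when HBase fails to start. One way to catch it earlier is a small script that reads the file back and checks the three properties this setup depends on. This is a hypothetical helper, not part of HBase, and its checks only reflect this article's single-node, pseudo-distributed layout.

```python
import xml.etree.ElementTree as ET

# Sample copied from step (5); the string must start directly with "<?xml".
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-senior.ibeifeng.com:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-senior.ibeifeng.com</value>
</property>
</configuration>"""


def read_properties(xml_text: str) -> dict:
    """Map each <name> to its <value> in a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    return {(p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
            for p in root.findall("property")}


def check_hbase_site(props: dict) -> list:
    """Hypothetical sanity checks for the pseudo-distributed settings used here."""
    problems = []
    if not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir should be an hdfs:// URL")
    if props.get("hbase.cluster.distributed") != "true":
        problems.append("hbase.cluster.distributed should be true")
    if not props.get("hbase.zookeeper.quorum"):
        problems.append("hbase.zookeeper.quorum is missing")
    return problems


print(check_hbase_site(read_properties(SAMPLE)))  # []
```

To check the real file, pass the contents of /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-site.xml to read_properties instead of SAMPLE.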
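(6) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-env.sh:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
# HBASE_MANAGES_ZK defaults to true: start-hbase.sh then also starts HBase's bundled ZooKeeper (startup mode 1 below).
# Set it to false when only an externally installed ZooKeeper should be used (startup mode 2).
# export HBASE_MANAGES_ZK=true
(7) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/regionservers: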
hadoop-senior.ibeifeng.com
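(8) Go to the /opt/modules/hbase-0.98.6-hadoop2/lib directory. HBase 0.98.6 ships with Hadoop 2.2.0 jars by default; replace them with jars for the Hadoop version used here, hadoop-2.5.0. Delete all Hadoop 2.2.0 jars in the lib directory (every jar whose name starts with hadoop), upload the hadoop-2.5.0 versions, and replace zookeeper-3.4.6.jar with zookeeper-3.4.5.jar: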
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-annotations-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-auth-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-common-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-hdfs-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-app-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-common-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-core-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-jobclient-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-shuffle-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-api-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-client-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-common-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-server-common-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-server-nodemanager-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-client-2.2.0.jar
[beifeng@hadoop-senior lib]$ rm -rf ./zookeeper-3.4.6.jar
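Typing each rm command by hand is error-prone. A dry run like the following lists the files that match the same patterns so you can compare them with the list above before deleting anything. It is a hypothetical Python helper that only prints and never removes files; the default path is the lib directory from this step.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Patterns for the jars removed in step (8): every Hadoop 2.2.0 jar plus the
# bundled ZooKeeper 3.4.6 jar.
OLD_JAR_PATTERNS = ("hadoop-*-2.2.0.jar", "zookeeper-3.4.6.jar")


def jars_to_remove(names):
    """Return, sorted, the file names that match one of OLD_JAR_PATTERNS."""
    return sorted(n for n in names
                  if any(fnmatchcase(n, p) for p in OLD_JAR_PATTERNS))


def preview(lib_dir="/opt/modules/hbase-0.98.6-hadoop2/lib"):
    """Dry run: print what would be deleted without touching any file."""
    lib = Path(lib_dir)
    if not lib.is_dir():
        print(f"{lib_dir} not found")
        return []
    doomed = jars_to_remove(p.name for p in lib.iterdir())
    for name in doomed:
        print("would remove", lib / name)
    return doomed
```

Calling preview() on the VM should print exactly the 16 files removed above; any extra hadoop-*-2.2.0.jar it reports is another old jar that the manual list missed.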
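(9) Startup mode 1: go to /opt/modules/hbase-0.98.6-hadoop2 and start the HBase processes with HBase's bundled ZooKeeper (zookeeper-3.4.6.jar has already been replaced with zookeeper-3.4.5.jar):
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ bin/start-hbase.sh
Check the HBase processes: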
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ jps
2813 HRegionServer
3162 Jps
2724 HMaster
2670 HQuorumPeer
2196 DataNode
2137 NameNode
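(10) Startup mode 2: start the ZooKeeper we installed ourselves, then start the master and the regionserver separately:
[beifeng@hadoop-senior zookeeper-3.4.5]$ bin/zkServer.sh start
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ bin/hbase-daemon.sh start master
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ bin/hbase-daemon.sh start regionserver
Check the HBase processes: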
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ jps
6283 QuorumPeerMain
6483 Jps
6334 HMaster
2196 DataNode
2137 NameNode
6431 HRegionServer
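The two jps listings differ only in the ZooKeeper process: HQuorumPeer when HBase runs its bundled ZooKeeper, QuorumPeerMain for a separately installed one. The sketch below is a hypothetical helper that parses jps text and reports which expected daemons are missing. The mode labels bundled_zk and external_zk are names invented here.

```python
# Daemons expected from the jps listings above (pids are ignored).
EXPECTED = {
    "bundled_zk": {"NameNode", "DataNode", "HMaster", "HRegionServer", "HQuorumPeer"},
    "external_zk": {"NameNode", "DataNode", "HMaster", "HRegionServer", "QuorumPeerMain"},
}


def running_daemons(jps_output: str) -> set:
    """Collect process names from lines of the form '<pid> <MainClass>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, mode: str) -> set:
    """Return the expected daemons that do not appear in the jps output."""
    return EXPECTED[mode] - running_daemons(jps_output)
```

The text can come from a pasted listing or from subprocess.run(["jps"], capture_output=True, text=True).stdout. An empty result means every expected process is up.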
(11) Stop the HBase processes:
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ bin/stop-hbase.sh
5. Basic Usage of HBase
(1) Start the HBase shell:
[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ bin/hbase shell
(2) List the tables in HBase:
hbase(main):001:0> list
TABLE
2018-07-22 11:46:58,921 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0 row(s) in 3.0660 seconds
=> []
(3) Create a table named user with the column family info:
hbase(main):002:0> create 'user','info'
0 row(s) in 0.6260 seconds
=> Hbase::Table - user
(4) Describe table user:
hbase(main):003:0> describe 'user'
DESCRIPTION ENABLED
'user', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'} true
1 row(s) in 0.0700 seconds
(5) Insert data into table user: rowkey 10001, column family info, with the columns name, age, sex and address (for example, the cell info:name holds zhangsan):
hbase(main):004:0> put 'user','10001','info:name','zhangsan'
hbase(main):005:0> put 'user','10001','info:age','25'
hbase(main):006:0> put 'user','10001','info:sex','male'
hbase(main):007:0> put 'user','10001','info:address','shanghai'
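HBase supports three ways of querying data:
1) lookup by rowkey with the get command, which is the fastest;
2) range query over a span of rowkeys with the scan command plus STARTROW/STOPROW, which is the most commonly used;
3) full table scan with a plain scan command, which is the slowest.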
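The difference in cost between these three patterns follows from HBase keeping rows sorted by rowkey. A get goes straight to one row, a range scan seeks to STARTROW and stops before STOPROW, and a full scan reads everything. The class below is a toy in-memory model of that idea. Its name and methods are hypothetical, not the HBase client API, and it ignores how real HBase splits the key space into regions across RegionServers.

```python
import bisect


class SortedTable:
    """Toy in-memory model of an HBase table: rows kept sorted by rowkey bytes."""

    def __init__(self):
        self._keys = []   # sorted list of rowkeys as bytes
        self._rows = {}   # rowkey bytes -> {"family:column": value}

    def put(self, row: str, column: str, value: str) -> None:
        key = row.encode("utf-8")
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep rowkeys in byte order
            self._rows[key] = {}
        self._rows[key][column] = value

    def get(self, row: str):
        """Point lookup by rowkey (like get): a single dictionary access."""
        return self._rows.get(row.encode("utf-8"))

    def scan(self, start=None, stop=None):
        """Yield rows with start <= rowkey < stop; no bounds means a full scan."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start.encode("utf-8"))
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop.encode("utf-8"))
        for key in self._keys[lo:hi]:
            yield key.decode("utf-8"), self._rows[key]
```

For example, after the puts shown in this section, scan(start="10002") yields only rows 10002 and 10003. This matches the STARTROW query further down, and the model never touches row 10001.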
(6) Query the row with rowkey 10001 in table user:
hbase(main):008:0> get 'user','10001'
COLUMN CELL
info:address timestamp=1532231767144, value=shanghai
info:age timestamp=1532231729180, value=25
info:name timestamp=1532231687833, value=zhangsan
info:sex timestamp=1532231746853, value=male
4 row(s) in 0.0300 seconds
Query only the name column (info:name) of the row with rowkey 10001 in table user:
hbase(main):009:0> get 'user','10001','info:name'
COLUMN CELL
info:name timestamp=1532231687833, value=zhangsan
1 row(s) in 0.0160 seconds
(7) Insert data for rowkey 10002:
hbase(main):010:0> put 'user','10002','info:name','wangwu'
hbase(main):011:0> put 'user','10002','info:age','30'
hbase(main):012:0> put 'user','10002','info:tel','25354212'
hbase(main):013:0> put 'user','10002','info:qq','232523551'
Scan the whole user table:
hbase(main):014:0> scan 'user'
ROW COLUMN+CELL
10001 column=info:address, timestamp=1532231767144, value=shanghai
10001 column=info:age, timestamp=1532231729180, value=25
10001 column=info:name, timestamp=1532231687833, value=zhangsan
10001 column=info:sex, timestamp=1532231746853, value=male
10002 column=info:age, timestamp=1532232249589, value=30
10002 column=info:name, timestamp=1532232223162, value=wangwu
10002 column=info:qq, timestamp=1532232294714, value=232523551
10002 column=info:tel, timestamp=1532232273419, value=25354212
2 row(s) in 0.0450 seconds
(8) Insert a row with rowkey 10003 into table user:
hbase(main):015:0> put 'user','10003','info:name','zhaoliu'
(9) Column projection: scan table user and return only the name and age columns:
hbase(main):016:0> scan 'user',{COLUMNS => ['info:name','info:age']}
ROW COLUMN+CELL
10001 column=info:age, timestamp=1532231729180, value=25
10001 column=info:name, timestamp=1532231687833, value=zhangsan
10002 column=info:age, timestamp=1532232249589, value=30
10002 column=info:name, timestamp=1532232223162, value=wangwu
10003 column=info:name, timestamp=1532232516020, value=zhaoliu
3 row(s) in 0.0410 seconds
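A scan with COLUMNS still covers the whole table. It only narrows which columns come back, and a row that has none of the requested columns does not appear in the result at all. Below is a toy Python model of that behaviour. The function name is hypothetical, and the table contents are copied from the transcript above.

```python
def scan_columns(rows, columns):
    """Yield (rowkey, cells) in rowkey byte order, keeping only the requested columns."""
    wanted = set(columns)
    for rowkey in sorted(rows, key=lambda k: k.encode("utf-8")):
        picked = {c: v for c, v in rows[rowkey].items() if c in wanted}
        if picked:  # a row with none of the requested columns produces no output
            yield rowkey, picked


# Contents of table user at this point in the transcript.
USER = {
    "10001": {"info:address": "shanghai", "info:age": "25", "info:name": "zhangsan", "info:sex": "male"},
    "10002": {"info:age": "30", "info:name": "wangwu", "info:qq": "232523551", "info:tel": "25354212"},
    "10003": {"info:name": "zhaoliu"},
}

for row, cells in scan_columns(USER, ["info:name", "info:age"]):
    print(row, cells)
```

Asking for info:address instead returns only row 10001, because rows 10002 and 10003 have no cell in that column.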
(10) Range query: scan table user starting at rowkey 10002 (STARTROW is inclusive):
hbase(main):017:0> scan 'user', {STARTROW=>'10002'}
ROW COLUMN+CELL
10002 column=info:age, timestamp=1532232249589, value=30
10002 column=info:name, timestamp=1532232223162, value=wangwu
10002 column=info:qq, timestamp=1532232294714, value=232523551
10002 column=info:tel, timestamp=1532232273419, value=25354212
10003 column=info:name, timestamp=1532232516020, value=zhaoliu
2 row(s) in 0.0340 seconds
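STARTROW and STOPROW compare rowkeys as raw bytes, not as numbers, so numeric-looking keys sort lexicographically. With STARTROW=>'10002', a key such as '9' would also be returned, because '9' sorts after any key starting with '1'. The extra keys '100020' and '9' below are hypothetical and added only to show this. The sketch models the ordering in Python rather than calling HBase. STOPROW, when given, is exclusive.

```python
def scan_keys(keys, start=None, stop=None):
    """Return rowkeys in HBase order (raw byte order), limited to start <= key < stop."""
    ordered = sorted(k.encode("utf-8") for k in keys)
    lo = start.encode("utf-8") if start is not None else None
    hi = stop.encode("utf-8") if stop is not None else None
    return [k.decode("utf-8") for k in ordered
            if (lo is None or k >= lo) and (hi is None or k < hi)]


keys = ["10001", "10002", "10003", "100020", "9"]
print(scan_keys(keys, start="10002"))
# ['10002', '100020', '10003', '9']: byte order, not numeric order

# Zero-padding to a fixed width makes byte order agree with numeric order.
padded = [f"{int(k):08d}" for k in keys]
print(scan_keys(padded, start="00010002"))
# ['00010002', '00010003', '00100020']
```

Fixed-width rowkeys are one common way to keep range scans over numeric IDs predictable. The width (8 digits here) is an arbitrary choice for the example.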
(11) Delete the name column (column family info) from the row with rowkey 10001 in table user:
hbase(main):018:0> delete 'user','10001','info:name'
(12) Scan the whole user table:
hbase(main):019:0> scan 'user'
ROW COLUMN+CELL
10001 column=info:address, timestamp=1532231767144, value=shanghai
10001 column=info:age, timestamp=1532231729180, value=25
10001 column=info:sex, timestamp=1532231746853, value=male
10002 column=info:age, timestamp=1532232249589, value=30
10002 column=info:name, timestamp=1532232223162, value=wangwu
10002 column=info:qq, timestamp=1532232294714, value=232523551
10002 column=info:tel, timestamp=1532232273419, value=25354212
10003 column=info:name, timestamp=1532232516020, value=zhaoliu
3 row(s) in 0.0340 seconds
(13) Delete all data in the row with rowkey 10001 of table user:
hbase(main):020:0> deleteall 'user','10001'
Scan the whole user table:
hbase(main):021:0> scan 'user'
ROW COLUMN+CELL
10002 column=info:age, timestamp=1532232249589, value=30
10002 column=info:name, timestamp=1532232223162, value=wangwu
10002 column=info:qq, timestamp=1532232294714, value=232523551
10002 column=info:tel, timestamp=1532232273419, value=25354212
10003 column=info:name, timestamp=1532232516020, value=zhaoliu
2 row(s) in 0.0230 seconds
(14) Disable table user:
hbase(main):022:0> disable 'user'
(15) Enable table user:
hbase(main):023:0> enable 'user'
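(16) Drop table user. A table must be disabled before it can be dropped, and it was re-enabled in step (15), so disable it again first:
hbase(main):024:0> disable 'user'
hbase(main):025:0> drop 'user'
(17) Exit the HBase shell:
hbase(main):026:0> exit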
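The order of the table lifecycle commands matters because HBase only drops a disabled table. The toy Python model below encodes that rule. The class and method names are hypothetical, not the HBase Admin API, and the error message is illustrative rather than the shell's exact wording.

```python
class ToyAdmin:
    """Toy model of the table lifecycle rule: only a disabled table can be dropped."""

    def __init__(self):
        self.enabled = {}  # table name -> True if enabled

    def create(self, name):
        self.enabled[name] = True  # newly created tables start out enabled

    def disable(self, name):
        self.enabled[name] = False

    def enable(self, name):
        self.enabled[name] = True

    def drop(self, name):
        if self.enabled[name]:
            # Illustrative message; the real shell reports its own error text.
            raise RuntimeError(f"table {name!r} must be disabled before drop")
        del self.enabled[name]


admin = ToyAdmin()
admin.create("user")
admin.disable("user")
admin.enable("user")
admin.disable("user")  # required again, since the table was re-enabled
admin.drop("user")
print(admin.enabled)   # {}
```

Skipping the second disable makes drop raise in this model, just as the shell refuses to drop an enabled table.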