Big Data: HBase

和通数据库 (htsjk.Com), 2019-11-21 23:22. Source: unknown

I. NoSQL basics and common NoSQL databases

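II. HBase table structure and architecture
    1. HBase table structure: modeled on BigTable ("big table"), one of Google's three foundational papers (GFS, MapReduce, BigTable)
    2. What HBase stores in ZooKeeper (ZK)
        (*) configuration and cluster topology information
        (*) table metadata
        (*) the state needed to implement HBase HA (high availability)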
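To make the BigTable-style layout concrete, here is a toy in-memory model in Java. It is only an illustration under simplifying assumptions (String rowkeys, every version kept), not how HBase stores data, and the class ToyHTable and its methods are hypothetical. Conceptually, a table maps rowkey → column family → column qualifier → timestamp → value, rows are kept sorted by rowkey, and a read returns the newest version of a cell.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the HBase/BigTable logical data model (illustration only, hypothetical class):
// rowkey -> column family -> qualifier -> timestamp -> value.
class ToyHTable {
    // Rows are kept sorted by rowkey (TreeMap), like HBase keeps rows in key order.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Store one cell version; versions are sorted newest timestamp first.
    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(ts, value);
    }

    // Return the newest version of a cell, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Rowkeys in the order a full scan would return them.
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Real HBase compares rowkeys as unsigned bytes, which matches String order for ASCII keys such as id001, and keeps only as many versions per column family as its VERSIONS setting allows.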

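III. Setting up an HBase environment

        (1) Extract: tar -zxvf hbase-1.3.1-bin.tar.gz -C ~/training/
        (2) Set the environment variables: vi ~/.bash_profile
                HBASE_HOME=/root/training/hbase-1.3.1
                export HBASE_HOME

                PATH=$HBASE_HOME/bin:$PATH
                export PATH
        (3) Reload the profile so the variables take effect: source ~/.bash_profile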

    1. Local (standalone) mode: no HDFS is needed; data is stored directly on the local file system.
        Set JAVA_HOME in
        hbase-env.sh
                export JAVA_HOME=/root/training/jdk1.8.0_144
        Set the data directory in
        hbase-site.xml
                <property>
                   <name>hbase.rootdir</name>
                   <value>file:///root/training/hbase-1.3.1/data</value>
                </property>

    2. Pseudo-distributed mode
        Use HBase's bundled ZooKeeper by setting HBASE_MANAGES_ZK to true in
        hbase-env.sh
                export HBASE_MANAGES_ZK=true

        hbase-site.xml
                <property>
                   <name>hbase.rootdir</name>
                   <value>hdfs://192.168.157.11:9000/hbase</value>
                </property>

                <property>
                   <name>hbase.cluster.distributed</name>
                   <value>true</value>
                </property>

                <property>
                   <name>hbase.zookeeper.quorum</name>
                   <value>192.168.157.11</value>
                </property>
                Set the data replication factor
                <property>
                   <name>dfs.replication</name>
                   <value>1</value>
                </property>         
        Address of the worker (RegionServer) node, in
        regionservers
                192.168.157.11


    3. Fully distributed mode (make sure the clocks of all machines are synchronized)
        hbase-site.xml
            <property>
               <name>hbase.rootdir</name>
               <value>hdfs://192.168.157.12:9000/hbase</value>
            </property>

            <property>
               <name>hbase.cluster.distributed</name>
               <value>true</value>
            </property>

            <property>
               <name>hbase.zookeeper.quorum</name>
               <value>192.168.157.12</value>
            </property>

            <property>
               <name>dfs.replication</name>
               <value>2</value>
            </property>         

            <property>
               <name>hbase.master.maxclockskew</name>
               <value>180000</value>
            </property>                 
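        (hbase.master.maxclockskew is the maximum clock difference, in milliseconds, tolerated between the HMaster and a RegionServer; 180000 ms = 3 minutes)
        Set the IP addresses of the worker (RegionServer) nodes in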
        regionservers
                192.168.157.13
                192.168.157.14
        Copy the installation to the two worker nodes:
        scp -r hbase-1.3.1/ root@bigdata13:/root/training
        scp -r hbase-1.3.1/ root@bigdata14:/root/training

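    4. HBase HA
        No extra configuration is needed: just start one more HMaster on one of the worker nodes; it registers in ZooKeeper as a backup master.
        On bigdata13: hbase-daemon.sh start master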
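The HA behavior above relies on ZooKeeper: the active HMaster holds an ephemeral znode (/hbase/master), while backup masters register under /hbase/backup-masters and wait for that node to disappear. The toy simulation below sketches this failover logic in plain Java; ToyMasterElection is hypothetical and does not use the ZooKeeper API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy simulation of ZooKeeper-based HMaster failover (hypothetical class, not the ZooKeeper API):
// the first master to "create" the ephemeral /hbase/master node becomes active,
// the others wait as backups, and a backup takes over when the active master's session ends.
class ToyMasterElection {
    private String activeMaster;                                     // holder of the /hbase/master node
    private final Deque<String> backupMasters = new ArrayDeque<>();  // like /hbase/backup-masters

    // A master process starts and tries to become active.
    void start(String host) {
        if (activeMaster == null) {
            activeMaster = host;
        } else {
            backupMasters.addLast(host);
        }
    }

    // A master's ZooKeeper session ends, so its ephemeral node disappears.
    void sessionExpired(String host) {
        if (host.equals(activeMaster)) {
            // Promote the oldest waiting backup (null if there is none).
            activeMaster = backupMasters.pollFirst();
        } else {
            backupMasters.remove(host);
        }
    }

    String activeMaster() {
        return activeMaster;
    }
}
```

In a real cluster the waiting backups race to create the node, so which one wins is not fixed; the toy simply promotes the oldest backup.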

IV. What HBase stores in ZK, and HA

V. Working with HBase
    1. Web console: port 16010 (60010 in earlier versions)

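    2. Command line (HBase shell)
        (*) Create a table: create 'students','info','grade'
                      List the tables: list
                      View the table structure (in the HBase shell, desc is an alias for describe):
                      desc 'students'
                      describe 'students'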

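                      Question: in Oracle, is there any difference between desc emp and describe emp?
                            No. DESCRIBE is a SQL*Plus command, and DESC is its abbreviation.

                            What is the difference between SQL*Plus commands and SQL statements?
                            SQL*Plus commands can be abbreviated;
                            SQL statements cannot.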

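        (*) Insert data: put, e.g. put 'students','stu001','info:name','Tom'
        (*) Query data:
               scan 'students'            is similar to: select * from students

               get 'students','stu001'    is similar to: select * from students where rowkey = ?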

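        (*) Empty a table
              Question: in Oracle, how many ways are there to empty a table?  Two: the delete and truncate statements.
                    What is the difference between delete and truncate?
                    1. delete is DML (can be rolled back); truncate is DDL (cannot be rolled back)
                    2. delete leaves fragmentation; truncate does not
                    3. delete does not release space (the high-water mark stays); truncate does
                    4. delete can be undone with flashback; truncate cannot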


                truncate 'students' -----> under the hood: drop the table, then re-create it

                Shell output:
                Truncating 'students' table (it may take a while):
                 - Disabling table...
                 - Truncating table...
                0 row(s) in 4.0840 seconds

                Older HBase versions:
                Truncating 'students' table (it may take a while):
                 - Disabling table...   
                 - Dropping table...    
                 - Creating table ...   

      (*) Drop a table (disable it first): disable 'students'
                     drop 'students'
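The get and scan commands above differ in how they use rowkey order: get is a point lookup on one rowkey, while scan walks the rows in sorted order, optionally limited to a range such as scan 'students', {STARTROW => 'id001', STOPROW => 'id010'} (start inclusive, stop exclusive). The toy class below, ToyGetScan (hypothetical, plain Java, not the HBase client), mimics both and shows that rowkeys sort lexicographically rather than numerically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy comparison of get vs scan semantics over rows sorted by rowkey (hypothetical class).
class ToyGetScan {
    private final NavigableMap<String, String> rows = new TreeMap<>(); // rowkey -> info:name

    void put(String rowkey, String name) {
        rows.put(rowkey, name);
    }

    // Like get 'students','id001': a single row looked up by its rowkey.
    String get(String rowkey) {
        return rows.get(rowkey);
    }

    // Like scan with STARTROW/STOPROW: start inclusive, stop exclusive, in rowkey order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // Like a plain scan 'students': every row, in rowkey order.
    List<String> scanAll() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Because id10 sorts before id2, fixed-width rowkeys such as id001 are a common way to keep numeric order in scans.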

    3. Java API


VI. How data is written (pay special attention to Region splitting)
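Region splitting is easiest to see in a toy model. A new table starts as a single Region covering the whole rowkey range; when a Region grows too large, it splits into two daughter Regions at a middle rowkey, each owning part of the range. The sketch below is a simplification: it triggers a split by row count (maxRows, an assumption of this toy), whereas real HBase decides based on store file size and the configured split policy. ToyRegionTable is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of Region splitting (hypothetical class, illustration only): each region owns
// the rowkey range [startKey, endKey) and splits at its middle rowkey once it holds more
// than maxRows rows. Real HBase splits based on store file size and its split policy.
class ToyRegionTable {
    static final class Region {
        final String startKey; // inclusive; "" means the beginning of the table
        final String endKey;   // exclusive; "" means the end of the table
        final TreeMap<String, String> rows = new TreeMap<>();

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String rowkey) {
            return rowkey.compareTo(startKey) >= 0 && (endKey.isEmpty() || rowkey.compareTo(endKey) < 0);
        }
    }

    private final int maxRows;
    private final List<Region> regions = new ArrayList<>(); // kept in rowkey order

    ToyRegionTable(int maxRows) {
        if (maxRows < 1) throw new IllegalArgumentException("maxRows must be >= 1");
        this.maxRows = maxRows;
        regions.add(new Region("", "")); // a new table starts with a single region
    }

    // Route the row to the region that owns its key, then split that region if it is too big.
    void put(String rowkey, String value) {
        for (int i = 0; i < regions.size(); i++) {
            Region r = regions.get(i);
            if (r.contains(rowkey)) {
                r.rows.put(rowkey, value);
                if (r.rows.size() > maxRows) split(i);
                return;
            }
        }
    }

    // Replace region i by two daughter regions divided at its middle rowkey.
    private void split(int i) {
        Region parent = regions.get(i);
        List<String> keys = new ArrayList<>(parent.rows.keySet());
        String splitKey = keys.get(keys.size() / 2);
        Region lower = new Region(parent.startKey, splitKey);
        Region upper = new Region(splitKey, parent.endKey);
        lower.rows.putAll(parent.rows.headMap(splitKey, false));
        upper.rows.putAll(parent.rows.tailMap(splitKey, true));
        regions.set(i, lower);
        regions.add(i + 1, upper);
    }

    // Describe the regions as "[start,end):rowCount" in key order.
    List<String> describe() {
        List<String> out = new ArrayList<>();
        for (Region r : regions) out.add("[" + r.startKey + "," + r.endKey + "):" + r.rows.size());
        return out;
    }
}
```

With monotonically increasing rowkeys such as id001, id002, ..., every new row lands in the last Region, which is why sequential rowkeys tend to concentrate writes on one RegionServer.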

VII. HBase filters (Java programs)
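A filter is applied to each row during a scan, and only matching rows are returned to the client; in real HBase this evaluation happens on the RegionServer and is configured with scan.setFilter(...), using classes such as PrefixFilter or SingleColumnValueFilter. The plain-Java toy below (ToyFilterScan and its helpers are hypothetical) mimics a rowkey-prefix filter, a single-column equality filter, and combining the two with AND, as a FilterList with MUST_PASS_ALL does in HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of scan-time filtering (hypothetical class, illustration only):
// a filter is a predicate evaluated against each row; only accepted rows are returned.
class ToyFilterScan {
    // rowkey -> ("family:qualifier" -> value), rows sorted by rowkey
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowkey, String column, String value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>()).put(column, value);
    }

    // Scan all rows in rowkey order and keep the rowkeys accepted by the filter.
    List<String> scan(Predicate<Map.Entry<String, Map<String, String>>> filter) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (filter.test(row)) result.add(row.getKey());
        }
        return result;
    }

    // Analogous to PrefixFilter: keep rows whose rowkey starts with the prefix.
    static Predicate<Map.Entry<String, Map<String, String>>> rowPrefix(String prefix) {
        return row -> row.getKey().startsWith(prefix);
    }

    // Analogous to SingleColumnValueFilter (EQUAL) with setFilterIfMissing(true):
    // rows lacking the column are dropped.
    static Predicate<Map.Entry<String, Map<String, String>>> columnEquals(String column, String value) {
        return row -> value.equals(row.getValue().get(column));
    }
}
```

Note that the real SingleColumnValueFilter lets rows without the tested column through by default; calling setFilterIfMissing(true) gives the behavior modeled here.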

VIII. MapReduce on HBase

[Figure: HBase table structure]

[Figure: HBase architecture]

[Figure: Data HBase stores in ZK, and HA]

[Figure: Structure of the students table]

Using HBase from the Java API

package demo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

/*
 * 1. JUnit 4 also needs hamcrest-core-1.3.jar on the classpath.
 * 2. Edit the Windows hosts file so the client can resolve the host names HBase returns:
 *    C:\Windows\System32\drivers\etc\hosts
 *    192.168.157.11 bigdata11
 */
public class TestHBase {

    @Test
    public void testCreateTable() throws IOException {
        // Configure the ZK address (HBaseConfiguration.create() also loads hbase-default.xml/hbase-site.xml)
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.157.11");

        // Open a connection and get an Admin client; both are closed automatically
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Create the table descriptor
            HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("mytable"));

            // Add the column families
            htd.addFamily(new HColumnDescriptor("info"));
            htd.addFamily(new HColumnDescriptor("grade"));

            // Create the table
            admin.createTable(htd);
        }
    }

    @Test
    public void testPut() throws IOException {
        // Configure the ZK address
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.157.11");

        // Open a connection and get a client for the table
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            // Build a Put object; the argument is the rowkey
            Put put = new Put(Bytes.toBytes("id001"));
            // put.addColumn(family,     column family
            //               qualifier,  column
            //               value)      value
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Tom"));

            table.put(put);
            // To insert several rows at once: table.put(List<Put>)
        }
    }

    @Test
    public void testGet() throws IOException {
        // Configure the ZK address
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.157.11");

        // Open a connection and get a client for the table
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            // Build a Get object for the rowkey
            Get get = new Get(Bytes.toBytes("id001"));

            // Query
            Result r = table.get(get);

            // Read the cell value (null if the cell does not exist)
            String name = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
            System.out.println(name);
        }
    }

    @Test
    public void testScan() throws IOException {
        // Configure the ZK address
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.157.11");

        // Open a connection and get a client for the table
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            // Define a scanner
            Scan scan = new Scan();
            // scan.setFilter(filter) would attach a filter

            // Query through the scanner; the ResultScanner must be closed as well
            try (ResultScanner rs = table.getScanner(scan)) {
                for (Result r : rs) {
                    String name = Bytes.toString(r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
                    System.out.println(name);
                }
            }
        }
    }
}