hadoop18 -- HBase Java API, read/write flow, physical model

htsjk.Com, 2019-04-12 09:26

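HBase physical model

The HBase physical model describes how data is stored in HBase: where it is stored and on what principle.

The smallest storage unit in HBase is the cell: rowkey + column family + column qualifier + timestamp together uniquely identify the value of one cell.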
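
To make the cell coordinates concrete, here is a minimal in-memory sketch in plain Java with no HBase dependency. The class `CellStoreSketch` and its methods are hypothetical names chosen for illustration and are not HBase APIs. It only models the idea that (rowkey, column family, qualifier, timestamp) addresses one value, and that writing the same column again with a newer timestamp adds a version instead of overwriting.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only: not how HBase is implemented internally. */
final class CellStoreSketch {

    /** The four coordinates that address one cell value. */
    static final class CellKey implements Comparable<CellKey> {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        CellKey(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        // Row, family, qualifier ascending; timestamp descending (newest first).
        @Override
        public int compareTo(CellKey o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            if (c == 0) c = Long.compare(o.timestamp, timestamp);
            return c;
        }
    }

    private final TreeMap<CellKey, String> cells = new TreeMap<>();

    void put(String row, String family, String qualifier, long ts, String value) {
        cells.put(new CellKey(row, family, qualifier, ts), value);
    }

    /** Returns the newest version of the column, or null if the column has no value. */
    String getLatest(String row, String family, String qualifier) {
        // Long.MAX_VALUE sorts before every real timestamp of this column.
        CellKey probe = new CellKey(row, family, qualifier, Long.MAX_VALUE);
        Map.Entry<CellKey, String> e = cells.ceilingEntry(probe);
        if (e == null) return null;
        CellKey k = e.getKey();
        boolean sameColumn = k.row.equals(row)
                && k.family.equals(family)
                && k.qualifier.equals(qualifier);
        return sameColumn ? e.getValue() : null;
    }

    int size() {
        return cells.size();
    }

    public static void main(String[] args) {
        CellStoreSketch store = new CellStoreSketch();
        store.put("1001", "f1", "name", 1L, "zhangsan");
        store.put("1001", "f1", "name", 2L, "lisi"); // same column, newer version
        System.out.println(store.getLatest("1001", "f1", "name")); // lisi
        System.out.println(store.size());                          // 2
    }
}
```

Both versions stay stored under different timestamps; a read without an explicit timestamp sees the newest one.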

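In the HBase physical model, a table is split along the row (rowkey) direction into multiple regions, and rows are kept sorted in lexicographic order of their rowkeys.

A region can be thought of as a partition. Within a region, the data of each column family (its columns and values) is stored separately, in its own store.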
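
The following plain-Java sketch shows how a sorted rowkey space that is split into regions can be searched. `RegionMapSketch`, the split points and the region names are all made up for illustration. It also uses `String` comparison as a stand-in for the byte-wise comparison HBase performs, which gives the same order for ASCII rowkeys.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only: maps region start keys to made-up region names. */
final class RegionMapSketch {

    // Start key -> region name. The empty string is the start key of the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Finds the region whose [startKey, nextStartKey) range contains the rowkey. */
    String locate(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionMapSketch regions = new RegionMapSketch();
        regions.addRegion("", "region-1");     // rows < "1003"
        regions.addRegion("1003", "region-2"); // "1003" <= rows < "2000"
        regions.addRegion("2000", "region-3"); // rows >= "2000"

        System.out.println(regions.locate("1001"));  // region-1
        System.out.println(regions.locate("1003"));  // region-2
        // Lexicographic order: "10010" sorts before "1003", although 10010 > 1003 as numbers.
        System.out.println(regions.locate("10010")); // region-1
        System.out.println(regions.locate("9"));     // region-3
    }
}
```

The `"10010"` case is why numeric rowkeys are often zero-padded to a fixed width: the sort is on characters (bytes), not on numeric value.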

HBase has built-in system tables for its own management

hbase:namespace: records the namespaces

hbase(main):017:0> scan 'hbase:namespace'
ROW                                        COLUMN+CELL
 default                                   column=info:d, timestamp=1542080423695, value=\x0A\x07default
 hbase                                     column=info:d, timestamp=1542080424908, value=\x0A\x05hbase
2 row(s) in 0.0150 seconds

hbase:meta: records the region information of all tables

hbase(main):020:0> scan 'hbase:meta'
ROW                                        COLUMN+CELL
 hadoop,,1542146532516.edec440c50dc8b6a571 column=info:regioninfo, timestamp=1542146534724, value={ENCODED => edec440c50dc8b6a571ddbe678ce0c9c, NAME => 'hadoop,,1542
 ddbe678ce0c9c.                            146532516.edec440c50dc8b6a571ddbe678ce0c9c.', STARTKEY => '', ENDKEY => ''}
 hadoop,,1542146532516.edec440c50dc8b6a571 column=info:seqnumDuringOpen, timestamp=1542146534724, value=\x00\x00\x00\x00\x00\x00\x00\x02
 ddbe678ce0c9c.
 hadoop,,1542146532516.edec440c50dc8b6a571 column=info:server, timestamp=1542146534724, value=hadoop02:16020
 ddbe678ce0c9c.
 hadoop,,1542146532516.edec440c50dc8b6a571 column=info:serverstartcode, timestamp=1542146534724, value=1542142878250
 ddbe678ce0c9c.
 hbase:namespace,,1542080421925.833f0607b5 column=info:regioninfo, timestamp=1542143164503, value={ENCODED => 833f0607b5229ac2884356c723234928, NAME => 'hbase:namesp
 229ac2884356c723234928.                   ace,,1542080421925.833f0607b5229ac2884356c723234928.', STARTKEY => '', ENDKEY => ''}
 hbase:namespace,,1542080421925.833f0607b5 column=info:seqnumDuringOpen, timestamp=1542143164503, value=\x00\x00\x00\x00\x00\x00\x00
 229ac2884356c723234928.
 hbase:namespace,,1542080421925.833f0607b5 column=info:server, timestamp=1542143164503, value=hadoop03:16020
 229ac2884356c723234928.
 hbase:namespace,,1542080421925.833f0607b5 column=info:serverstartcode, timestamp=1542143164503, value=1542142884167
 229ac2884356c723234928.
2 row(s) in 0.0440 seconds
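
In the output above the shell wraps the ROW column, so `hadoop,,1542146532516.edec440c50dc8b6a571` and `ddbe678ce0c9c.` are one row key. That key is the region name, laid out as `<table>,<start key>,<region id>.<encoded name>.`. The small helper below, with the hypothetical name `RegionNameSketch`, splits such a name in plain Java. It is a sketch that assumes this layout and does not use any HBase class.

```java
/** Illustrative helper only; assumes the "<table>,<startKey>,<regionId>.<encodedName>." layout. */
final class RegionNameSketch {

    /** Returns {table, startKey, regionId, encodedName}. */
    static String[] parse(String regionName) {
        int firstComma = regionName.indexOf(',');
        int lastComma = regionName.lastIndexOf(',');
        if (firstComma < 0 || firstComma == lastComma || !regionName.endsWith(".")) {
            throw new IllegalArgumentException("unexpected region name: " + regionName);
        }
        String table = regionName.substring(0, firstComma);
        // The start key may itself contain commas, so take everything up to the last comma.
        String startKey = regionName.substring(firstComma + 1, lastComma);
        String tail = regionName.substring(lastComma + 1); // "<regionId>.<encodedName>."
        int dot = tail.indexOf('.');
        if (dot == tail.length() - 1) {
            throw new IllegalArgumentException("missing encoded name: " + regionName);
        }
        String regionId = tail.substring(0, dot);
        String encodedName = tail.substring(dot + 1, tail.length() - 1);
        return new String[] {table, startKey, regionId, encodedName};
    }

    public static void main(String[] args) {
        String[] parts = parse("hadoop,,1542146532516.edec440c50dc8b6a571ddbe678ce0c9c.");
        System.out.println("table       = " + parts[0]);
        System.out.println("startKey    = '" + parts[1] + "'");
        System.out.println("regionId    = " + parts[2]);
        System.out.println("encodedName = " + parts[3]);
    }
}
```

For the `hadoop` table the start key is empty, which matches `STARTKEY => ''` in the `info:regioninfo` value: the table has a single region that covers the whole rowkey range.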

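HBase read flow

HBase write flow

Deleting data in HBase

Data to be deleted is first given a delete marker and is only physically removed later, during compaction. The same applies to old versions: data in HBase has a retention time, and once that time is reached, the old versions are removed during compaction.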
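
The mark-then-compact idea can be sketched in plain Java. `CompactionSketch` is a hypothetical, heavily simplified model of the versions of a single column. It is not HBase code, and it folds the cleanup into one `compact` call, whereas in HBase dropping delete markers is the job of major compaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only: the stored versions of one column. */
final class CompactionSketch {

    static final class Entry {
        final long timestamp;
        final String value;
        final boolean deleteMarker;

        Entry(long timestamp, String value, boolean deleteMarker) {
            this.timestamp = timestamp;
            this.value = value;
            this.deleteMarker = deleteMarker;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void put(long ts, String value) {
        entries.add(new Entry(ts, value, false));
    }

    /** A delete only appends a marker; nothing is removed yet. */
    void delete(long ts) {
        entries.add(new Entry(ts, null, true));
    }

    /** Drops versions masked by a marker, versions older than the TTL, and the markers. */
    void compact(long now, long ttlMillis) {
        long newestMarker = Long.MIN_VALUE;
        for (Entry e : entries) {
            if (e.deleteMarker) newestMarker = Math.max(newestMarker, e.timestamp);
        }
        final long cutoff = newestMarker;
        entries.removeIf(e -> e.deleteMarker
                || e.timestamp <= cutoff
                || now - e.timestamp > ttlMillis);
    }

    int storedEntries() {
        return entries.size();
    }

    public static void main(String[] args) {
        CompactionSketch column = new CompactionSketch();
        column.put(100L, "v1");
        column.put(200L, "v2");
        column.delete(250L);    // masks v1 and v2, but they are still stored
        column.put(300L, "v3"); // written after the delete, so it stays visible
        System.out.println(column.storedEntries()); // 4
        column.compact(1000L, 10_000L);
        System.out.println(column.storedEntries()); // 1 (only v3 is left)
    }
}
```

Until compaction runs, the deleted versions still occupy space; the marker only hides them from reads.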

HBase Java API

Working with the HBase API simply means calling the HBase client interfaces to operate on the database from Java, much like working with JDBC.

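Create a namespace

// Imports shared by the test methods in this section
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

@Test
public void createNamespace() throws IOException {
    // Load the configuration
    Configuration conf = HBaseConfiguration.create();
    // Open a connection and get the Admin handle; both are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
        // Describe the namespace
        NamespaceDescriptor nsDesc = NamespaceDescriptor.create("hadoop").build();
        // Create the namespace
        admin.createNamespace(nsDesc);
    }
}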

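Create a table with a column family

@Test
public void createTable() throws IOException {
    // Load the configuration
    Configuration conf = HBaseConfiguration.create();
    // Open a connection and get the Admin handle; both are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
        // Table name, in namespace:table form
        TableName tableName = TableName.valueOf("hadoop:student");
        // Table descriptor (HTableDescriptor and HColumnDescriptor are deprecated since HBase 2.0)
        HTableDescriptor hTableDesc = new HTableDescriptor(tableName);
        // Column family descriptor
        HColumnDescriptor hcol = new HColumnDescriptor("f1");
        // Add the column family to the table
        hTableDesc.addFamily(hcol);
        // Create the table
        admin.createTable(hTableDesc);
    }
}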

Delete a table

@Test
public void delTable() throws IOException {
    // Load the configuration
    Configuration conf = HBaseConfiguration.create();
    // Open a connection and get the Admin handle; both are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
        // Table name
        TableName tableName = TableName.valueOf("hadoop:student");
        // A table must be disabled before it can be deleted
        admin.disableTable(tableName);
        // Delete the table
        admin.deleteTable(tableName);
    }
}

Add data

@Test
public void putData() throws IOException {
    // Load the configuration
    Configuration conf = HBaseConfiguration.create();
    // Table name
    TableName tableName = TableName.valueOf("hadoop:student");
    // Open a connection and get the table; both are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(tableName)) {
        // Create a Put for rowkey "1001"
        Put put = new Put(Bytes.toBytes("1001"));
        // Data to write: column family f1, column name, value zhangsan
        put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
        // Execute the write
        table.put(put);
    }
}

Delete data

@Test
public void delData() throws IOException {
    // Load the configuration
    Configuration conf = HBaseConfiguration.create();
    // Table name
    TableName tableName = TableName.valueOf("hadoop:student");
    // Open a connection and get the table; both are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(tableName)) {
        // Create a Delete for rowkey "1001"; with no column given, the whole row is deleted
        Delete del = new Delete(Bytes.toBytes("1001"));
        // Execute the delete
        table.delete(del);
    }
}

Query data

@Test
public void getData() throws IOException {
    // Load the configuration
    Configuration conf = HBaseConfiguration.create();
    // Table name
    TableName tableName = TableName.valueOf("hadoop:student");
    // Open a connection and get the table; both are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(tableName)) {
        // Create a Get for rowkey "1001"
        Get get = new Get(Bytes.toBytes("1001"));
        // Fetch the row
        Result result = table.get(get);
        // Take the cells out of the result
        Cell[] cells = result.rawCells();
        // Iterate over the cells
        for (Cell cell : cells) {
            // Column family
            System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)));
            // Column name
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)));
            // Rowkey
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)));
            // Value
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
            // Timestamp
            System.out.println(cell.getTimestamp());
        }
    }
}

Query with scan

@Test
public void scanData() throws IOException {
    // Load the configuration: the default configuration file first, then the custom one
    Configuration conf = HBaseConfiguration.create();
    TableName tableName = TableName.valueOf("hadoop25:stuinfo");

    Scan scan = new Scan();
    // When the table has several column families, the scan can be limited to one of them
    scan.addFamily(Bytes.toBytes("info"));
    // To limit the scan to a single column instead: scan.addColumn(family, qualifier)
    // family: column family
    // qualifier: column

    // Open the HBase connection, the table and the scanner; all three are closed on exit
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(tableName);
         ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            Cell[] cells = result.rawCells();
            for (Cell cell : cells) {
                // Rowkey
                System.out.println(Bytes.toString(CellUtil.cloneRow(cell)));
                // Column family
                System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)));
                // Column
                System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)));
                // Value
                System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
                // Timestamp
                System.out.println(cell.getTimestamp());
            }
        }
    }
}