Big Data Technology - HBase: Using CopyTable to Back Up HBase Table Data Online

Hetong Database (htsjk.Com), 2019-07-31 00:07

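CopyTable is a useful backup tool that ships with HBase. It can be used for table backups within a cluster, backups to a remote cluster, incremental backups of table data, and copying only part of a table's data. It runs as a Hadoop MapReduce job and uses the standard HBase Scan read interface and Put write interface.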

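Before using it, make sure the destination table (tableDst) has already been created on the target cluster; otherwise the job fails with an error. Also note that data written to the source table while the backup is running is not guaranteed to be copied to the destination table.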
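As a minimal sketch of this preparation step, the helper below assembles the HBase shell `create` statement for the destination table. The function name `build_create_stmt` and the names `tableDst`, `cf1` and `cf2` are illustrative assumptions; the statement is only printed here rather than sent to a cluster.

```shell
#!/bin/sh
# Hypothetical helper: build an HBase shell "create" statement
# from a table name followed by one or more column family names.
build_create_stmt() {
    table=$1
    shift
    stmt="create '$table'"
    for cf in "$@"; do
        stmt="$stmt, '$cf'"
    done
    printf '%s\n' "$stmt"
}

# Print the statement for a destination table with two column families.
build_create_stmt tableDst cf1 cf2
```

On the destination cluster the printed statement could be piped into the shell, for example `build_create_stmt tableDst cf1 cf2 | hbase shell`. The column families should match those of the source table, or the ones targeted by a `--families` mapping.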

# create new tableOrig on destination cluster
dstCluster$ echo "create 'tableOrig', 'cf1', 'cf2'" | hbase shell
# on source cluster run copy table with destination ZK quorum specified using --peer.adr
# WARNING: In older versions, you are not alerted about any typo in these arguments!
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=dstClusterZK:2181:/hbase tableOrig

# create new tableCopy on destination cluster
dstCluster$ echo "create 'tableCopy', 'cf1', 'cf2'" | hbase shell
# on source cluster run copy table with destination --peer.adr and --new.name arguments.
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=dstClusterZK:2181:/hbase --new.name=tableCopy tableOrig

# WARNING: In older versions, you are not alerted about any typo in these arguments!
# Copy from the beginning of time until timeEnd.
# NOTE: Must include start time for end time to be respected. Start time cannot be 0.
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=1 --endtime=timeEnd ...
# Copy starting from and including timeStart until the end of time.
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=timeStart ...
# Copy entries with timestamps starting at timeStart (inclusive) and ending at timeEnd (exclusive).
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=timeStart --endtime=timeEnd
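For an incremental backup, the time range has to be given in Unix epoch milliseconds. The following sketch computes a one-hour window ending at the current time and assembles the matching command line. The variable names and the one-hour window are assumptions chosen for illustration, and the command is printed rather than executed.

```shell
#!/bin/sh
# Compute a one-hour window ending now, in Unix epoch milliseconds,
# the unit that --starttime and --endtime expect.
window_ms=$((60 * 60 * 1000))
end_ms=$(( $(date +%s) * 1000 ))
start_ms=$(( end_ms - window_ms ))

# Assemble the CopyTable invocation; it is only printed here.
# dstClusterZK and tableOrig are the placeholder names from the examples above.
cmd="hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=$start_ms --endtime=$end_ms --peer.adr=dstClusterZK:2181:/hbase tableOrig"
echo "$cmd"
```

Because both bounds are always set, this avoids the case where an end time is silently ignored for lack of a start time.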

Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

Options:
rs.class     hbase.regionserver.class of the peer cluster
             specify if different from current cluster
rs.impl      hbase.regionserver.impl of the peer cluster
startrow     the start row
stoprow      the stop row
starttime    beginning of the time range (unixtime in millis)
             without endtime means from starttime to forever
endtime      end of the time range. Ignored if no starttime specified.
versions     number of cell versions to copy
new.name     new table's name
peer.adr     Address of the peer cluster given in the format
             hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
families     comma-separated list of families to copy
             To copy from cf1 to cf2, give sourceCfName:destCfName.
             To keep the same name, just give "cfName"
all.cells    also copy delete markers and deleted cells
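The peer.adr value is three fields joined by colons: the ZooKeeper quorum, the client port and the parent znode. A small sketch of how it could be assembled follows; the variable names and values are illustrative assumptions, not settings read from a real cluster.

```shell
#!/bin/sh
# Illustrative values; replace them with the peer cluster's real settings.
zk_quorum="server1,server2,server3"   # comma-separated ZooKeeper hosts
zk_port=2181                          # ZooKeeper client port
zk_parent="/hbase"                    # parent znode of the peer cluster

# Join the three fields in the order quorum:port:znode parent.
peer_adr="${zk_quorum}:${zk_port}:${zk_parent}"
echo "--peer.adr=${peer_adr}"
```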

Args:
tablename Name of the table to copy

Examples:
To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
For performance consider the following general options:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
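As a rough sketch of how these pieces fit together, the snippet below places the general -D options directly after the class name, as the usage line suggests, followed by the CopyTable options. The variable names are assumptions, the host and table names are the ones from the example above, and the command is printed rather than run.

```shell
#!/bin/sh
# General -D options come before the CopyTable-specific options,
# following "CopyTable [general options] [--starttime=X] ..." in the usage text.
general_opts="-Dhbase.client.scanner.caching=100 -Dmapred.map.tasks.speculative.execution=false"

# Copy myOldCf into myNewCf on the destination; copy cf2 and cf3 under the same names.
families="myOldCf:myNewCf,cf2,cf3"

cmd="bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable $general_opts --families=$families --peer.adr=server1,server2,server3:2181:/hbase TestTable"
echo "$cmd"
```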