Exporting Data from HBase to MySQL with Sqoop
Sqoop currently has no way to export data directly from HBase to MySQL. You have to go through Hive with two tables: an external table that maps onto the HBase table, and a plain native Hive table stored in HDFS. First copy the data from the external table into the native table (used as a temporary staging table), then use Sqoop to export the staging table's contents to MySQL.
Data preparation
Create an empty table in MySQL
CREATE TABLE `employee` (
  `rowkey` int(11) NOT NULL,
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Note: people habitually name the Hive column that maps to the HBase rowkey `key`, so it is tempting to create a matching `key` column in the MySQL table as well. But key is a reserved word in MySQL, and it will make the generated INSERT statements fail, which is why this table uses rowkey instead.
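A minimal sketch of the problem, assuming a hypothetical variant of the table that does have a column named key (these statements are for illustration only and are not part of the run below):

-- Fails: KEY is a MySQL reserved word, so the unquoted column name
-- triggers a syntax error (ERROR 1064)
INSERT INTO employee (key, id, name) VALUES (1, 1, 'peter');

-- Works: backticks disambiguate the identifier; this walkthrough
-- avoids the issue entirely by naming the column rowkey
INSERT INTO employee (`key`, id, name) VALUES (1, 1, 'peter');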
Create the employee table in HBase
Create the employee table and insert a few rows:

hbase(main):005:0> create 'employee','info'
0 row(s) in 0.4740 seconds

=> Hbase::Table - employee
hbase(main):006:0> put 'employee',1,'info:id',1
0 row(s) in 0.2080 seconds

hbase(main):008:0> scan 'employee'
ROW                  COLUMN+CELL
 1                   column=info:id, timestamp=1417591291730, value=1
1 row(s) in 0.0610 seconds

hbase(main):009:0> put 'employee',1,'info:name','peter'
0 row(s) in 0.0220 seconds

hbase(main):010:0> scan 'employee'
ROW                  COLUMN+CELL
 1                   column=info:id, timestamp=1417591291730, value=1
 1                   column=info:name, timestamp=1417591321072, value=peter
1 row(s) in 0.0450 seconds

hbase(main):011:0> put 'employee',2,'info:id',2
0 row(s) in 0.0370 seconds

hbase(main):012:0> put 'employee',2,'info:name','paul'
0 row(s) in 0.0180 seconds

hbase(main):013:0> scan 'employee'
ROW                  COLUMN+CELL
 1                   column=info:id, timestamp=1417591291730, value=1
 1                   column=info:name, timestamp=1417591321072, value=peter
 2                   column=info:id, timestamp=1417591500179, value=2
 2                   column=info:name, timestamp=1417591512075, value=paul
2 row(s) in 0.0440 seconds
Create the Hive external table
Hive distinguishes between native tables and external tables. A native table is stored as plain files in HDFS, while an external table is backed by another framework, such as HBase. Let's create a Hive external table that maps onto the employee HBase table we just created:
hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:name")
    > TBLPROPERTIES ("hbase.table.name" = "employee");
OK
Time taken: 0.324 seconds
hive> select * from h_employee;
OK
1	1	peter
2	2	paul
Time taken: 1.129 seconds, Fetched: 2 row(s)
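The entries in hbase.columns.mapping pair up positionally with the Hive columns: :key (the HBase rowkey) maps to key, info:id to id, and info:name to name. To double-check how Hive recorded the mapping, the standard metadata commands work here (output omitted):

hive> DESCRIBE FORMATTED h_employee;
hive> SHOW TBLPROPERTIES h_employee;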
Create the native Hive table
This native Hive table is only used as a temporary staging table for the export, hence the name h_employee_export; the fields are delimited by commas:
CREATE TABLE h_employee_export(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
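'\054' is simply the octal escape for a comma (octal 054 = decimal 44 = ','), so the statement above is equivalent to spelling the delimiter out literally:

CREATE TABLE h_employee_export(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';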
Load data from the source Hive table into the staging table
The first step is to load the data from h_employee (the HBase-backed external table) into h_employee_export (the native Hive table):
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export;
OK
1	1	peter
2	2	paul
Time taken: 0.359 seconds, Fetched: 2 row(s)
Let's look at the text file that actually got stored. Each line is one record, with the fields separated by commas, which matches the field delimiter Sqoop assumes by default during an export:
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0
1,1,peter
2,2,paul
Export the data from Hive to MySQL
$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar
14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee
14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:8032
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:1
14/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_0037
14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_0037
14/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false
14/12/05 08:49:59 INFO mapreduce.Job:  map 0% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job:  map 100% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully
14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=99761
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=166
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8805
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=8805
		Total vcore-seconds taken by all map tasks=8805
		Total megabyte-seconds taken by all map tasks=9016320
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Input split bytes=144
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=97
		CPU time spent (ms)=1360
		Physical memory (bytes) snapshot=167555072
		Virtual memory (bytes) snapshot=684212224
		Total committed heap usage (bytes)=148897792
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec)
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.
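The export works without any delimiter flags because the staging file uses commas, which Sqoop expects by default. Had the staging table been created with a different delimiter, say Hive's default '\001' (Ctrl-A), Sqoop would need to be told explicitly via --input-fields-terminated-by. A sketch, assuming the same connection details and paths as above:

$ sqoop export \
    --connect jdbc:mysql://localhost:3306/sqoop_test \
    --username root --password root \
    --table employee \
    --m 1 \
    --export-dir /user/hive/warehouse/h_employee_export/ \
    --input-fields-terminated-by '\001'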
Note
This log contains the following line:
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
It means you can open this address in a browser to watch the job run. If your job hangs for a long time and never finishes, something has gone wrong, and this address is where you can find the detailed error logs.
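If the web UI is hard to reach, the same logs can usually be pulled from the command line after the application finishes, assuming YARN log aggregation is enabled on the cluster:

$ yarn logs -applicationId application_1406097234796_0037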
Check the result:
mysql> select * from employee;
+--------+----+-------+
| rowkey | id | name  |
+--------+----+-------+
|      1 |  1 | peter |
|      2 |  2 | paul  |
+--------+----+-------+
2 rows in set (0.00 sec)

mysql>
The export succeeded.