Exporting Data from HBase to MySQL with Sqoop
Sqoop currently has no way to export data directly from HBase to MySQL. You have to go through Hive with two tables: an external table that maps onto the HBase table, and a plain native Hive table stored in HDFS. First copy the data from the external table into the native table (used as a temporary staging table), then use Sqoop to export the staging table's contents to MySQL.
Data preparation
Create an empty table in MySQL
CREATE TABLE `employee` (
  `rowkey` int(11) NOT NULL,
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Note: people habitually name the Hive column that maps to the HBase rowkey `key`, so it is tempting to create a matching `key` column in the MySQL table as well. But key is a reserved word in MySQL, and it will make the generated INSERT statements fail, which is why this table uses rowkey instead.
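A minimal sketch of the problem, assuming a hypothetical variant of the table that does have a column named key (these statements are for illustration only and are not part of the run below):

-- Fails: KEY is a MySQL reserved word, so the unquoted column name
-- triggers a syntax error (ERROR 1064)
INSERT INTO employee (key, id, name) VALUES (1, 1, 'peter');

-- Works: backticks disambiguate the identifier; this walkthrough
-- avoids the issue entirely by naming the column rowkey
INSERT INTO employee (`key`, id, name) VALUES (1, 1, 'peter');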
Create the employee table in HBase
Create the employee table and insert a few rows:

hbase(main):005:0> create 'employee','info'
0 row(s) in 0.4740 seconds

=> Hbase::Table - employee
hbase(main):006:0> put 'employee',1,'info:id',1
0 row(s) in 0.2080 seconds

hbase(main):008:0> scan 'employee'
ROW                  COLUMN+CELL
 1                   column=info:id, timestamp=1417591291730, value=1
1 row(s) in 0.0610 seconds

hbase(main):009:0> put 'employee',1,'info:name','peter'
0 row(s) in 0.0220 seconds

hbase(main):010:0> scan 'employee'
ROW                  COLUMN+CELL
 1                   column=info:id, timestamp=1417591291730, value=1
 1                   column=info:name, timestamp=1417591321072, value=peter
1 row(s) in 0.0450 seconds

hbase(main):011:0> put 'employee',2,'info:id',2
0 row(s) in 0.0370 seconds

hbase(main):012:0> put 'employee',2,'info:name','paul'
0 row(s) in 0.0180 seconds

hbase(main):013:0> scan 'employee'
ROW                  COLUMN+CELL
 1                   column=info:id, timestamp=1417591291730, value=1
 1                   column=info:name, timestamp=1417591321072, value=peter
 2                   column=info:id, timestamp=1417591500179, value=2
 2                   column=info:name, timestamp=1417591512075, value=paul
2 row(s) in 0.0440 seconds
Create the Hive external table
Hive distinguishes between native tables and external tables. A native table is stored as plain files in HDFS, while an external table is backed by another framework, such as HBase. Let's create a Hive external table that maps onto the employee HBase table we just created:
hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:id,info:name")
    > TBLPROPERTIES ("hbase.table.name" = "employee");
OK
Time taken: 0.324 seconds
hive> select * from h_employee;
OK
1	1	peter
2	2	paul
Time taken: 1.129 seconds, Fetched: 2 row(s)
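The entries in hbase.columns.mapping pair up positionally with the Hive columns: :key (the HBase rowkey) maps to key, info:id to id, and info:name to name. To double-check how Hive recorded the mapping, the standard metadata commands work here (output omitted):

hive> DESCRIBE FORMATTED h_employee;
hive> SHOW TBLPROPERTIES h_employee;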
Create the native Hive table
This native Hive table is only used as a temporary staging table for the export, hence the name h_employee_export; the fields are delimited by commas:
CREATE TABLE h_employee_export(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
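'\054' is simply the octal escape for a comma (octal 054 = decimal 44 = ','), so the statement above is equivalent to spelling the delimiter out literally:

CREATE TABLE h_employee_export(key INT, id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';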
Load data from the source Hive table into the staging table
The first step is to load the data from h_employee (the HBase-backed external table) into h_employee_export (the native Hive table):
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export;
OK
1	1	peter
2	2	paul
Time taken: 0.359 seconds, Fetched: 2 row(s)
Let's look at the text file that actually got stored. Each line is one record, with the fields separated by commas, which matches the field delimiter Sqoop assumes by default during an export:
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_0
1,1,peter
2,2,paul
Export the data from Hive to MySQL
$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/
Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.1
14/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
14/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar
14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee
14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:8032
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 1
14/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:1
14/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_0037
14/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_0037
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_0037
14/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false
14/12/05 08:49:59 INFO mapreduce.Job:  map 0% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job:  map 100% reduce 0%
14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully
14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=99761
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=166
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8805
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=8805
		Total vcore-seconds taken by all map tasks=8805
		Total megabyte-seconds taken by all map tasks=9016320
	Map-Reduce Framework
		Map input records=2
		Map output records=2
		Input split bytes=144
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=97
		CPU time spent (ms)=1360
		Physical memory (bytes) snapshot=167555072
		Virtual memory (bytes) snapshot=684212224
		Total committed heap usage (bytes)=148897792
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec)
14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.
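The export works without any delimiter flags because the staging file uses commas, which Sqoop expects by default. Had the staging table been created with a different delimiter, say Hive's default '\001' (Ctrl-A), Sqoop would need to be told explicitly via --input-fields-terminated-by. A sketch, assuming the same connection details and paths as above:

$ sqoop export \
    --connect jdbc:mysql://localhost:3306/sqoop_test \
    --username root --password root \
    --table employee \
    --m 1 \
    --export-dir /user/hive/warehouse/h_employee_export/ \
    --input-fields-terminated-by '\001'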
Note
This log contains the following line:
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/
It means you can open this address in a browser to watch the job run. If your job hangs for a long time and never finishes, something has gone wrong, and this address is where you can find the detailed error logs.
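If the web UI is hard to reach, the same logs can usually be pulled from the command line after the application finishes, assuming YARN log aggregation is enabled on the cluster:

$ yarn logs -applicationId application_1406097234796_0037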
Check the result:
mysql> select * from employee;
+--------+----+-------+
| rowkey | id | name  |
+--------+----+-------+
|      1 |  1 | peter |
|      2 |  2 | paul  |
+--------+----+-------+
2 rows in set (0.00 sec)

mysql>
The export succeeded.