Hive on Tez pitfalls, part 1: Hive 0.13 on Tez

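Our cluster is about to be upgraded to CDH 5.2.0 with Tez. CDH 5.2.0 has been running stably on the test cluster for a long time, so I started experimenting with Hive on Tez. I ran into quite a few problems along the way and am recording them here.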

Deploying Hive on Tez is fairly simple, and the wiki can be followed. A few points need attention:

1. When building:

    mvn clean package -Dtar -DskipTests=true -Dmaven.javadoc.skip=true

2. The Tez jars must be uploaded to HDFS, and tez-site.xml must point at them:

    <property>
      <name>tez.lib.uris</name>
      <value>${fs.defaultFS}/tez,${fs.defaultFS}/tez/lib</value>
    </property>

Then configure mapred-site.xml:

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn-tez</value>
    </property>
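The tez.lib.uris value above relies on Hadoop expanding ${fs.defaultFS} from the rest of the configuration and then treating the result as a comma-separated list. The following is only a minimal sketch of that idea, not Hadoop's actual implementation: the class LibUrisSketch, its method names, and the NameNode address are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of ${name} expansion; not Hadoop's real code. */
final class LibUrisSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value from props; unknown names are left as-is. */
    static String expand(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Expands the value, then splits it on commas into individual URIs. */
    static List<String> resolveLibUris(String value, Map<String, String> props) {
        List<String> uris = new ArrayList<>();
        for (String part : expand(value, props).split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                uris.add(trimmed);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Hypothetical NameNode address, for illustration only.
        props.put("fs.defaultFS", "hdfs://namenode.example.com:8020");
        String raw = "${fs.defaultFS}/tez,${fs.defaultFS}/tez/lib";
        for (String uri : resolveLibUris(raw, props)) {
            System.out.println(uri);
        }
    }
}
```

With the made-up address, the two entries resolve to the /tez and /tez/lib directories on that file system, which is where the uploaded jars are expected to live.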


3. Remember to update the classpath settings in hadoop-env.sh:

    export TEZ_HOME=/home/vipshop/platform/tez
    for jar in $(ls $TEZ_HOME | grep jar); do
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/$jar
    done
    for jar in $(ls $TEZ_HOME/lib); do
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/lib/$jar
    done

Otherwise the following error is reported (the Tez classes cannot be loaded, so Cluster initialization fails):

    java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
            at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
            at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
            at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
            at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1265)
            at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1261)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
            at org.apache.hadoop.mapreduce.Job.connect(Job.java:1260)
            at org.apache.hadoop.mapreduce.Job.submit(Job.java:1289)
            at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
            at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:261)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
            at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
            at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:118)
            at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:126)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
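Since this failure comes down to Tez classes missing from the classpath, a quick way to confirm the diagnosis is to ask the JVM whether a given class can be loaded and from which jar. The helper below is a hypothetical sketch (ClasspathProbe is not part of Hadoop or Tez); it takes class names as arguments, so it makes no assumption about which Tez classes exist.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports where a class is loaded from. */
final class ClasspathProbe {
    /** Returns the location the class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            return src == null ? "(bootstrap/JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

One possible invocation, assuming the compiled class sits in the current directory and that the output of hadoop classpath reflects the HADOOP_CLASSPATH set in hadoop-env.sh:

    java -cp "$(hadoop classpath):." ClasspathProbe org.apache.tez.dag.api.TezConfiguration

If the class is reported as NOT FOUND, the classpath loop in hadoop-env.sh is the first thing to recheck.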

After deployment, submitting a Tez job with hadoop jar ran normally. Next I tested Hive on Tez:

    hive -hiveconf hive.execution.engine=tez -hiveconf hive.root.logger=DEBUG,console

This produced the following error:

    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRHelpers.updateEnvironmentForMRAM(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V
            at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:182)
            at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
            at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
            at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
            at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Judging from the stack trace, the failure happens while the session is being initialized:

    org.apache.hadoop.hive.cli.CliDriver.main
      -> org.apache.hadoop.hive.cli.CliDriver.run
      -> org.apache.hadoop.hive.ql.session.SessionState.start
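The NoSuchMethodError message names the missing method with a JVM method descriptor, (Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V, which is harder to read than a Java signature. As an aid to reading such messages, here is a small, self-contained decoder. DescriptorDecoder is a hypothetical helper written for this note; it handles primitive, array, and object types only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a JVM method descriptor into a readable signature. */
final class DescriptorDecoder {
    /** Converts e.g. "(I[J)V" into "void(int, long[])". */
    static String decode(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1}; // cursor into the descriptor, shared with readType
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        pos[0] = close + 1; // the return type follows the closing parenthesis
        String returnType = readType(descriptor, pos);
        return returnType + "(" + String.join(", ", params) + ")";
    }

    /** Reads one type starting at pos[0] and advances the cursor past it. */
    private static String readType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'V': return "void";
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case '[': return readType(d, pos) + "[]";
            case 'L': {
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class name in " + d);
                }
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("unexpected '" + c + "' in " + d);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("(Lorg/apache/hadoop/conf/Configuration;Ljava/util/Map;)V"));
    }
}
```

For the descriptor in the error, this prints void(org.apache.hadoop.conf.Configuration, java.util.Map). In other words, Hive is looking for a void method updateEnvironmentForMRAM(Configuration, Map) on MRHelpers and the class on the classpath does not have one.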

In the SessionState.start method:

    // Taken if hive.execution.engine is set to tez (the default is mr)
    if (HiveConf.getVar(startSs.getConf(), HiveConf.ConfVars.HIVE_EXECUTION_ENGINE)
        .equals("tez") && (startSs.isHiveServerQuery == false)) {
      try {
        if (startSs.tezSessionState == null) {
          // instantiate a TezSessionState object
          startSs.tezSessionState = new TezSessionState(startSs.getSessionId());
        }
        startSs.tezSessionState.open(startSs.conf); // call TezSessionState.open
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    } else {
      LOG.info("No Tez session required at this point. hive.execution.engine=mr.");
    }

TezSessionState.open first calls createTezDir to create the temporary directory:

    // create the tez tmp dir
    tezScratchDir = createTezDir(sessionId);
    String dir = tezScratchDir.toString();
    // Localize resources to session scratch dir
    localizedResources = utils.localizeTempFilesFromConf(dir, conf); // DagUtils.localizeTempFilesFromConf
    List<LocalResource> handlerLr = utils.localizeTempFiles(dir, conf, additionalFiles); // DagUtils.localizeTempFiles
    if (handlerLr != null) {
      if (localizedResources == null) {
        localizedResources = handlerLr;
      } else {
        localizedResources.addAll(handlerLr);
      }
      additionalFilesNotFromConf = new HashSet<String>();
      for (String originalFile : additionalFiles) {
        additionalFilesNotFromConf.add(originalFile);
      }
    }
    // generate basic tez config
    TezConfiguration tezConfig = new TezConfiguration(conf); // then instantiate a TezConfiguration
    // Set Tez's staging directory (property tez.staging-dir, default /tmp/tez/staging).
    // Here it ends up by default as
    // "/tmp/hive-" + System.getProperty("user.name") + "/_tez_session_dir/" + sessionId
    tezConfig.set(TezConfiguration.TEZ_AM_STAGING_DIR, tezScratchDir.toUri().toString());
    appJarLr = createJarLocalResource(utils.getExecJarPathLocal()); // localize hive-exec.jar
    // configuration for the application master
    Map<String, LocalResource> commonLocalResources = new HashMap<String, LocalResource>();
    commonLocalResources.put(utils.getBaseName(appJarLr), appJarLr);
    if (localizedResources != null) {
      for (LocalResource lr : localizedResources) {
        commonLocalResources.put(utils.getBaseName(lr), lr);
      }
    }
    // Create environment for AM.
    Map<String, String> amEnv = new HashMap<String, String>();
    MRHelpers.updateEnvironmentForMRAM(conf, amEnv); // calls MRHelpers.updateEnvironmentForMRAM

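In Tez 0.5.0, the org.apache.tez.mapreduce.hadoop.MRHelpers class has no updateEnvironmentForMRAM method. Its place is taken by updateEnvBasedOnMRTaskEnv (sets up the environment variables for mappers and reducers) and updateEnvBasedOnMRAMEnv (sets up the environment variables for the AM):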

    public static void updateEnvBasedOnMRAMEnv(Configuration conf, Map<String, String> environment) {
      TezYARNUtils.appendToEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV),
          File.pathSeparator);
      TezYARNUtils.appendToEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV),
          File.pathSeparator);
    }

In 0.4.1-incubating, by contrast, updateEnvironmentForMRAM does exist:

    public static void updateEnvironmentForMRAM(Configuration conf, Map<String, String> environment) {
      TezYARNUtils.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV),
          File.pathSeparator);
      TezYARNUtils.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV),
          File.pathSeparator);
    }

The corresponding call sites in Hive differ in the same way.
In Hive 0.13:

    // Create environment for AM.
    Map<String, String> amEnv = new HashMap<String, String>();
    MRHelpers.updateEnvironmentForMRAM(conf, amEnv);

In Hive 0.14:

    // Create environment for AM.
    Map<String, String> amEnv = new HashMap<String, String>();
    MRHelpers.updateEnvBasedOnMRAMEnv(conf, amEnv);

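As this shows, the Tez API changed considerably between 0.4.x and 0.5.x. Tez 0.5.x is no longer compatible with Hive 0.13.x, so using Tez 0.5.x requires Hive 0.14.x.
Download the Hive 0.14 source from GitHub, build it, and test Hive on Tez again: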
https://codeload.github.com/apache/hive/zip/branch-0.14

    mvn clean package -DskipTests -Phadoop-2 -Pdist
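Before or after rebuilding, it can help to check which of the two MRHelpers methods the Tez jars on a given classpath actually provide, since that decides which Hive version can run against them. The reflection-based sketch below is a hypothetical helper (MethodProbe is not part of Hive or Tez). It only reports method names, not full signatures, and it assumes every class referenced by the inspected class's public method signatures is also on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical diagnostic helper: checks which method names a class exposes. */
final class MethodProbe {
    /** Returns the subset of candidate names that exist as public methods on the class. */
    static Set<String> presentMethods(String className, List<String> candidates) {
        Class<?> c;
        try {
            // initialize=false: inspect the class without running its static initializers.
            c = Class.forName(className, false, MethodProbe.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("class not on classpath: " + className, e);
        }
        Set<String> found = new TreeSet<>();
        // getMethods() resolves parameter and return types, so those must be loadable too.
        for (Method m : c.getMethods()) {
            if (candidates.contains(m.getName())) {
                found.add(m.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: MethodProbe <class> <method> [<method> ...]");
            return;
        }
        List<String> names = Arrays.asList(args).subList(1, args.length);
        System.out.println(args[0] + " has: " + presentMethods(args[0], names));
    }
}
```

A possible invocation, again assuming the compiled class is in the current directory and the Tez jars are reachable through hadoop classpath:

    java -cp "$(hadoop classpath):." MethodProbe org.apache.tez.mapreduce.hadoop.MRHelpers updateEnvironmentForMRAM updateEnvBasedOnMRAMEnv

Going by the sources quoted above, only updateEnvironmentForMRAM should be listed for Tez 0.4.1-incubating (the API Hive 0.13 calls), and only updateEnvBasedOnMRAMEnv for Tez 0.5.0 (the API Hive 0.14 calls).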


This article is reposted from 菜菜光's blog on 51CTO. Original link: http://blog.51cto.com/caiguangguang/1604087. To reprint it, please contact the original author.
