Hive Tuning


  • Whether a query triggers MapReduce (MR)

    <property>
        <name>hive.fetch.task.conversion</name>
        <value>more</value>
        <description>
          Expects one of [none, minimal, more].
          Some select queries can be converted to single FETCH task minimizing latency.
          Currently the query should be single sourced not having any subquery and should not have
          any aggregations or distincts (which incurs RS), lateral views and joins.
          0. none : disable hive.fetch.task.conversion
          1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
          2. more    : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
        </description>
    </property>
    

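    With this parameter set to minimal, only SELECT *, filters on partition columns, and LIMIT are served without running MR. Set it to more, as shown here, so that selecting specific columns does not trigger MR either.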
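    The same setting can also be tried per session with SET before changing hive-site.xml. The following is a minimal sketch, assuming the ext_error_logs table and appid column that appear later in this article; the LIMIT value is arbitrary. EXPLAIN is one way to check the result: a fetch-only plan should show a Fetch Operator stage and no Map Reduce stage.

    ```sql
    -- Session-level override; does not modify hive-site.xml
    SET hive.fetch.task.conversion=more;

    -- Column projection plus LIMIT: eligible for a single FETCH task under "more"
    SELECT appid FROM ext_error_logs LIMIT 10;

    -- Aggregation: not eligible for conversion, still runs as an MR job
    SELECT count(appid) FROM ext_error_logs;

    -- Inspect the plan to see whether an MR stage is present
    EXPLAIN SELECT appid FROM ext_error_logs LIMIT 10;
    ```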

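  • Parallel execution
    Hive compiles each query into multiple stages. Stages that do not depend on one another can run in parallel, which reduces total execution time.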

    select num
    from (
      select count(appid) as num from ext_error_logs
      union all
      select count(tenantid) as num from ext_event_logs
    ) t
    

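    The SQL above runs as three jobs. With the default hive.exec.parallel=false, the three jobs run one after another; in my test this took 110s. With the value set to true, the two statements inside the union all run in parallel, and the measured time dropped to 68s.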

    <property>
        <name>hive.exec.parallel</name>
        <value>false</value>
        <description>Whether to execute jobs in parallel</description>
    </property>
    <property>
        <name>hive.exec.parallel.thread.number</name>
        <value>8</value>
        <description>How many jobs at most can be executed in parallel</description>
    </property>
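
    For a single session, the same switch can be set with SET instead of editing hive-site.xml. The sketch below reuses the tables from the query above; the thread count of 8 is simply the default shown in the config, not a tuned value, and the subquery alias t is an arbitrary name.

    ```sql
    -- Enable parallel execution of independent stages for this session only
    SET hive.exec.parallel=true;
    -- Upper bound on jobs running at the same time (8 is the default shown above)
    SET hive.exec.parallel.thread.number=8;

    -- The two count() branches do not depend on each other,
    -- so Hive can submit their jobs at the same time
    SELECT num
    FROM (
      SELECT count(appid) AS num FROM ext_error_logs
      UNION ALL
      SELECT count(tenantid) AS num FROM ext_event_logs
    ) t;
    ```

    Parallel jobs compete for the same cluster resources, so the speedup depends on how much spare capacity the cluster has while the query runs.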
    
