欢迎投稿

今日深度:

hive mapjoin,

hive mapjoin,


hive 版本:1.2.1
在Hive中,common join是很慢的,如果我们是一张大表关联多张小表,可以使用mapjoin加快速度。
mapjoin主要有以下参数:

hive.auto.convert.join : 是否自动转换为mapjoin
hive.mapjoin.smalltable.filesize : 小表的最大文件大小,默认为25000000,即25M
hive.auto.convert.join.noconditionaltask : 是否将多个mapjoin合并为一个
hive.auto.convert.join.noconditionaltask.size : 多个mapjoin转换为1个时,所有小表的文件大小总和的最大值。

必须添加hive.auto.convert.join=true,否则mapjoin不生效。
mapjoin中加载到内存的表最好物理上就是一张表,嵌套union all 生成的子查询有可能不走mapjoin。

样例1:小表为union all 嵌套子查询,执行计划走mapjoin.

hive> 
    > set mapreduce.map.memory.mb=4096;
hive> 
    > set hive.auto.convert.join=true;
hive> explain
    > select  
    >         ome.oid,ome.path,ome.server_time,ome.cookie,ome.product_name,ome.client_time,ome.pageid,ome.label,
    >         ome.url,ome.referrer,ome.url_path,ome.referrer_path,ome.js_version,ome.user_name,ome.page_title,ome.tel,
    >         ome.app_name,ome.app_version,ome.os_type,ome.os_version,ome.brand,ome.model,ot.app_name,clientid,ome.cip,ome.ua,ome.country,ome.province,ome.city  
    >   from
    >         (select oid,eventid,server_time,cookie,product_name,client_time,pageid,label,
    >         url,referrer,url_path,parse_url(url_path,'PATH') path,parse_url(url_path,'HOST') uri,referrer_path,js_version,user_name,page_title,tel,
    >         app_name,app_version,os_type,os_version,brand,model,clientid,cip,ua,country,province,city 
    >         from omega.dwd_web where year='2016' and month='07' and day='26') ome 
    >    join (select uri,max(app_name) as app_name
    >            from        
    >                 (select parse_url(redirect_uri,'HOST') uri,app_name from ods_open_t_open_app  where dt='2016-07-26'  
    >                  union all 
    >                  select parse_url(callback_uri,'HOST') uri,app_name  from ods_open_t_open_app  where dt='2016-07-26') a
    >           group by uri) ot
    >    on ot.uri=ome.uri;
OK
Explain
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-6 depends on stages: Stage-2 , consists of Stage-7, Stage-8, Stage-1
  Stage-7 has a backup stage: Stage-1
  Stage-4 depends on stages: Stage-7
  Stage-8 has a backup stage: Stage-1
  Stage-5 depends on stages: Stage-8
  Stage-1
  Stage-0 depends on stages: Stage-4, Stage-5, Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: ods_open_t_open_app
            Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: parse_url(redirect_uri, 'HOST') (type: string), app_name (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col0 is not null (type: boolean)
                Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
                  Group By Operator
                    aggregations: max(_col1)
                    keys: _col0 (type: string)
                    mode: hash
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col1 (type: string)
          TableScan
            alias: ods_open_t_open_app
            Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: parse_url(callback_uri, 'HOST') (type: string), app_name (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col0 is not null (type: boolean)
                Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
                  Group By Operator
                    aggregations: max(_col1)
                    keys: _col0 (type: string)
                    mode: hash
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col1 (type: string)
      Reduce Operator Tree:
        Group By Operator
          aggregations: max(VALUE._col0)
          keys: KEY._col0 (type: string)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: true
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-6
    Conditional Operator

  Stage: Stage-7
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $INTNAME 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $INTNAME 
          TableScan
            HashTable Sink Operator
              keys:
                0 _col12 (type: string)
                1 _col0 (type: string)

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwd_web
            Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
              outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
              Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col12 is not null (type: boolean)
                Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col12 (type: string)
                    1 _col0 (type: string)
                  outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
                  Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
                    Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-8
    Map Reduce Local Work
      Alias -> Map Local Tables:
        ome:dwd_web 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        ome:dwd_web 
          TableScan
            alias: dwd_web
            Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
              outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
              Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col12 is not null (type: boolean)
                Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col12 (type: string)
                    1 _col0 (type: string)

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              keys:
                0 _col12 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
              Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
                Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwd_web
            Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
              outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
              Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col12 is not null (type: boolean)
                Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col12 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col12 (type: string)
                  Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col12 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
          Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
            Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

样例2:没有加hive.auto.convert.join=true参数,明确mapjoin,执行不会走mapjoin

hive> use ecom;
OK
Time taken: 0.625 seconds
hive> 
    > set mapreduce.map.memory.mb=4096;
hive> explain
    > select  /*+mapjoin(ot)*/
    >         ome.oid,ome.path,ome.server_time,ome.cookie,ome.product_name,ome.client_time,ome.pageid,ome.label,
    >         ome.url,ome.referrer,ome.url_path,ome.referrer_path,ome.js_version,ome.user_name,ome.page_title,ome.tel,
    >         ome.app_name,ome.app_version,ome.os_type,ome.os_version,ome.brand,ome.model,ot.app_name,clientid,ome.cip,ome.ua,ome.country,ome.province,ome.city  
    >   from
    >         (select oid,eventid,server_time,cookie,product_name,client_time,pageid,label,
    >         url,referrer,url_path,parse_url(url_path,'PATH') path,parse_url(url_path,'HOST') uri,referrer_path,js_version,user_name,page_title,tel,
    >         app_name,app_version,os_type,os_version,brand,model,clientid,cip,ua,country,province,city 
    >         from omega.dwd_web where year='2016' and month='07' and day='26') ome 
    >    join tmp_zyh_1  as ot
    >    on ot.uri=ome.uri;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwd_web
            Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
              outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
              Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col12 is not null (type: boolean)
                Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col12 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col12 (type: string)
                  Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
          TableScan
            alias: ot
            Statistics: Num rows: 2 Data size: 88 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: uri is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: uri (type: string)
                sort order: +
                Map-reduce partition columns: uri (type: string)
                Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
                value expressions: app_name (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col12 (type: string)
            1 uri (type: string)
          outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
          Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
            Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

样例3:添加hive.auto.convert.join=true参数,不明确mapjoin,执行走mapjoin

hive> use ecom;
OK
Time taken: 0.616 seconds
hive> 
    > set mapreduce.map.memory.mb=4096;
hive> set hive.auto.convert.join=true;
hive> explain
    > 
    > select 
    >         ome.oid,ome.path,ome.server_time,ome.cookie,ome.product_name,ome.client_time,ome.pageid,ome.label,
    >         ome.url,ome.referrer,ome.url_path,ome.referrer_path,ome.js_version,ome.user_name,ome.page_title,ome.tel,
    >         ome.app_name,ome.app_version,ome.os_type,ome.os_version,ome.brand,ome.model,ot.app_name,clientid,ome.cip,ome.ua,ome.country,ome.province,ome.city  
    >   from
    >         (select oid,eventid,server_time,cookie,product_name,client_time,pageid,label,
    >         url,referrer,url_path,parse_url(url_path,'PATH') path,parse_url(url_path,'HOST') uri,referrer_path,js_version,user_name,page_title,tel,
    >         app_name,app_version,os_type,os_version,brand,model,clientid,cip,ua,country,province,city 
    >         from omega.dwd_web where year='2016' and month='07' and day='26') ome 
    >    join tmp_zyh_1  as ot
    >    on ot.uri=ome.uri;
OK
Explain
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        ot 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        ot 
          TableScan
            alias: ot
            Statistics: Num rows: 2 Data size: 88 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: uri is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col12 (type: string)
                  1 uri (type: string)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dwd_web
            Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
              outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
              Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
              Filter Operator
                predicate: _col12 is not null (type: boolean)
                Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col12 (type: string)
                    1 uri (type: string)
                  outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
                  Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
                    Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

www.htsjk.Com true http://www.htsjk.com/hive/40222.html NewsArticle hive mapjoin, hive 版本:1.2.1 在Hive中,common join是很慢的,如果我们是一张大表关联多张小表,可以使用mapjoin加快速度。 mapjoin主要有以下参数: hive.auto.convert.join : 是否自动转换为ma...
相关文章
    暂无相关文章
评论暂时关闭