hive mapjoin,
hive 版本:1.2.1
在Hive中,common join是很慢的,如果我们是一张大表关联多张小表,可以使用mapjoin加快速度。
mapjoin主要有以下参数:
hive.auto.convert.join : 是否自动转换为mapjoin
hive.mapjoin.smalltable.filesize : 小表的最大文件大小,默认为25000000,即25M
hive.auto.convert.join.noconditionaltask : 是否将多个mapjoin合并为一个
hive.auto.convert.join.noconditionaltask.size : 多个mapjoin转换为1个时,所有小表的文件大小总和的最大值。
必须添加hive.auto.convert.join=true,否则mapjoin不生效。
mapjoin中加载到内存的表最好物理上就是一张表,嵌套union all 生成的子查询有可能不走mapjoin。
样例1:小表为union all 嵌套子查询,执行计划走mapjoin.
hive>
> set mapreduce.map.memory.mb=4096;
hive>
> set hive.auto.convert.join=true;
hive> explain
> select
> ome.oid,ome.path,ome.server_time,ome.cookie,ome.product_name,ome.client_time,ome.pageid,ome.label,
> ome.url,ome.referrer,ome.url_path,ome.referrer_path,ome.js_version,ome.user_name,ome.page_title,ome.tel,
> ome.app_name,ome.app_version,ome.os_type,ome.os_version,ome.brand,ome.model,ot.app_name,clientid,ome.cip,ome.ua,ome.country,ome.province,ome.city
> from
> (select oid,eventid,server_time,cookie,product_name,client_time,pageid,label,
> url,referrer,url_path,parse_url(url_path,'PATH') path,parse_url(url_path,'HOST') uri,referrer_path,js_version,user_name,page_title,tel,
> app_name,app_version,os_type,os_version,brand,model,clientid,cip,ua,country,province,city
> from omega.dwd_web where year='2016' and month='07' and day='26') ome
> join (select uri,max(app_name) as app_name
> from
> (select parse_url(redirect_uri,'HOST') uri,app_name from ods_open_t_open_app where dt='2016-07-26'
> union all
> select parse_url(callback_uri,'HOST') uri,app_name from ods_open_t_open_app where dt='2016-07-26') a
> group by uri) ot
> on ot.uri=ome.uri;
OK
Explain
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-6 depends on stages: Stage-2 , consists of Stage-7, Stage-8, Stage-1
Stage-7 has a backup stage: Stage-1
Stage-4 depends on stages: Stage-7
Stage-8 has a backup stage: Stage-1
Stage-5 depends on stages: Stage-8
Stage-1
Stage-0 depends on stages: Stage-4, Stage-5, Stage-1
STAGE PLANS:
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
alias: ods_open_t_open_app
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: parse_url(redirect_uri, 'HOST') (type: string), app_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: max(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string)
TableScan
alias: ods_open_t_open_app
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: parse_url(callback_uri, 'HOST') (type: string), app_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col0 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: max(_col1)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2 Data size: 478 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string)
Reduce Operator Tree:
Group By Operator
aggregations: max(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: true
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-6
Conditional Operator
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
$INTNAME
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
$INTNAME
TableScan
HashTable Sink Operator
keys:
0 _col12 (type: string)
1 _col0 (type: string)
Stage: Stage-4
Map Reduce
Map Operator Tree:
TableScan
alias: dwd_web
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col12 is not null (type: boolean)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col12 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-8
Map Reduce Local Work
Alias -> Map Local Tables:
ome:dwd_web
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
ome:dwd_web
TableScan
alias: dwd_web
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col12 is not null (type: boolean)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col12 (type: string)
1 _col0 (type: string)
Stage: Stage-5
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col12 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: dwd_web
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col12 is not null (type: boolean)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col12 (type: string)
sort order: +
Map-reduce partition columns: _col12 (type: string)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
TableScan
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 239 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col12 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
样例2:没有加hive.auto.convert.join=true参数,明确mapjoin,执行不会走mapjoin
hive> use ecom;
OK
Time taken: 0.625 seconds
hive>
> set mapreduce.map.memory.mb=4096;
hive> explain
> select /*+mapjoin(ot)*/
> ome.oid,ome.path,ome.server_time,ome.cookie,ome.product_name,ome.client_time,ome.pageid,ome.label,
> ome.url,ome.referrer,ome.url_path,ome.referrer_path,ome.js_version,ome.user_name,ome.page_title,ome.tel,
> ome.app_name,ome.app_version,ome.os_type,ome.os_version,ome.brand,ome.model,ot.app_name,clientid,ome.cip,ome.ua,ome.country,ome.province,ome.city
> from
> (select oid,eventid,server_time,cookie,product_name,client_time,pageid,label,
> url,referrer,url_path,parse_url(url_path,'PATH') path,parse_url(url_path,'HOST') uri,referrer_path,js_version,user_name,page_title,tel,
> app_name,app_version,os_type,os_version,brand,model,clientid,cip,ua,country,province,city
> from omega.dwd_web where year='2016' and month='07' and day='26') ome
> join tmp_zyh_1 as ot
> on ot.uri=ome.uri;
OK
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: dwd_web
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col12 is not null (type: boolean)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col12 (type: string)
sort order: +
Map-reduce partition columns: _col12 (type: string)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
TableScan
alias: ot
Statistics: Num rows: 2 Data size: 88 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: uri is not null (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: uri (type: string)
sort order: +
Map-reduce partition columns: uri (type: string)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
value expressions: app_name (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col12 (type: string)
1 uri (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
样例3:添加hive.auto.convert.join=true参数,不明确mapjoin,执行走mapjoin
hive> use ecom;
OK
Time taken: 0.616 seconds
hive>
> set mapreduce.map.memory.mb=4096;
hive> set hive.auto.convert.join=true;
hive> explain
>
> select
> ome.oid,ome.path,ome.server_time,ome.cookie,ome.product_name,ome.client_time,ome.pageid,ome.label,
> ome.url,ome.referrer,ome.url_path,ome.referrer_path,ome.js_version,ome.user_name,ome.page_title,ome.tel,
> ome.app_name,ome.app_version,ome.os_type,ome.os_version,ome.brand,ome.model,ot.app_name,clientid,ome.cip,ome.ua,ome.country,ome.province,ome.city
> from
> (select oid,eventid,server_time,cookie,product_name,client_time,pageid,label,
> url,referrer,url_path,parse_url(url_path,'PATH') path,parse_url(url_path,'HOST') uri,referrer_path,js_version,user_name,page_title,tel,
> app_name,app_version,os_type,os_version,brand,model,clientid,cip,ua,country,province,city
> from omega.dwd_web where year='2016' and month='07' and day='26') ome
> join tmp_zyh_1 as ot
> on ot.uri=ome.uri;
OK
Explain
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
ot
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
ot
TableScan
alias: ot
Statistics: Num rows: 2 Data size: 88 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: uri is not null (type: boolean)
Statistics: Num rows: 1 Data size: 44 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col12 (type: string)
1 uri (type: string)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: dwd_web
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: oid (type: string), server_time (type: bigint), cookie (type: int), product_name (type: string), client_time (type: bigint), pageid (type: string), label (type: string), url (type: string), referrer (type: string), url_path (type: string), parse_url(url_path, 'PATH') (type: string), parse_url(url_path, 'HOST') (type: string), referrer_path (type: string), js_version (type: string), user_name (type: string), page_title (type: string), tel (type: string), app_name (type: string), app_version (type: string), os_type (type: string), os_version (type: string), brand (type: string), model (type: string), clientid (type: bigint), cip (type: string), ua (type: string), country (type: string), province (type: string), city (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 121543412 Data size: 282953087847 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col12 is not null (type: boolean)
Statistics: Num rows: 60771706 Data size: 141476543923 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col12 (type: string)
1 uri (type: string)
outputColumnNames: _col0, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col31
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col11 (type: string), _col2 (type: bigint), _col3 (type: int), _col4 (type: string), _col5 (type: bigint), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col31 (type: string), _col24 (type: bigint), _col25 (type: string), _col26 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 66848878 Data size: 155624201688 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
本站文章为和通数据库网友分享或者投稿,欢迎任何形式的转载,但请务必注明出处.
同时文章内容如有侵犯了您的权益,请联系QQ:970679559,我们会在尽快处理。