Hive join,
a
1 a
2 b
3 c
4 d
7 y
8 u
b
2 bb
3 cc
7 yy
9 pp
左链接:返回左表全部,如果右表有相对应的则返回,否则返回null
select * from a left join b on a.id=b.id;
1 a NULL NULL
2 b 2 bb
3 c 3 cc
4 d NULL NULL
7 y 7 yy
8 u NULL NULL
右链接 : 返回右表全部,如果左表有相应的值则返回,否则返回null
select * from a right join b on a.id=b.id;
内链接:返回两表中都有的值
select * from a inner join b on a.id=b.id;
Total MapReduce CPU Time Spent: 3 seconds 580 msec
OK
2 b 2 bb
3 c 3 cc
7 y 7 yy
select * from a full outer join b on a.id=b.id;
会把所有的字段列出来有对应的值则显示,没有的则为null
1 a NULL NULL
2 b 2 bb
3 c 3 cc
4 d NULL NULL
7 y 7 yy
8 u NULL NULL
NULL NULL 9 pp
hive不支持exist in 所以采用left semi join
select * from a left semi join b on a.id=b.id;
2 b
3 c
7 y
select a.key a.value from a where a.key exist in (select b.key from b);
select a.key,a.value from a left semi join b on (a.key=b.key);
可以用inner join来做,但返回值会很多效率慢,
select * from a left semi join b on a.id=b.id;
left semi join 可以这样说返回inner join(交集)中的只属于a的东西