欢迎投稿

今日深度:

Oracle 学习之 数据仓库(二) Dimension 的理解

Oracle 学习之 数据仓库(二) Dimension 的理解


在数据仓库中,有事实表、维度表两个概念。
 
事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。
 
维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。
 
如图示
 
 
我们以sh用户下的sales表和times表来看,
 
SALES为事实表
 
SQL> desc sales
 Name                     Null?    Type
 ----------------------------------------- -------- ----------------------------
 PROD_ID                  NOT NULL NUMBER
 CUST_ID                  NOT NULL NUMBER
 TIME_ID                  NOT NULL DATE
 CHANNEL_ID                   NOT NULL NUMBER
 PROMO_ID                 NOT NULL NUMBER
 QUANTITY_SOLD                NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                  NOT NULL NUMBER(10,2)

 

TIMES为维度表
 
SQL> desc times
 Name                     Null?    Type
 ----------------------------------------- -------- ----------------------------
 TIME_ID                  NOT NULL DATE
 DAY_NAME                 NOT NULL VARCHAR2(9)
 DAY_NUMBER_IN_WEEK               NOT NULL NUMBER(1)
 DAY_NUMBER_IN_MONTH              NOT NULL NUMBER(2)
 CALENDAR_WEEK_NUMBER             NOT NULL NUMBER(2)
 FISCAL_WEEK_NUMBER               NOT NULL NUMBER(2)
 WEEK_ENDING_DAY              NOT NULL DATE
 WEEK_ENDING_DAY_ID               NOT NULL NUMBER
 CALENDAR_MONTH_NUMBER            NOT NULL NUMBER(2)
 FISCAL_MONTH_NUMBER              NOT NULL NUMBER(2)
 CALENDAR_MONTH_DESC              NOT NULL VARCHAR2(8)
 CALENDAR_MONTH_ID            NOT NULL NUMBER
 FISCAL_MONTH_DESC            NOT NULL VARCHAR2(8)
 FISCAL_MONTH_ID              NOT NULL NUMBER
 DAYS_IN_CAL_MONTH            NOT NULL NUMBER
 DAYS_IN_FIS_MONTH            NOT NULL NUMBER
 END_OF_CAL_MONTH             NOT NULL DATE
 END_OF_FIS_MONTH             NOT NULL DATE
 CALENDAR_MONTH_NAME              NOT NULL VARCHAR2(9)
 FISCAL_MONTH_NAME            NOT NULL VARCHAR2(9)
 CALENDAR_QUARTER_DESC            NOT NULL CHAR(7)
 CALENDAR_QUARTER_ID              NOT NULL NUMBER
 FISCAL_QUARTER_DESC              NOT NULL CHAR(7)
 FISCAL_QUARTER_ID            NOT NULL NUMBER
 DAYS_IN_CAL_QUARTER              NOT NULL NUMBER
 DAYS_IN_FIS_QUARTER              NOT NULL NUMBER
 END_OF_CAL_QUARTER               NOT NULL DATE
 END_OF_FIS_QUARTER               NOT NULL DATE
 CALENDAR_QUARTER_NUMBER          NOT NULL NUMBER(1)
 FISCAL_QUARTER_NUMBER            NOT NULL NUMBER(1)
 CALENDAR_YEAR                NOT NULL NUMBER(4)
 CALENDAR_YEAR_ID             NOT NULL NUMBER
 FISCAL_YEAR                  NOT NULL NUMBER(4)
 FISCAL_YEAR_ID              NOT NULL NUMBER
 DAYS_IN_CAL_YEAR             NOT NULL NUMBER
 DAYS_IN_FIS_YEAR             NOT NULL NUMBER
 END_OF_CAL_YEAR              NOT NULL DATE
 END_OF_FIS_YEAR              NOT NULL DATE

 

如果我们创建一个物化视图
 
create materialized view sales_month_sum 
enable query rewrite as 
  SELECT t.calendar_month_id,
         prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id
GROUP BY prod_id,
         channel_id,
         promo_id,
         t.calendar_month_id;

 

 
如果我们做如下按月的分组查询
 
SQL> alter session set query_rewrite_enabled=true;
SQL> alter session set query_rewrite_integrity=trusted;
SQL> set autotrace traceonly
SQL> set line 200
SQL>   SELECT t.calendar_month_id,
         prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id
GROUP BY prod_id,
         channel_id,
         promo_id,
         t.calendar_month_id; 

9068 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3287305789

------------------------------------------------------------------------------------------------
| Id  | Operation            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            |  9068 |   690K|    13  (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13  (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

 

 
可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?
 
  SELECT t.calendar_quarter_id,prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id
GROUP BY prod_id,
         channel_id,
         promo_id,
         t.calendar_quarter_id;

 

这个查看肯定是不能使用物化视图的,执行计划如下
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3221963832

---------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows | Bytes | Cost (%CPU)| Time    | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |  |  2037 | 79443 |   569   (6)| 00:00:07 |  |   |
|   1 |  HASH GROUP BY            |  |  2037 | 79443 |   569   (6)| 00:00:07 |  |   |
|*  2 |   HASH JOIN            |  |   918K|    34M|   546   (2)| 00:00:07 |    |   |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 |  1826 | 21912 |    18   (0)| 00:00:01 |    |   |
|   4 |     TABLE ACCESS FULL          | TIMES    |  1826 | 21912 |    18   (0)| 00:00:01 | |   |
|   5 |    PARTITION RANGE JOIN-FILTER|  |   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000|
|   6 |     TABLE ACCESS FULL          | SALES    |   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

 

Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。
CREATE DIMENSION SH.TIMES_DIM
  LEVEL DAY                            IS 
    (SH.TIMES.TIME_ID)
  LEVEL MONTH                          IS 
    (SH.TIMES.CALENDAR_MONTH_ID)
  LEVEL QUARTER                        IS 
    (SH.TIMES.CALENDAR_QUARTER_ID)
  LEVEL YEAR                           IS 
    (SH.TIMES.CALENDAR_YEAR_ID)
  HIERARCHY CAL_ROLLUP
    (DAY                               CHILD OF
     MONTH                             CHILD OF
     QUARTER                           CHILD OF
     YEAR);

 

 
LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。
 
我们再次查询
 
SQL> SELECT t.calendar_quarter_id,prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id
GROUP BY prod_id,
         channel_id, 
         promo_id,
         t.calendar_quarter_id;

3375 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3397140165

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes | Cost (%CPU)| Time    |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    20 |  1720 |    36  (14)| 00:00:01 |
|   1 |  HASH GROUP BY             |      |    20 |  1720 |    36  (14)| 00:00:01 |
|*  2 |   HASH JOIN             |      |   128K|    10M|    33   (7)| 00:00:01 |
|   3 |    VIEW               |      |   849 |  6792 |    19   (6)| 00:00:01 |
|   4 |     HASH UNIQUE         |      |   849 |  6792 |    19   (6)| 00:00:01 |
|   5 |      TABLE ACCESS FULL         | TIMES       |  1826 | 14608 |    18   (0)| 00:00:01 |
|   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

 

 
这次是使用物化视图与times表做关联,性能更高了。
 
我们对比如下两个查询
SQL>   SELECT t.calendar_quarter_id,
         prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769
GROUP BY prod_id,
         channel_id,
         promo_id,
         t.calendar_quarter_id; 

168 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3397140165

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes | Cost (%CPU)| Time    |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     1 |    86 |    33   (7)| 00:00:01 |
|   1 |  HASH GROUP BY             |      |     1 |    86 |    33   (7)| 00:00:01 |
|*  2 |   HASH JOIN             |      |  6423 |   539K|    32   (4)| 00:00:01 |
|   3 |    VIEW               |      |    34 |   272 |    19   (6)| 00:00:01 |
|   4 |     HASH UNIQUE         |      |    34 |   272 |    19   (6)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL          | TIMES       |    90 |   720 |    18   (0)| 00:00:01 |
|   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

 

 
使用了物化视图
SQL>SELECT t.calendar_quarter_id,
         prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'
GROUP BY prod_id,
         channel_id,
         promo_id,
         t.calendar_quarter_id;

168 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3221963832

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time    | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |  |  8146 |   373K| |   848   (2)| 00:00:11 |   |   |
|   1 |  HASH GROUP BY            |  |  8146 |   373K|  3632K|   848   (2)| 00:00:11 |   |   |
|*  2 |   HASH JOIN            |  | 57459 |  2637K|   |   546   (2)| 00:00:07 |   |   |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 |    91 |  1820 |  |    18   (0)| 00:00:01 |  |   |
|*  4 |     TABLE ACCESS FULL       | TIMES    |    91 |  1820 |   |    18   (0)| 00:00:01 |  |   |
|   5 |    PARTITION RANGE JOIN-FILTER|  |   918K|    23M|    |   525   (2)| 00:00:07 |:BF0000|:BF0000|
|   6 |     TABLE ACCESS FULL          | SALES    |   918K|    23M|    |   525   (2)| 00:00:07 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------------

 

没有使用物化视图。
 
其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。
 
但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。
 
我们在创建Dimension时,可以为LEVEL指定属性值。
 
如下
 
CREATE DIMENSION SH.TIMES_DIM
  LEVEL DAY                            IS 
    (SH.TIMES.TIME_ID)
  LEVEL MONTH                          IS 
    (SH.TIMES.CALENDAR_MONTH_ID)
  LEVEL QUARTER                        IS 
    (SH.TIMES.CALENDAR_QUARTER_ID)
  LEVEL YEAR                           IS 
    (SH.TIMES.CALENDAR_YEAR_ID)
  HIERARCHY CAL_ROLLUP
    (DAY                               CHILD OF
     MONTH                             CHILD OF
     QUARTER                           CHILD OF
     YEAR)
  ATTRIBUTE QUARTER DETERMINES 
    (SH.TIMES.CALENDAR_QUARTER_DESC,
     SH.TIMES.DAYS_IN_CAL_QUARTER,
     SH.TIMES.END_OF_CAL_QUARTER,
     SH.TIMES.CALENDAR_QUARTER_NUMBER)
  ATTRIBUTE YEAR DETERMINES 
    (SH.TIMES.CALENDAR_YEAR,
     SH.TIMES.DAYS_IN_CAL_YEAR,
     SH.TIMES.END_OF_CAL_YEAR);

 

我们再次查询
 
SQL> SELECT t.calendar_quarter_id,
         prod_id,
         channel_id,
         promo_id,
         SUM (quantity_sold) quantity_sold,
         SUM (amount_sold) amount_sold
    FROM sales s, times t
   WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'
GROUP BY prod_id,
         channel_id,
         promo_id,
         t.calendar_quarter_id;  

168 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3290467316

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes | Cost (%CPU)| Time    |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    20 |  2240 |    33   (7)| 00:00:01 |
|   1 |  HASH GROUP BY             |      |    20 |  2240 |    33   (7)| 00:00:01 |
|*  2 |   HASH JOIN             |      | 17191 |  1880K|    32   (4)| 00:00:01 |
|   3 |    VIEW               | VW_GBF_5    |    91 |  3094 |    19   (6)| 00:00:01 |
|   4 |     HASH GROUP BY           |      |    91 |   728 |    19   (6)| 00:00:01 |
|   5 |      VIEW            |      |    91 |   728 |    19   (6)| 00:00:01 |
|   6 |       HASH UNIQUE          |      |    91 |  1456 |    19   (6)| 00:00:01 |
|*  7 |        TABLE ACCESS FULL       | TIMES      |    91 |  1456 |    18   (0)| 00:00:01 |
|   8 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

 

这次就使用了物化视图。

www.htsjk.Com true http://www.htsjk.com/oracle/23746.html NewsArticle Oracle 学习之 数据仓库(二) Dimension 的理解 在数据仓库中,有事实表、维度表两个概念。 事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表...
评论暂时关闭