Monday, April 29, 2013

Enhanced Aggregation, Cube, Grouping and Rollup.


(OLAP reporting embedded in SQL)


Much of the OLAP reporting feature embedded in Oracle SQL is ignored. People turn to expensive OLAP reporting tools in the market - even for simple reporting needs. This article outlines some of the common OLAP reporting needs and shows how to meet them by using the enhanced aggregation features of Oracle SQL.

The article is divided in two sections. The first introduces the GROUP BY extensions of SQL, and the second uses them to generate some typical reports. A section at the end introduces the common OLAP terminologies.
The enhanced SQL aggregation features are available across all flavors of Oracle including Oracle Standard Edition One. It might be worth mentioning here, that Oracle OLAP, the special OLAP package of Oracle, is not available with Oracle Standard Edition and Standard Edition One. Enhanced aggregation features discussed here have been tested on Oracle 9i and Oracle 10g.

Advanced Aggregation Extensions of GROUP BY

GROUPING SETS clause, GROUPING function and GROUPING_ID function
The fundamental concept of enhanced aggregation features of Oracle is that of GROUPING SETS. All other aggregation features can be expressed in terms of it. With GROUPING SETS clause comes the functions GROUPING, GROUPING_ID and GROUP_ID.
The GROUPING SETS clause in GROUP BY allows us to specify more than one GROUP BY options in the same record set. All GROUPING clause query can be logically expressed in terms of several GROUP BY queries connected by UNION. Table-1 shows several such equivalent statements. This is helpful in forming the idea of the GROUPING SETS clause. A blank set ( ) in the GROUPING SETS clause calculates the overall aggregate.
Table 1 - GROUPING SET queries and the equivalent GROUP BY queries
Set A - Aggregate Query with GROUPING SETS
Set B - Equivalent Aggregate Query with GROUP BY

A1. SELECT a, b, SUM(c) FROM tab1 GROUP BY GROUPING SETS ( (a,b) )

B1. SELECT a, b, SUM(c) FROM tab1 GROUP BY a, b

A2. SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a,b), a)

B2. SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b UNION
    SELECT a, null, SUM( c ) FROM tab1 GROUP BY a

A3. SELECT a,b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS (a,b)

B3. SELECT a, null, SUM( c ) FROM tab1 GROUP BY a
    UNION
    SELECT null, b, SUM( c ) FROM tab1 GROUP BY b

A4. SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), a, b, ( ) )

B4. SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b UNION
    SELECT a, null, SUM( c ) FROM tab1 GROUP BY a, null UNION 
    SELECT null, b, SUM( c ) FROM tab1 GROUP BY null, b UNION
    SELECT null, null, SUM( c ) FROM tab1

Example (Table-1 Set 4) is like a superset of all the above cases and also includes an overall aggregate by the use of ( ). We will see latter that this result is similar to that of CUBE (a, b). The first 3 columns of Table-2 show the result of a query of this type.
GROUPING clause uses a single scan to compute all the required aggregates. So the performance is better than its logical equivalent of several GROUP BY and UNION.
The general syntax of a SQL with GROUPING SETS is -
SELECT <grouping_columns>, <aggregate_functions>
FROM <table_list>
WHERE <where_condition>
GROUP BY GROUPING SETS (<column_set_1>, ... , <column_set_N>

The "column sets" can have none, one or more "grouping column" from SELECT. However, all columns from the select should be present in at least one of the column sets. In mathematical terms -
UNION UNION should be
equal to

So the following two queries below will return error -
(1) SELECT a, b, c, SUM(d ) FROM tab1 
    GROUP BY GROUPING SETS ( (a,b), b)
--- Reason (a,b) U ( b ) is not equal to (a,b,c)
(2) SELECT a, b, SUM( c ) FROM tab1 
    GROUP BY GROUPING SETS (a, ( ) )
--- Reason (a) U ( ) is not equal to ( a, b )
Table 2 - A GROUPING SET query with GROUPING and GROUPING_ID Function on EMP
SELECT deptno, job, SUM(sal),
GROUPING(deptno) GDNO, GROUPING (job) GJNO,
GROUPING_ID(deptno, job) GID_DJ, GROUPING_ID(job, deptno) GID_JD
FROM EMP
GROUP BY GROUPING SETS ( (deptno, job), deptno, job, ( ))

    DEPTNO JOB         SUM(SAL)       GDNO       GJNO     GID_DJ     GID_JD
---------- --------- ---------- ---------- ---------- ---------- ----------

        10 CLERK           1300          0          0          0          0
        10 MANAGER         2450          0          0          0          0
        10 PRESIDENT       5000          0          0          0          0
        20 CLERK           1900          0          0          0          0
        20 ANALYST         6000          0          0          0          0
        20 MANAGER         2975          0          0          0          0
        30 CLERK            950          0          0          0          0
        30 MANAGER         2850          0          0          0          0
        30 SALESMAN        5600          0          0          0          0
        10                 8750          0          1          1          2
        20                10875          0          1          1          2
        30                 9400          0          1          1          2
           ANALYST         6000          1          0          2          1
           CLERK           4150          1          0          2          1
           MANAGER         8275          1          0          2          1
           PRESIDENT       5000          1          0          2          1
           SALESMAN        5600          1          0          2          1
                          29025          1          1          3          3

18 rows selected.
GROUPING Function and GROUPING_ID Function
From Table-2 we see that when aggregates are displayed for a column its value is null. This may conflict in case the column itself has some null values. There needs to be some way to identify NULL in column, which means aggregate and NULL in column, which means value. GROUPING function is the solution to that.
This function returns a flag "1" for a row in the result set if that column has been aggregated in that row. Otherwise the value is "0". There can be only one column expression as the argument of the GROUPING function and that column should also be in the SELECT. GROUPING function can be used to substitute the NULL value, which usually appears in columns at the aggregation level by something meaningful like Total.
GROUPING function has the general syntax of GROUPING ( ). It is used only in SELECT clause. It takes only a single column expression as argument.
GROUPING_ID takes a set of columns. It applies the GROUPING function on each column in its argument and composes a bit vector with the "0" and "1" values. It returns the decimal equivalent of the bit vector. The columns GID_DJ and GID_JD show the use of GROUPING_ID function and also show how interchanging the order of the columns inside the GROUPING_ID function might impact the result.
CUBE
This is the most generalized aggregation clause. The general syntax is CUBE ( ). It is used with the GROUP BY only. CUBE creates a subtotal of all possible combinations of the set of column in its argument. Once we compute a CUBE on a set of dimension, we can get answer to all possible aggregation questions on those dimensions. Table-3 shows a cube building.
It might be also worth mentioning here that
GROUP BY CUBE( a, b, c) is equivalent to
GROUP BY GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( )).
ROLLUP
ROLLUP clause is used with GROUP BY to compute the aggregate at the hierarchy levels of a dimension.
ROLLUP(a, b, c) assumes that the hierarchy is "a" drilling down to "b" drilling down to "c".
ROLLUP (a, b, c) is equivalent to GROUPING SETS ( (a, b, c), (a, b), (a), ( )).
The general syntax of ROLLUP is ROLLUP( )
Composite Columns
A composite column is a collection of columns that can be used in CUBE or ROLLUP. They are treated as unit before computing the aggregate.Composite columns usage in CUBE and ROLLUP and the equivalent GROUPING SETS -
. CUBE( (a, b), c) is equivalent to GROUPING SETS ( (a, b, c), (a, b) , c, ( ))
. ROLLUP ( a, (b, c) ) is equivalent to GROUPING SETS ( (a, b, c), ( a ), ( ) )
Partial GROUPING SETS, CUBE or ROLLUP
If any column appears in GROUP BY but outside the aggregation clauses discussed above. It can be thought of as being first column of the resulting GROUPING SET equivalent. The following examples make this clear.
 GROUP BY a, CUBE( b, c) is equivalent to 
  GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, c), (a) )
 GROUP BY a, ROLLUP( b, c) is equivalent to 
  GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a) )
OLAP Reporting using enhanced aggregation features
While the queries on the EMP table are used to illustrate the GROUPING SETS they will be poor examples for discussing the next sections. The reason is that the tables are not in a Star-Schema format. Please run the script (Script A) to get a simple Star Schema.
The tables of the Sample Schema are
Product(prdid, prd_name, prd_family)
TimeByDay(datekey, td_month, td_quarter, td_year)
Location( Loc_id, City, State, Country)
Customer(cust_id, cust_name, cust_type);
Sales(sales_id, cust_id, loc_id, prdid, sales_date, amount);
The schema is about a fictitious Art Trader that supplies remakes of statues of famous historical figures (like ALEXANDER, BUDDHA, etc) or landscape paintings of places (like SIKKIM, etc). They sell to museums, resellers or individuals.
The dimensions are Product, TimeByDay, Location and Customer. The fact is Sales.
The hierarchies are -
(1) Product_Name (prd_name) -> Product Family (prd_family)
(2) Date (datekey) -> Month (td_month) -> Quarter (td_quarter) -> Year (td_year)
(3) City -> State -> Country
(4) Customer_Name (cust_name) -> Customer_Type (cust_type)
The two approaches used for generating OLAP reports are as follows -
(1) Get the most generalized possible CUBE built with the dimensions, or
(2) Use on the fly aggregation queries to get the real-time report.
Using generalized pre-built CUBE for CUBE, ROLLUP, Drill Down and Slicing Queries.
This approach consists of building a table or a materialized view with the CUBE of the dimensions. Table-3 shows the SQL to build such a cube. The generalized CUBE keeps all possible meaningful aggregation pre-computed. We need to query some of the rows of the CUBE to get the desired values. Since the CUBE stores all possible permutations of the dimensions there is a chance that the number or records in the cube itself might be large. Intelligent use of composite columns might help a great deal here. Note the use of composite columns (City, State) in the CUBE. This is because each state has got only one city with the office of our demo organization.
The GROUPING_ID function helps to achieve the ROLLUP. For example, take the combination (cust_name, cust_type). It is meaningless to make the cube perform aggregations for customer types across customer names. So we include only the bit vectors (1,1), (1, 0) and (0,0) that is GROUPING_ID of 3, 2 and 0 on the customer dimension.
Table 3 - Building a cube
CREATE TABLE sales_cube AS
SELECT prd_name, prd_family,
       datekey, td_month, td_quarter, td_year,
       cust_name, cust_type,
       city, state, country,
       GROUPING_ID (prd_name, prd_family) GID_product,
       GROUPING_ID (datekey, td_month, td_quarter, td_year) GID_DATE,
       GROUPING_ID (cust_name, cust_type) GID_CUST,
       GROUPING_ID (city, state, country) GID_LOC,
       sum(amount) amount
FROM sales, product, timebyday, location, customer
WHERE sales.cust_id = customer.cust_id and
      sales.loc_id = location.loc_id and
      sales.sales_date = timebyday.datekey and
      sales.prdid = product.prdid
GROUP BY
CUBE ( prd_name, prd_family,
       datekey, td_month, td_quarter, td_year,
       cust_name, cust_type,
       (city, state), country )
HAVING
( GROUPING_ID (prd_name, prd_family) = 0 or
  GROUPING_ID (prd_name, prd_family) = 2 or
  GROUPING_ID (prd_name, prd_family) = 3) and
( GROUPING_ID (datekey, td_month, td_quarter, td_year) = 0 or
  GROUPING_ID (datekey, td_month, td_quarter, td_year) = 8 or
  GROUPING_ID (datekey, td_month, td_quarter, td_year) = 12 or
  GROUPING_ID (datekey, td_month, td_quarter, td_year) = 14 or
  GROUPING_ID (datekey, td_month, td_quarter, td_year) = 15) and
( GROUPING_ID (cust_name, cust_type) = 0 or
  GROUPING_ID (cust_name, cust_type) = 2 or
  GROUPING_ID (cust_name, cust_type) = 3) and
( GROUPING_ID (city, state, country) = 0 or
  GROUPING_ID (city, state, country) = 6 or
  GROUPING_ID (city, state, country) = 7 );
The next table (Table-4) shows a typical crosstab query of sales for Product and Location. It shows the query and also how to generate a crosstab report out of it by using the function CROSSTAB (Script-B). The next examples show the query and cross-tab report and skips the PLSQL portion. The WHERE condition is determined by the bit vectors. We need -
  • Details of product and details of customer - Both Product and Customer dimensions are all details. So GID_Product = bit vector (0,0) = 0. Same for GID_Cust.
  • Summary of product and details of customer - Product is summarized fully, so GID_Product = bit vector (1,1) = 3.
  • Details of product and summary of customer - Customer is summarized fully, GID_Customer = bit vector (1,1) = 3.
  • Summary of product and summary of customer - Both Customer and products are summarized.
  • Along with any of the above 4 conditions we need full summary or the rest of the dimensions. So GID_date = bit vector (1,1, 1,1) = 15 and GID_Loc = bit vector (1,1,1) = 7.
Table-4 Crosstab Query on Product and Location (Query, Generation Routine and Result)
/*********** The Query ***********/
SELECT prd_name, cust_name, amount
FROM sales_cube
WHERE ((GID_Product = 0 and GID_Cust = 0) or 
       (GID_Product = 0 and GID_Cust = 3) or
       (GID_Product = 3 and GID_Cust = 0) or
       (GID_Product = 3 and GID_Cust = 3)) and 
      GID_date = 15 and 
      GID_LOC = 7;

/*********** Generating the crosstab report ********/
set serveroutput on
set lines 120
var tempstr varchar2(500)

exec :tempstr := ''||-
'SELECT cust_name, prd_name, amount'||chr(10)||-
'FROM sales_cube'||chr(10)||-
'WHERE ((GID_Product = 0 and GID_Cust = 0) or'||chr(10)||- 
'       (GID_Product = 0 and GID_Cust = 3) or'||chr(10)||-
'       (GID_Product = 3 and GID_Cust = 0) or'||chr(10)||-
'       (GID_Product = 3 and GID_Cust = 3)) and'||chr(10)||- 
'      GID_date = 15 and'||chr(10)||-
'      GID_LOC = 7';

exec crosstab(:tempstr);

*Customers *    *---------------------- Products -----------------------------*
************    ALEXANDER       BUDDHA CHANDRAGUPTA   PURI BEACH       SIKKIM    --Total--
ART HOUSE               0            0            0          500          750         1250
BARKER               5100            0            0            0            0         5100
JONES                   0            0            0         2050         3500         5550
MAHAJATI                0            0            0         1000            0         1000
RATAN                   0         5000            0            0         4000         9000
SMITH                9500         9000          900            0            0        19400
STONEWORK             850          800         6000            0            0         7650
--Total--           15450        14800         6900         3550         8250        48950
Tables Table-5Table-6 and Table-7 show TimeSales report and drill-down to the quarters of year 2003. Slicing is achieved by including WHERE condition in the query with desired values of the dimensions. Drill down is achieved by selection of proper value of GID_ type columns and by deciding the proper GROUPING value of all the dimensions at the particular level of drill down. Dicing is achieved by merely interchanging the first two columns of SELECT.
Table-5 Year-Product Sales Report: Main (Query and Result)
SELECT prd_name, td_year, amount
FROM Sales_cube
WHERE ((GID_Product = 0 and GID_date = 14) or
       (GID_Product = 0 and GID_date = 15) or
       (GID_Product = 3 and GID_date = 14) or
       (GID_Product = 3 and GID_date = 15)) and
      GID_Cust = 3 and
      GID_Loc = 7;

************         2002         2003    --Total--
ALEXANDER            5100        10350        15450
BUDDHA               6800         8000        14800
CHANDRAGUPTA            0         6900         6900
PURI BEACH           3550            0         3550
SIKKIM                  0         8250         8250
--Total--           15450        33500        48950
Table-6 Year-Product Sales Report:Drill Down to Quarters and Dicing Product and Time Dimensions (Query and Result)
SELECT td_year||td_quarter, prd_name, amount
FROM Sales_cube
WHERE ((GID_Product = 0 and GID_date = 12) or
       (GID_Product = 0 and GID_date = 15) or
       (GID_Product = 3 and GID_date = 12) or
       (GID_Product = 3 and GID_date = 15)) and
      GID_Cust = 3 and
      GID_Loc = 7;

************    ALEXANDER       BUDDHA CHANDRAGUPTA   PURI BEACH       SIKKIM    --Total--
2002Q1                  0         1000            0         2050            0         3050
2002Q2               5100            0            0          500            0         5600
2002Q4                  0         5800            0         1000            0         6800
2003Q1              10350            0         6000            0            0        16350
2003Q2                  0            0            0            0          750          750
2003Q3                  0         8000            0            0         3500        11500
2003Q4                  0            0          900            0         4000         4900
--Total--           15450        14800         6900         3550         8250        48950
Table-6 Year-Product Sales Report:Slice of year 2003, Quarter level drill down (Query and Result)
SELECT td_year||td_quarter, prd_name, amount
FROM Sales_cube
WHERE ((GID_Product = 0 and GID_date = 12) or
       (GID_Product = 0 and GID_date = 14) or
       (GID_Product = 3 and GID_date = 12) or
       (GID_Product = 3 and GID_date = 14)) and
      GID_Cust = 3 and
      GID_Loc = 7 and
      td_year = 2003;

************    ALEXANDER       BUDDHA CHANDRAGUPTA       SIKKIM    --Total--
2003                10350         8000         6900         8250        33500
2003Q1              10350            0         6000            0        16350
2003Q2                  0            0            0          750          750
2003Q3                  0         8000            0         3500        11500
2003Q4                  0            0          900         4000         4900
Using on-the-fly aggregation queries for CUBE, ROLLUP, Drill Down and Slicing
While using on-the-fly aggregation queries the cube is not pre-computed and we get the real time summary. However the performance is slower than querying pre-computed cubes. Several features (CUBE, ROLLUP, Composite Columns) discussed here can be used to generate the required aggregation levels. An important thing to ensure during executing on-the-fly queries is that, the query should not perform any useless aggregation. Proper use of the GROUPING functions is important.
Conclusion
Most of the OLAP tools will provide several additional features other than just reporting. There are user-friendly drag and drop interfaces which make drill-down, rollup, slicing, dicing happen on a mouse-click. Report generation and formatting is easier for someone who is not familiar with SQL. There are security features that restrict specific users from drilling down specific sections of the cube or viewing some specific cubes.
When the requirements are just few canned OLAP reports or when simple custom GUI can be made to mask the SQLs, use of the enhanced aggregation features can be really effective. A large portion of the requirements do fall in the second category.
Glossary and Scripts
Script A Script to create Sample Tables for Reporting examples
drop table product;
drop table timebyday;
drop table location;
drop table customer;
drop table sales;

create table product
(prdid number(2), prd_name varchar2(12), prd_family varchar2(10));
create table timebyday
(datekey date, td_month char(3), td_quarter char(2), td_year char(4));
create table location
(loc_id number(2), city varchar2(10), state varchar2(10), country varchar2(10));
create table customer
(cust_id number(2), cust_name varchar2(10), cust_type varchar2(4));
create table sales
(sales_id number(6), cust_id number(2), loc_id number(2), prdid number(2), sales_date date, amount number);

insert into product values (1,'BUDDHA','STATUE');
insert into product values (2,'ALEXANDER','STATUE');
insert into product values (3,'CHANDRAGUPTA','STATUE');
insert into product values (4,'SIKKIM','PAINTING');
insert into product values (5,'PURI BEACH','PAINTING');

insert into location values (1, 'KOLKATA', 'WB','INDIA');
insert into location values (2, 'MUMBAI','MH','INDIA');
insert into location values (3, 'SYDNEY','SA','AUSTRALIA');
insert into location values (4, 'CHICAGO','IL','USA');

insert into customer values (1, 'SMITH', 'INDV');
insert into customer values (2, 'JONES', 'INDV');
insert into customer values (3, 'BARKER','INDV');
insert into customer values (4, 'ART HOUSE','ORGN');
insert into customer values (5, 'STONEWORK','ORGN');
insert into customer values (6, 'MAHAJATI','ORGN');
insert into customer values (7, 'RATAN','INDV');

insert into sales values (1, 1, 4, 1, '05-JAN-02', 1000);
insert into sales values (2, 2, 3, 5, '1-MAR-02', 2050);
insert into sales values (3, 3, 4, 2, '14-MAY-02', 5100);
insert into sales values (4, 4, 3, 5, '22-JUN-02',  500);
insert into sales values (5, 5, 3, 1, '15-OCT-02',  800);
insert into sales values (6, 6, 1, 5, '10-NOV-02', 1000);
insert into sales values (7, 7, 1, 1, '25-DEC-02', 5000);
insert into sales values (8, 1, 4, 2, '18-FEB-03', 9500);
insert into sales values (9, 5, 3, 2, '28-FEB-03',  850);
insert into sales values (10, 5, 4, 3, '15-MAR-03',6000);
insert into sales values (11, 4, 4, 4, '12-JUN-03', 750);
insert into sales values (12, 1, 4, 1, '18-JUL-03',8000);
insert into sales values (13, 2, 3, 4, '5-AUG-03',3500);
insert into sales values (14, 1, 4, 3, '8-NOV-03', 900);
insert into sales values (15, 7, 2, 4, '23-NOV-03',4000);

insert into timebyday
select distinct sales_date, to_char(sales_date,'MON'), 
'Q'||to_char(sales_date,'Q'), to_char(sales_date,'YYYY') from sales;

commit;
Script B Script to create procedure CROSSTAB
drop table temp_tab1;
create table temp_tab1
(col1 varchar2(12), col2 varchar2(12), col3 number);

create or replace procedure crosstab(ip_qstr varchar2)
is
type c1_rec is record 
(col1 varchar2(14), col2 varchar2(14), col3 number);
type c1_tab is table of c1_rec;
c1_tab1 c1_tab;
varstr1 varchar2(200) := '';
ho_value number;
begin
dbms_output.enable(1000000);
execute immediate('truncate table temp_tab1');

execute immediate ip_qstr bulk collect into c1_tab1;

for ii in 1..c1_tab1.LAST
loop
insert into temp_tab1 values
(nvl(c1_tab1(ii).col1,'--Total--'),nvl(c1_tab1(ii).col2,'--Total--'),c1_tab1(ii).col3);
end loop;

varstr1 := '************ ';
for jj in (select distinct nullif(col2,'--Total--') col2
           from temp_tab1
           order by 1 nulls last)
loop
varstr1 := varstr1||lpad(nvl(jj.col2,'--Total--'), 12)||' ';
end loop;

dbms_output.put_line(varstr1);

for ii in (select distinct nullif(col1,'--Total--') col1
           from temp_tab1 
           order by 1 nulls last)
loop
varstr1 := rpad(nvl(ii.col1,'--Total--'),12);
 for jj in (select distinct nullif(col2,'--Total--') col2 
                   from temp_tab1 
            order by 1 nulls last)
 loop
 begin
 select col3 into ho_value from temp_tab1
 where col1 = nvl(ii.col1,'--Total--')
        and col2 = nvl(jj.col2,'--Total--');
 exception
 when no_data_found then
 ho_value := 0;
 end;
 varstr1 := varstr1||' '||lpad(ho_value,12);
 end loop;

dbms_output.put_line(varstr1);
end loop;
return;
end crosstab;
/
OLAP Terminologies
I am not considering in this article familiarity with OLAP terms as a prerequisite. Readers are advised to skip this section in case they are familiar with OLAP terminologies.
  • Data Warehouse Huge Store of historical transaction data from various sources. Gradually keeps on growing as more and more data is loaded into it (history increases). Logical Table structure of a data warehouse is designed keeping in mind the parameters we would like to analyze them with. As such, the logical design has special forms like Star Schema or Snowflake Schema. The physical design is done kept in mind the ease of query. As such warehouse table structures are de-normalized and heavily indexed.
  • ETL Stands Extract, Transform and Load. This is the technique to load the huge store of point (1) from the various sources.
  • OLAP Engine OLAP stands for Online Analytical Processing, the technique of reporting from the data warehouse.
  • Dimension and Measures Take the function F (a, b, c) = (x, y, z) where a, b, c, x, y, and z are several attributes. Function F can be thought of as a business behavior where (x, y, z) are measures of that behavior. The business needs to know how the measure of behavior F, i.e. x, y, z, varies with categories (a, b, c). Probably, with that knowledge the business will try to manipulate (a,b,c) to get the desired (x, y, z), or it might take several other decisions not expressible in such simple terms. (a, b, c) are relatively static well defined categories known as Dimensions. The measures (x, y, z) are also known as Facts.
  • Dimension Table and Hierarchies Each dimension has a fixed (or slowly changing) domain of values, which is stored in the dimension table. Each dimension can have aggregate attributes defined at several levels known as hierarchy. Hierarchies of a given dimension are also stored in dimension tables. Let "a" be a dimension, let "a1" be a collection of "a", let "a2" be a collection of "a1". The dimension table can be (a, a1, a2).
  • Fact Table The table where dimensions are linked to the measures is known as Fact table. In our example we can take the fact table as (a, b, c, x, y, z). The dimension attributes mentioned in the fact tables are the ones with lowest granularity. So the fact table, in our example, will not have a1 or a2, which are aggregates.
  • Cube Cube stores pre-computed aggregations, such as sums and counts, across all dimensions. It is useful in answering multidimensional questions by avoiding frequent summary from the flood of detailed data.
  • Drill Down Frequently the business user will start with the highest aggregation of the hierarchy and ask for details on one or some selected dimensions. This is known as drill down.
  • Rollup The requirement might also be to start from the lower granularity of a particular dimension and then move to the higher aggregates. This activity is known as rollup. If the requirement is to view the aggregation across all the level of hierarchies of any given dimension, then such a report is also known as rollup report.
  • Slice Selecting some values of a dimension based on a filtering condition and then applying aggregation is known as slicing.
  • Dice Changing the outlook of the report table, i.e., interchanging the row labels and column labels is known as dicing. It might be mentioned here that dicing does not affect the values of the aggregate columns.

No comments:

Post a Comment