(OLAP reporting embedded in SQL)
Much of the OLAP reporting capability built into Oracle SQL goes unused. People turn to expensive OLAP reporting tools on the market, even for simple reporting needs. This article outlines some of the common OLAP reporting needs and shows how to meet them by using the enhanced aggregation features of Oracle SQL.
The article is divided into two sections. The first introduces the GROUP BY extensions of SQL, and the second uses them to generate some typical reports. A section at the end introduces common OLAP terminology.
The enhanced SQL aggregation features are available across all editions of Oracle, including Oracle Standard Edition One. It is worth mentioning that Oracle OLAP, the special OLAP package of Oracle, is not available with Oracle Standard Edition and Standard Edition One. The enhanced aggregation features discussed here have been tested on Oracle 9i and Oracle 10g.
Advanced Aggregation Extensions of GROUP BY
GROUPING SETS clause, GROUPING function and GROUPING_ID function
The fundamental concept behind the enhanced aggregation features of Oracle is that of GROUPING SETS. All other aggregation features can be expressed in terms of it. With the GROUPING SETS clause come the functions GROUPING, GROUPING_ID and GROUP_ID.
The GROUPING SETS clause in GROUP BY allows us to specify more than one GROUP BY option in the same record set. Every GROUPING SETS query can be logically expressed as several GROUP BY queries connected by UNION ALL. Table-1 shows several such equivalent statements. This is helpful in forming the idea of the GROUPING SETS clause. A blank set ( ) in the GROUPING SETS clause calculates the overall aggregate.
Table 1 - GROUPING SET queries and the equivalent GROUP BY queries
Set A - Aggregate query with GROUPING SETS
Set B - Equivalent aggregate query with GROUP BY

A1. SELECT a, b, SUM(c) FROM tab1 GROUP BY GROUPING SETS ( (a, b) )
B1. SELECT a, b, SUM(c) FROM tab1 GROUP BY a, b

A2. SELECT a, b, SUM(c) FROM tab1 GROUP BY GROUPING SETS ( (a, b), a )
B2. SELECT a, b, SUM(c) FROM tab1 GROUP BY a, b
    UNION ALL
    SELECT a, null, SUM(c) FROM tab1 GROUP BY a

A3. SELECT a, b, SUM(c) FROM tab1 GROUP BY GROUPING SETS ( a, b )
B3. SELECT a, null, SUM(c) FROM tab1 GROUP BY a
    UNION ALL
    SELECT null, b, SUM(c) FROM tab1 GROUP BY b

A4. SELECT a, b, SUM(c) FROM tab1 GROUP BY GROUPING SETS ( (a, b), a, b, ( ) )
B4. SELECT a, b, SUM(c) FROM tab1 GROUP BY a, b
    UNION ALL
    SELECT a, null, SUM(c) FROM tab1 GROUP BY a
    UNION ALL
    SELECT null, b, SUM(c) FROM tab1 GROUP BY b
    UNION ALL
    SELECT null, null, SUM(c) FROM tab1
Example 4 of Table-1 (A4 and B4) is a superset of all the cases above it and also includes an overall aggregate by the use of ( ). We will see later that this result is the same as that of CUBE (a, b). The first 3 columns of Table-2 show the result of a query of this type.
The GROUPING SETS clause can compute all the required aggregates in a single scan of the data. So the performance is usually better than that of its logical equivalent of several GROUP BY queries and UNION ALL.
The general syntax of a SQL with GROUPING SETS is -
SELECT <grouping_columns>, <aggregate_functions>
FROM <table_list>
WHERE <where_condition>
GROUP BY GROUPING SETS (<column_set_1>, ... , <column_set_N>)
Each "column set" can contain none, one or more of the "grouping columns" from the SELECT. However, every grouping column in the SELECT must be present in at least one of the column sets. In set terms -
(column_set_1) U (column_set_2) U ... U (column_set_N) is equal to (grouping columns in SELECT)
So the following two queries will return an error -
(1) SELECT a, b, c, SUM(d) FROM tab1 GROUP BY GROUPING SETS ( (a, b), b )
    --- Reason: (a, b) U (b) is not equal to (a, b, c)
(2) SELECT a, b, SUM(c) FROM tab1 GROUP BY GROUPING SETS ( a, ( ) )
    --- Reason: (a) U ( ) is not equal to (a, b)
Table 2 - A GROUPING SET query with GROUPING and GROUPING_ID Function on EMP
SELECT deptno, job, SUM(sal),
       GROUPING(deptno) GDNO,
       GROUPING(job) GJNO,
       GROUPING_ID(deptno, job) GID_DJ,
       GROUPING_ID(job, deptno) GID_JD
FROM EMP
GROUP BY GROUPING SETS ( (deptno, job), deptno, job, ( ) )

    DEPTNO JOB         SUM(SAL)       GDNO       GJNO     GID_DJ     GID_JD
---------- --------- ---------- ---------- ---------- ---------- ----------
        10 CLERK           1300          0          0          0          0
        10 MANAGER         2450          0          0          0          0
        10 PRESIDENT       5000          0          0          0          0
        20 CLERK           1900          0          0          0          0
        20 ANALYST         6000          0          0          0          0
        20 MANAGER         2975          0          0          0          0
        30 CLERK            950          0          0          0          0
        30 MANAGER         2850          0          0          0          0
        30 SALESMAN        5600          0          0          0          0
        10                 8750          0          1          1          2
        20                10875          0          1          1          2
        30                 9400          0          1          1          2
           ANALYST         6000          1          0          2          1
           CLERK           4150          1          0          2          1
           MANAGER         8275          1          0          2          1
           PRESIDENT       5000          1          0          2          1
           SALESMAN        5600          1          0          2          1
                          29025          1          1          3          3

18 rows selected.
GROUPING Function and GROUPING_ID Function
From Table-2 we see that when a row holds an aggregate across a column, the value of that column is NULL. This is ambiguous when the column itself contains NULL values. There needs to be some way to tell a NULL that means "aggregated" from a NULL that is a stored value. The GROUPING function is the solution to that.
This function returns the flag "1" for a row in the result set if that column has been aggregated in that row. Otherwise the value is "0". The GROUPING function takes exactly one column expression as its argument, and that expression must also appear in the GROUP BY clause. The GROUPING function can be used to replace the NULL that appears in a column at the aggregation level with something meaningful like Total.
The GROUPING function has the general syntax GROUPING(<column_expression>). It can be used in the SELECT list and also in the HAVING and ORDER BY clauses of a query that has a GROUP BY.
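The substitution described above can be sketched as follows. This is a minimal illustration, assuming the standard EMP demo table used in Table-2; the labels 'All depts' and 'All jobs' and the aliases dept_label and job_label are made up for the example.

```sql
-- Replace the NULLs of aggregated rows with readable labels.
-- GROUPING(col) = 1 means the row is an aggregate across col.
SELECT CASE GROUPING(deptno)
         WHEN 1 THEN 'All depts'
         ELSE TO_CHAR(deptno)
       END                          AS dept_label,
       CASE GROUPING(job)
         WHEN 1 THEN 'All jobs'
         ELSE job
       END                          AS job_label,
       SUM(sal)                     AS total_sal
FROM   emp
GROUP  BY GROUPING SETS ( (deptno, job), deptno, job, ( ) )
-- Detail rows first, then subtotals, then the grand total.
ORDER  BY GROUPING_ID(deptno, job), deptno, job;
```

Because the label is driven by GROUPING and not by a test for NULL, a genuinely NULL deptno or job in the data would still be shown as NULL rather than being mistaken for a total.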
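GROUPING_ID takes a list of columns. It applies the GROUPING function to each column in its argument list and composes a bit vector from the resulting "0" and "1" values, with the first column as the most significant bit. It returns the decimal equivalent of that bit vector. The columns GID_DJ and GID_JD in Table-2 show the use of the GROUPING_ID function and also show how interchanging the order of the columns inside GROUPING_ID changes the result. For the department subtotal rows, for instance, GROUPING(deptno) = 0 and GROUPING(job) = 1, so GROUPING_ID(deptno, job) is binary 01 = 1 while GROUPING_ID(job, deptno) is binary 10 = 2.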
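The bit-vector arithmetic can be checked directly in SQL. The sketch below assumes the EMP table of Table-2 and compares GROUPING_ID with two hand-built equivalents: Oracle's BIN_TO_NUM function applied to the individual GROUPING flags, and plain arithmetic. The aliases are illustrative only.

```sql
-- All three expressions should hold the same value on every row.
SELECT deptno,
       job,
       GROUPING_ID(deptno, job)                     AS gid_dj,
       BIN_TO_NUM(GROUPING(deptno), GROUPING(job))  AS gid_dj_bits,
       GROUPING(deptno) * 2 + GROUPING(job)         AS gid_dj_arith
FROM   emp
GROUP  BY GROUPING SETS ( (deptno, job), deptno, job, ( ) );
```

A single GROUPING_ID column is usually more convenient than one GROUPING column per dimension attribute, because one number identifies the aggregation level of the row and can be stored and filtered on.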
CUBE
CUBE is the most exhaustive aggregation clause. The general syntax is CUBE(<column_list>). It is used only with GROUP BY. CUBE creates a subtotal for every possible combination of the columns in its argument. Once we compute a CUBE on a set of dimensions, we can answer every possible aggregation question on those dimensions. Table-3 shows how a cube is built.
It is also worth mentioning here that
GROUP BY CUBE( a, b, c) is equivalent to
GROUP BY GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( )).
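As a small illustration on the EMP table of Table-2 (an assumption of this sketch, as is the alias gid), CUBE over two columns produces the same four grouping sets that were written out by hand in the Table-2 query:

```sql
-- CUBE (deptno, job) expands to
-- GROUPING SETS ( (deptno, job), deptno, job, ( ) ).
SELECT deptno,
       job,
       SUM(sal)                  AS total_sal,
       GROUPING_ID(deptno, job)  AS gid
FROM   emp
GROUP  BY CUBE (deptno, job)
ORDER  BY gid, deptno, job;
```

Apart from row order, this should return the same 18 rows as the GROUPING SETS query of Table-2. With n columns CUBE generates 2^n grouping sets, so the row count grows quickly as columns are added.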
ROLLUP
ROLLUP clause is used with GROUP BY to compute the aggregate at the hierarchy levels of a dimension.
ROLLUP(a, b, c) assumes that the hierarchy is "a" drilling down to "b" drilling down to "c".
ROLLUP (a, b, c) is equivalent to GROUPING SETS ( (a, b, c), (a, b), (a), ( )).
The general syntax of ROLLUP is ROLLUP(<column_list>).
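A minimal ROLLUP sketch on the EMP table of Table-2, treating deptno as the higher level and job as the lower level purely for illustration (EMP has no real hierarchy between the two):

```sql
-- ROLLUP (deptno, job) expands to
-- GROUPING SETS ( (deptno, job), (deptno), ( ) ).
SELECT deptno,
       job,
       SUM(sal)                  AS total_sal,
       GROUPING_ID(deptno, job)  AS gid
FROM   emp
GROUP  BY ROLLUP (deptno, job)
ORDER  BY deptno, gid, job;
```

Compared with Table-2, only the rows with GID_DJ of 0, 1 and 3 are produced. The per-job totals (GID_DJ = 2) are absent because ROLLUP never aggregates a higher level while keeping a lower one.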
Composite Columns
A composite column is a collection of columns, enclosed in parentheses, that can be used inside CUBE or ROLLUP. The collection is treated as a single unit when the aggregates are computed. Composite column usage in CUBE and ROLLUP and the equivalent GROUPING SETS -
. CUBE( (a, b), c ) is equivalent to GROUPING SETS ( (a, b, c), (a, b), c, ( ) )
. ROLLUP( a, (b, c) ) is equivalent to GROUPING SETS ( (a, b, c), (a), ( ) )
Partial GROUPING SETS, CUBE or ROLLUP
If a column appears in GROUP BY but outside the aggregation clauses discussed above, it can be thought of as the first column of every set in the equivalent GROUPING SETS. The following examples make this clear.
GROUP BY a, CUBE( b, c ) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a, c), (a) )
GROUP BY a, ROLLUP( b, c ) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b), (a) )
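A partial ROLLUP is handy when subtotals are wanted but a grand total is not. The sketch below again assumes the EMP table of Table-2:

```sql
-- deptno sits outside ROLLUP, so it is present in every grouping set:
-- GROUPING SETS ( (deptno, job), (deptno) ).
SELECT deptno,
       job,
       SUM(sal)       AS total_sal,
       GROUPING(job)  AS is_dept_subtotal
FROM   emp
GROUP  BY deptno, ROLLUP (job)
ORDER  BY deptno, GROUPING(job), job;
```

This yields the detail rows plus one subtotal per department and no overall total row, since there is no grouping set in which deptno is aggregated away.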
OLAP Reporting using enhanced aggregation features
While queries on the EMP table are adequate to illustrate GROUPING SETS, they are poor examples for the sections that follow, because the EMP table is not part of a star schema. Please run Script A (in the Glossary and Scripts section) to create a simple star schema.
The tables of the Sample Schema are
Product(prdid, prd_name, prd_family)
TimeByDay(datekey, td_month, td_quarter, td_year)
Location( Loc_id, City, State, Country)
Customer(cust_id, cust_name, cust_type)
Sales(sales_id, cust_id, loc_id, prdid, sales_date, amount)
The schema is about a fictitious Art Trader that supplies remakes of statues of famous historical figures (like ALEXANDER, BUDDHA, etc) or landscape paintings of places (like SIKKIM, etc). They sell to museums, resellers or individuals.
The dimensions are Product, TimeByDay, Location and Customer. The fact is Sales.
The hierarchies are -
(1) Product_Name (prd_name) -> Product Family (prd_family)
(2) Date (datekey) -> Month (td_month) -> Quarter (td_quarter) -> Year (td_year)
(3) City -> State -> Country
(4) Customer_Name (cust_name) -> Customer_Type (cust_type)
The two approaches used for generating OLAP reports are as follows -
(1) Get the most generalized possible CUBE built with the dimensions, or
(2) Use on the fly aggregation queries to get the real-time report.
Using generalized pre-built CUBE for CUBE, ROLLUP, Drill Down and Slicing Queries.
This approach consists of building a table or a materialized view with the CUBE of the dimensions. Table-3 shows the SQL to build such a cube. The generalized CUBE keeps every meaningful aggregation pre-computed. We then query a subset of the rows of the CUBE to get the desired values. Since the CUBE stores all possible combinations of the dimension columns, the number of records in the cube itself might be large. Intelligent use of composite columns can help a great deal here. Note the use of the composite column (City, State) in the CUBE. This is because each state has only one city with an office of our demo organization.
The GROUPING_ID function lets us keep only the ROLLUP-style levels of each hierarchy. For example, take the combination (cust_name, cust_type). It is meaningless to aggregate across customer types while keeping individual customer names. So we keep only the bit vectors (1,1), (1,0) and (0,0), that is, GROUPING_ID values of 3, 2 and 0 on the customer dimension.
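To see the filtering on a single dimension, the sketch below applies it to the customer hierarchy alone, using the Sales and Customer tables from Script A. The second query is an assumed equivalent written with ROLLUP; the table aliases s and c are illustrative only.

```sql
-- CUBE generates four levels; HAVING drops the meaningless one
-- (cust_name kept while cust_type is aggregated, GROUPING_ID = 1).
SELECT c.cust_type,
       c.cust_name,
       GROUPING_ID(c.cust_name, c.cust_type)  AS gid_cust,
       SUM(s.amount)                          AS amount
FROM   sales s, customer c
WHERE  s.cust_id = c.cust_id
GROUP  BY CUBE (c.cust_name, c.cust_type)
HAVING GROUPING_ID(c.cust_name, c.cust_type) IN (0, 2, 3);

-- The same three levels requested directly as a hierarchy.
SELECT c.cust_type,
       c.cust_name,
       GROUPING_ID(c.cust_name, c.cust_type)  AS gid_cust,
       SUM(s.amount)                          AS amount
FROM   sales s, customer c
WHERE  s.cust_id = c.cust_id
GROUP  BY ROLLUP (c.cust_type, c.cust_name);
```

Both queries should return the customer detail rows (gid_cust = 0), one subtotal per customer type (gid_cust = 2) and the grand total (gid_cust = 3). The HAVING form is the one that generalizes to the multi-dimension cube of Table-3, where each hierarchy is filtered independently.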
Table 3 - Building a cube
CREATE TABLE sales_cube AS
SELECT prd_name, prd_family,
       datekey, td_month, td_quarter, td_year,
       cust_name, cust_type,
       city, state, country,
       GROUPING_ID (prd_name, prd_family) GID_product,
       GROUPING_ID (datekey, td_month, td_quarter, td_year) GID_DATE,
       GROUPING_ID (cust_name, cust_type) GID_CUST,
       GROUPING_ID (city, state, country) GID_LOC,
       sum(amount) amount
FROM sales, product, timebyday, location, customer
WHERE sales.cust_id = customer.cust_id
  and sales.loc_id = location.loc_id
  and sales.sales_date = timebyday.datekey
  and sales.prdid = product.prdid
GROUP BY CUBE ( prd_name, prd_family,
                datekey, td_month, td_quarter, td_year,
                cust_name, cust_type,
                (city, state), country )
HAVING GROUPING_ID (prd_name, prd_family) IN (0, 2, 3)
   and GROUPING_ID (datekey, td_month, td_quarter, td_year) IN (0, 8, 12, 14, 15)
   and GROUPING_ID (cust_name, cust_type) IN (0, 2, 3)
   and GROUPING_ID (city, state, country) IN (0, 6, 7);
The next table (Table-4) shows a typical crosstab query of sales for Product and Customer. It shows the query and also how to generate a crosstab report out of it by using the procedure CROSSTAB (Script-B). The later examples show only the query and the crosstab report and skip the PL/SQL portion. The WHERE condition is determined by the bit vectors. We need -
- Details of product and details of customer - Both the Product and Customer dimensions are at full detail. So GID_Product = bit vector (0,0) = 0. Same for GID_Cust.
- Summary of product and details of customer - Product is summarized fully, so GID_Product = bit vector (1,1) = 3.
- Details of product and summary of customer - Customer is summarized fully, so GID_Cust = bit vector (1,1) = 3.
- Summary of product and summary of customer - Both Customer and Product are summarized fully.
- Along with any of the above 4 conditions we need a full summary of the rest of the dimensions. So GID_date = bit vector (1,1,1,1) = 15 and GID_Loc = bit vector (1,1,1) = 7.
Table-4 Crosstab Query on Product and Customer (Query, Generation Routine and Result)
/*********** The Query ***********/
SELECT prd_name, cust_name, amount
FROM sales_cube
WHERE ((GID_Product = 0 and GID_Cust = 0) or
       (GID_Product = 0 and GID_Cust = 3) or
       (GID_Product = 3 and GID_Cust = 0) or
       (GID_Product = 3 and GID_Cust = 3)) and
      GID_date = 15 and
      GID_LOC = 7;

/*********** Generating the crosstab report ********/
set serveroutput on
set lines 120
var tempstr varchar2(500)
exec :tempstr := ''||-
'SELECT cust_name, prd_name, amount'||chr(10)||-
'FROM sales_cube'||chr(10)||-
'WHERE ((GID_Product = 0 and GID_Cust = 0) or'||chr(10)||-
' (GID_Product = 0 and GID_Cust = 3) or'||chr(10)||-
' (GID_Product = 3 and GID_Cust = 0) or'||chr(10)||-
' (GID_Product = 3 and GID_Cust = 3)) and'||chr(10)||-
' GID_date = 15 and'||chr(10)||-
' GID_LOC = 7';
exec crosstab(:tempstr);

*Customers *  *---------------------- Products -----------------------------*
************    ALEXANDER       BUDDHA CHANDRAGUPTA   PURI BEACH       SIKKIM    --Total--
ART HOUSE               0            0            0          500          750         1250
BARKER               5100            0            0            0            0         5100
JONES                   0            0            0         2050         3500         5550
MAHAJATI                0            0            0         1000            0         1000
RATAN                   0         5000            0            0         4000         9000
SMITH                9500         9000          900            0            0        19400
STONEWORK             850          800         6000            0            0         7650
--Total--           15450        14800         6900         3550         8250        48950
Table-5, Table-6 and Table-7 show a Year-Product sales report, its drill-down to quarters, and a slice for the year 2003. Slicing is achieved by adding a WHERE condition on the desired values of a dimension. Drill down is achieved by selecting the proper values of the GID_ columns, that is, the proper grouping level for every dimension at that level of drill down. Dicing is achieved by merely interchanging the first two columns of the SELECT.
Table-5 Year-Product Sales Report: Main (Query and Result)
SELECT prd_name, td_year, amount
FROM Sales_cube
WHERE ((GID_Product = 0 and GID_date = 14) or
       (GID_Product = 0 and GID_date = 15) or
       (GID_Product = 3 and GID_date = 14) or
       (GID_Product = 3 and GID_date = 15)) and
      GID_Cust = 3 and
      GID_Loc = 7;

************         2002         2003    --Total--
ALEXANDER            5100        10350        15450
BUDDHA               6800         8000        14800
CHANDRAGUPTA            0         6900         6900
PURI BEACH           3550            0         3550
SIKKIM                  0         8250         8250
--Total--           15450        33500        48950
Table-6 Year-Product Sales Report: Drill Down to Quarters and Dicing Product and Time Dimensions (Query and Result)
SELECT td_year||td_quarter, prd_name, amount
FROM Sales_cube
WHERE ((GID_Product = 0 and GID_date = 12) or
       (GID_Product = 0 and GID_date = 15) or
       (GID_Product = 3 and GID_date = 12) or
       (GID_Product = 3 and GID_date = 15)) and
      GID_Cust = 3 and
      GID_Loc = 7;

************    ALEXANDER       BUDDHA CHANDRAGUPTA   PURI BEACH       SIKKIM    --Total--
2002Q1                  0         1000            0         2050            0         3050
2002Q2               5100            0            0          500            0         5600
2002Q4                  0         5800            0         1000            0         6800
2003Q1              10350            0         6000            0            0        16350
2003Q2                  0            0            0            0          750          750
2003Q3                  0         8000            0            0         3500        11500
2003Q4                  0            0          900            0         4000         4900
--Total--           15450        14800         6900         3550         8250        48950
Table-7 Year-Product Sales Report: Slice of year 2003, Quarter level drill down (Query and Result)
SELECT td_year||td_quarter, prd_name, amount
FROM Sales_cube
WHERE ((GID_Product = 0 and GID_date = 12) or
       (GID_Product = 0 and GID_date = 14) or
       (GID_Product = 3 and GID_date = 12) or
       (GID_Product = 3 and GID_date = 14)) and
      GID_Cust = 3 and
      GID_Loc = 7 and
      td_year = 2003;

************    ALEXANDER       BUDDHA CHANDRAGUPTA       SIKKIM    --Total--
2003                10350         8000         6900         8250        33500
2003Q1              10350            0         6000            0        16350
2003Q2                  0            0            0          750          750
2003Q3                  0         8000            0         3500        11500
2003Q4                  0            0          900         4000         4900
Using on-the-fly aggregation queries for CUBE, ROLLUP, Drill Down and Slicing
With on-the-fly aggregation queries the cube is not pre-computed, so we get a real-time summary. However, the performance is slower than that of querying a pre-computed cube. The features discussed here (CUBE, ROLLUP, composite columns) can be used to generate exactly the required aggregation levels. The important thing when running on-the-fly queries is that the query should not perform any useless aggregation. Proper use of the GROUPING functions is important.
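As a sketch of the on-the-fly approach, the Year-Product report of Table-5 can be computed straight from the star schema of Script A without the sales_cube table. Only the two dimensions needed by the report are joined and cubed, so no unused aggregation level is computed. The table aliases are illustrative.

```sql
-- Year-Product report computed at query time.
-- CUBE over two columns gives the cells, the row totals,
-- the column totals and the grand total, and nothing more.
SELECT p.prd_name,
       t.td_year,
       SUM(s.amount)  AS amount
FROM   sales s, product p, timebyday t
WHERE  s.prdid = p.prdid
AND    s.sales_date = t.datekey
GROUP  BY CUBE (p.prd_name, t.td_year)
ORDER  BY GROUPING(p.prd_name), p.prd_name,
          GROUPING(t.td_year), t.td_year;
```

With the sample data this should produce the same cells and totals as Table-5, and because it returns three columns in the order (row label, column label, value) it could be passed to the CROSSTAB procedure of Script B in the same way as the cube queries.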
Conclusion
Most OLAP tools provide several features beyond plain reporting. There are user-friendly drag and drop interfaces which make drill-down, rollup, slicing and dicing happen on a mouse-click. Report generation and formatting is easier for someone who is not familiar with SQL. There are security features that restrict specific users from drilling down into specific sections of the cube or from viewing specific cubes.
When the requirement is just a few canned OLAP reports, or when a simple custom GUI can be built to mask the SQL, the enhanced aggregation features can be really effective. A large portion of real requirements falls into this second category.
Glossary and Scripts
Script A Script to create Sample Tables for Reporting examples
-- The DROP statements fail harmlessly on the first run, when the tables do not exist yet.
drop table product;
drop table timebyday;
drop table location;
drop table customer;
drop table sales;

create table product (prdid number(2), prd_name varchar2(12), prd_family varchar2(10));
create table timebyday (datekey date, td_month char(3), td_quarter char(2), td_year char(4));
create table location (loc_id number(2), city varchar2(10), state varchar2(10), country varchar2(10));
create table customer (cust_id number(2), cust_name varchar2(10), cust_type varchar2(4));
create table sales (sales_id number(6), cust_id number(2), loc_id number(2), prdid number(2), sales_date date, amount number);

insert into product values (1,'BUDDHA','STATUE');
insert into product values (2,'ALEXANDER','STATUE');
insert into product values (3,'CHANDRAGUPTA','STATUE');
insert into product values (4,'SIKKIM','PAINTING');
insert into product values (5,'PURI BEACH','PAINTING');

insert into location values (1, 'KOLKATA', 'WB','INDIA');
insert into location values (2, 'MUMBAI','MH','INDIA');
insert into location values (3, 'SYDNEY','SA','AUSTRALIA');
insert into location values (4, 'CHICAGO','IL','USA');

insert into customer values (1, 'SMITH', 'INDV');
insert into customer values (2, 'JONES', 'INDV');
insert into customer values (3, 'BARKER','INDV');
insert into customer values (4, 'ART HOUSE','ORGN');
insert into customer values (5, 'STONEWORK','ORGN');
insert into customer values (6, 'MAHAJATI','ORGN');
insert into customer values (7, 'RATAN','INDV');

-- The date literals below rely on a DD-MON-RR style session NLS_DATE_FORMAT.
insert into sales values (1, 1, 4, 1, '05-JAN-02', 1000);
insert into sales values (2, 2, 3, 5, '1-MAR-02', 2050);
insert into sales values (3, 3, 4, 2, '14-MAY-02', 5100);
insert into sales values (4, 4, 3, 5, '22-JUN-02', 500);
insert into sales values (5, 5, 3, 1, '15-OCT-02', 800);
insert into sales values (6, 6, 1, 5, '10-NOV-02', 1000);
insert into sales values (7, 7, 1, 1, '25-DEC-02', 5000);
insert into sales values (8, 1, 4, 2, '18-FEB-03', 9500);
insert into sales values (9, 5, 3, 2, '28-FEB-03', 850);
insert into sales values (10, 5, 4, 3, '15-MAR-03',6000);
insert into sales values (11, 4, 4, 4, '12-JUN-03', 750);
insert into sales values (12, 1, 4, 1, '18-JUL-03',8000);
insert into sales values (13, 2, 3, 4, '5-AUG-03',3500);
insert into sales values (14, 1, 4, 3, '8-NOV-03', 900);
insert into sales values (15, 7, 2, 4, '23-NOV-03',4000);

-- Derive the time dimension from the distinct sale dates.
insert into timebyday
  select distinct sales_date,
         to_char(sales_date,'MON'),
         'Q'||to_char(sales_date,'Q'),
         to_char(sales_date,'YYYY')
  from sales;

commit;
Script B Script to create procedure CROSSTAB
drop table temp_tab1;
create table temp_tab1 (col1 varchar2(12), col2 varchar2(12), col3 number);

create or replace procedure crosstab(ip_qstr varchar2) is
  type c1_rec is record (col1 varchar2(14), col2 varchar2(14), col3 number);
  type c1_tab is table of c1_rec;
  c1_tab1 c1_tab;
  varstr1 varchar2(200) := '';
  ho_value number;
begin
  dbms_output.enable(1000000);
  execute immediate('truncate table temp_tab1');
  -- Run the caller's query: (row label, column label, value).
  execute immediate ip_qstr bulk collect into c1_tab1;
  -- Stage the rows; a NULL label marks an aggregated level, so show it as a total.
  -- COUNT (rather than LAST) keeps the loop safe when the query returns no rows.
  for ii in 1..c1_tab1.COUNT loop
    insert into temp_tab1 values
      (nvl(c1_tab1(ii).col1,'--Total--'),
       nvl(c1_tab1(ii).col2,'--Total--'),
       c1_tab1(ii).col3);
  end loop;
  -- Header line: one column per distinct column label, totals last.
  varstr1 := '************ ';
  for jj in (select distinct nullif(col2,'--Total--') col2
             from temp_tab1 order by 1 nulls last) loop
    varstr1 := varstr1||lpad(nvl(jj.col2,'--Total--'), 12)||' ';
  end loop;
  dbms_output.put_line(varstr1);
  -- One output line per distinct row label, totals last.
  for ii in (select distinct nullif(col1,'--Total--') col1
             from temp_tab1 order by 1 nulls last) loop
    varstr1 := rpad(nvl(ii.col1,'--Total--'),12);
    for jj in (select distinct nullif(col2,'--Total--') col2
               from temp_tab1 order by 1 nulls last) loop
      begin
        select col3 into ho_value
        from temp_tab1
        where col1 = nvl(ii.col1,'--Total--')
          and col2 = nvl(jj.col2,'--Total--');
      exception
        -- A missing cell means there were no sales for that combination.
        when no_data_found then ho_value := 0;
      end;
      varstr1 := varstr1||' '||lpad(ho_value,12);
    end loop;
    dbms_output.put_line(varstr1);
  end loop;
  return;
end crosstab;
/
OLAP Terminologies
Familiarity with OLAP terms is not a prerequisite for this article. Readers who already know the terminology can skip this section.
- Data Warehouse: A huge store of historical transaction data from various sources. It keeps growing as more and more data is loaded into it (the history increases). The logical table structure of a data warehouse is designed around the parameters we would like to analyze the data by. As such, the logical design takes special forms like the Star Schema or the Snowflake Schema. The physical design is done with ease of querying in mind. As such, warehouse table structures are de-normalized and heavily indexed.
- ETL: Stands for Extract, Transform and Load. This is the technique used to load the data warehouse from the various sources.
- OLAP Engine: OLAP stands for Online Analytical Processing, the technique of reporting from the data warehouse.
- Dimension and Measures: Take the function F (a, b, c) = (x, y, z) where a, b, c, x, y, and z are several attributes. Function F can be thought of as a business behavior where (x, y, z) are measures of that behavior. The business needs to know how the measures of behavior F, i.e. x, y, z, vary with the categories (a, b, c). With that knowledge the business will probably try to manipulate (a, b, c) to get the desired (x, y, z), or it might take several other decisions not expressible in such simple terms. (a, b, c) are relatively static, well defined categories known as Dimensions. The measures (x, y, z) are also known as Facts.
- Dimension Table and Hierarchies: Each dimension has a fixed (or slowly changing) domain of values, which is stored in the dimension table. Each dimension can have aggregate attributes defined at several levels, known as a hierarchy. The hierarchies of a given dimension are also stored in the dimension table. Let "a" be a dimension, let "a1" be a collection of "a", and let "a2" be a collection of "a1". The dimension table can then be (a, a1, a2).
- Fact Table: The table where dimensions are linked to the measures is known as the Fact table. In our example we can take the fact table as (a, b, c, x, y, z). The dimension attributes held in the fact table are the ones with the lowest granularity. So the fact table, in our example, will not have a1 or a2, which are aggregates.
- Cube: A cube stores pre-computed aggregations, such as sums and counts, across all dimensions. It is useful in answering multidimensional questions because it avoids repeatedly summarizing the flood of detailed data.
- Drill Down: Frequently the business user will start with the highest aggregation of the hierarchy and ask for details on one or more selected dimensions. This is known as drill down.
- Rollup: The requirement might also be to start from the lowest granularity of a particular dimension and then move to the higher aggregates. This activity is known as rollup. If the requirement is to view the aggregation across all the levels of the hierarchy of a given dimension, then such a report is also known as a rollup report.
- Slice: Selecting some values of a dimension based on a filtering condition and then applying aggregation is known as slicing.
- Dice: Changing the layout of the report table, i.e., interchanging the row labels and column labels, is known as dicing. Dicing does not affect the values of the aggregate columns.