Data Warehouse Interview Questions
Q: What is a Data warehouse?
A: A Data warehouse is a repository of integrated information, available for queries and analysis.
Data and information are extracted from heterogeneous sources and stored in a database for easy and more efficient way to run queries and create reports.
Data and information are extracted from heterogeneous sources and stored in a database for easy and more efficient way to run queries and create reports.
A data warehouse is a logical collection of information gathered from many different operational databases used to create business intelligence that supports business analysis activities and decision-making tasks, primarily, a record of an enterprise’s past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP)”.
Q: What is data mining?
A: Data mining is a process of extracting hidden trends within a datawarehouse. For example an insurance data warehouse can be used to mine data for the most high risk people to insure in a certain geographial area.
Q: What are Data Marts?
A: Data Marts are subset of the corporate-wide data that is of value to a specific group of users.
There are two types of Data Marts:
1.Independent data marts – sources from data captured form OLTP system, external providers or from data generated locally within a particular department or geographic area.
2.Dependent data mart – sources directly from enterprise data warehouses.
There are two types of Data Marts:
1.Independent data marts – sources from data captured form OLTP system, external providers or from data generated locally within a particular department or geographic area.
2.Dependent data mart – sources directly from enterprise data warehouses.
Q: What is OLTP?
A: OnLine Transactional Processing.
Q: What is OLAP?
A: OnLine Analatical Processing.
Q: What are the differences between OLTP and OLAP?
A: Main Differences between OLTP and OLAP are:-
1. User and System Orientation
OLTP: customer-oriented, used for data analysis and querying by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers( managers, executives, analysis).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.
3. Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented database design.
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores
1. User and System Orientation
OLTP: customer-oriented, used for data analysis and querying by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers( managers, executives, analysis).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.
3. Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented database design.
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores
Q: What is real time data-warehousing?
A: Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
Q: What are the steps to build the datawarehouse ?
A:
Q: What is a CUBE in data warehousing concept?
A: Cubes are logical representation of multidimensional data.The edge of the cube contains dimension members and the body of the cube contains data values.
Q: What is a linked cube?
A: Linked cube is a cube, in which a sub-set of the data can be analysed into greater detail. The linking ensures that the data in the cubes remain consistent.
Q: What is the main difference between Inmon and Kimball philosophies of data warehousing?
A: Both differed in the concept of building the datawarehosue.
Kimball views data warehousing as a constituency of Data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtain from the dimension modeling on a local departmental level.
Kimball views data warehousing as a constituency of Data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtain from the dimension modeling on a local departmental level.
Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise.
i.e.,
Kimball–First DataMarts–Combined way —Datawarehouse
Inmon—First Datawarehouse–Later—-Datamarts
Kimball–First DataMarts–Combined way —Datawarehouse
Inmon—First Datawarehouse–Later—-Datamarts
Q: What is Hierarchy in data warehouse terms?
A: Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies–one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
A: Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies–one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Q: What are the differnces between a RDBMS schema and a data warehouse schema?
A:
RDBMS Schema
* Used for OLTP systems
* Highly Normalized
* Difficult to understand and navigate
* Difficult to extract and solve complex problems
A:
RDBMS Schema
* Used for OLTP systems
* Highly Normalized
* Difficult to understand and navigate
* Difficult to extract and solve complex problems
DWH Schema
* Used for OLAP systems
* De-normalized
* Easy to understand and navigate
* Relatively easier in extracting the data and solving complex problems
* Used for OLAP systems
* De-normalized
* Easy to understand and navigate
* Relatively easier in extracting the data and solving complex problems
Q:What is meant by metadata in the context of a Data warehouse?
A: Meta data is the data about data; Business Analyst or data modeler usually capture information about data – the source (where and how the data is originated), nature of data (char, varchar, nullable, existance, valid values etc) and behavior of data (how it is modified / derived and the life cycle ) in data dictionary a.k.a metadata. Metadata is also presented at the Datamart level, subsets, fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis / DSS.
Q. What are the commonly used indexes in Data warehouse systems?
A. B-Tree and Bit Map indexes.
Q. When Bit Map indexes are used?
A. The main advantage of using bitmap index is for the columns in which the ratio of the number of distinct values to the number of rows in the table is very low. This ratio referred as degree of cardinality. Example: A gender column, which has only two distinct values (male and female), which is ideal for a bitmap index.
Q. Which scenarios are better suited for using B-Tree indexes?
A. B-tree indexes can be used when a typical query refers to the indexed column and retrieves a few rows. In these queries, it is faster to find the rows by looking at the index.
What is a dimension table?
A dimensional table is a collection of hierarchies and categories along which the user can drill down and drill up. it contains only the textual attributes.
What is Fact table?
Fact Table contains the measurements or metrics or facts of business process.
If your business process is “Sales” , then a measurement of this business process such as “monthly sales number” is captured in the Fact table.
Fact table also contains the foriegn keys for the dimension tables.
What is conformed fact?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly
What is ODS?
1. ODS means Operational Data Store.
2. A collection of operation or bases data that is extracted from operation databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as there operation databases.
What is a lookup table?
A lookUp table is the one which is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based upon the primary key of the target, it just updates the table by allowing only new records or updated records based on the lookup condition.
What are Aggregate tables?
Aggregate table contains the summary of existing warehouse data which is grouped to certain levels of dimensions.Retrieving the required data from the actual table, which have millions of records will take more time and also affects the server performance.To avoid this we can aggregate the table to certain required level and can use it.This tables reduces the load in the database server and increases the performance of the query and can retrieve the result very fastly.
What are conformed dimensions?
Answer1:
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc
Answer2:
Conformed dimentions are dimensions which are common to the cubes.(cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the Facts and Dimensions here D1,D2 are the Conformed Dimensions
What is a level of Granularity of a fact table?
Level of granularity means level of detail that you put into the fact table in a data warehouse. For example: Based on design you can decide to put the sales data in each transaction. Now, level of granularity would mean what detail are you willing to put for each transactional fact. Product sales with respect to each minute or you want to aggregate it upto minute and put that data.
How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles upto 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).
What are non-additive facts?
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
What are slowly changing dimensions?
SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types
SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new address.
SCD2: maintaining historical information and current information by using
A) Effective Date
B) Versions
C) Flags
or combination of these
scd3: by adding new columns to target table we maintain historical information and current information
What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Snapshot facts are semi-additive, while we maintain aggregated facts we go for semi-additive.
EX: Average daily balance
A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts
While maintain the promotion values of the transaction (ex: product samples) because this table doesn’t contain any measures.
Why fact table is in normal form?
Basically the fact table consists of the Index keys of the dimension/ook up tables and the measures.
so when ever we have the keys in a table .that itself implies that the table is in the normal form.
What is degenerate dimension table?
Degenerate Dimensions : If a table contains the values, which are neither dimesion nor measures is called degenerate dimensions.Ex : invoice id,empno
what is junk dimension? what is the difference between junk dimension and degenerated dimension?
Junk dimension: Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table inorder eliminate unneccessary joins while retrieving order information..
4.what is aggregate fact table?
Aggregate table contains the [measure] values ,aggregated /grouped/summed up to some level of hirarchy.
What is fact less fact table? where you have used it in your project?
Factless table means only the key available in the Fact there is no mesures availalabl
Q.What is a dimension table?
Dimension tables contain attributes that describe fact records in the fact table. Dimension tables contain hierarchies of attributes that aid in summarization. Example: Time Dimension, Product Dimension
Q.What is Fact table?
A. Fact Table contains the measurements or metrics or facts of business process.
Q. What are conformed dimensions?
A. Dimension tables contain attributes that describe fact records in the fact table. Dimension tables contain hierarchies of attributes that aid in summarization. Example: Time Dimension.
Q. What is ODS?
A. ODS means Operational Data Store. A collection of operation or bases data that is extracted from operation databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse.
Q. What is a lookup table?
A. A lookup table is used to display information from one table based on the value of a foreign-key field in another table.
Q. What are Aggregate tables?
A. Aggregate table contains the summary of existing warehouse data which is grouped to certain levels of dimensions. Aggregate table gives better performance for many reporting needs. Aggregate table contains the [measure] values, aggregated/grouped/summed up to some level of hierarchy.
Q.What is a level of Granularity of a fact table?
A. Level of granularity means level of detail that you put into the fact table in a data warehouse. For example: Based on design you can decide to put the sales data in each transaction. Now, level of granularity would mean what detail are you willing to put for each transactional fact. Product sales with respect to each minute or you want to aggregate it upto minute and put that data.
Q. How are the Dimension tables designed?
A. Most dimension tables are designed using Normalization principles upto 2NF. In some instances they are further normalized to 3NF.
Q. What are non-additive facts?
A. Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Q. What are slowly changing dimensions?
SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types
SCD1: Only maintain updated/Current values.
Ex: a customer address modified we update existing record with new address.
SCD2: Maintain historical and current information by using A) Effective Date B) Versions C) Flags or combination of all three of those.
SCD3: By adding new columns to target table to maintain both historical and current information
Q. What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
A. Generally most of the Snapshot fact tables are semi-additive.
A fact table without numeric fact columns is called factless fact table.
Q. What is degenerate dimension table?
A. If a table contains the values, which are neither dimesion nor measures is called degenerate dimensions.Ex : invoice id,EmpId.
Q.What is a junk dimension? what is the difference between junk dimension and degenerated dimension?
A. Junk dimension: Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table inorder eliminate unneccessary joins while retrieving order information.
Q.What is a fact less fact table?
A. Factless table means only the keys are available in the Fact while there are no measures available.
No comments:
Post a Comment