Data is the business asset for every
organisation which is audited and protected. Now days, every organisation want
to create their own data warehouse to store their business data in a perfect
manner to utilise for decision support.
These data warehouse contain massive amounts of
highly detailed, time-series data used for decision support. Data warehouses
often contain terabytes of data that can be readily queried by end users. To
gain in their business, it is become very urgent for every organization to face
a range of choices where it requires an understanding of the two main data
warehousing models— Inmon’s and Kimball’s, both in terms software tools and
development approaches.
We know that ETL is essential to the achievability of the data warehouse in
that it challenges to ensure data integrity within the data warehouse.
Inmon vs. Kimball Data
Models
To understand, how these two models are similar and how they
differ gives us a foundational knowledge of the most basic data warehouse
concepts.
At a Glance
Bill
Inmon advocates a top-down development approach that adapts
traditional relational database tools to the development needs of an enterprise
wide data warehouse where as
Ralph
Kimball recommends a bottom-up approach that uses dimensional modeling,
a data modeling approach unique to data warehousing. Rather than building a
single enterprise-wide database, Kimball suggests creating one database (or
data mart) per major business process.
|
We know that most of the data in the data warehouse is extracts from
operational databases, then summarises, reconciles, and manipulates by the help
of ETLs tools. Then the data is ready to be stored in carefully designed
relational database tables in the data warehouse. In the data warehousing move
toward, information is requested, processed, and merged continuously, so the
information is readily available for direct querying OLAP and analysis at the
warehouse.
For any organisation, understanding the basics of the
architecture and methodology of both models provides a good foundational
knowledge of data warehousing. It’s very important factor in term of the
project cost and maintenance prospective where programmer/developer/data
scientist can build situation-specific knowledge that is appropriate to their
organization’s needs.
Bill
Inmon Model
|
Ralph
Kimball Model
|
It consists of all information systems and their databases
throughout a given organization. It is inspire from the relational data model
(OLTP) and follow the same scenarios to develop relational architecture for
the data warehouse (3NF).
|
Kimball model is based on a data modeling method (Dimensional
Data Modeling) unique to the data warehouse. His architecture is also known
as Data Warehouse Bus. Dimensional modeling focuses on ease of end user
accessibility and provides a high level of performance to the data warehouse.
|
It divides the overall database environment of the
organization into four levels which are given below:
Operational
Atomic data warehouse
Departmental
Individual
|
It involves a bottom-up approach, which in the case of data
warehouses means to build one data mart at a time. The four steps of the
dimensional design process are:
Select the business process
Declare the grain
Choose the dimensions
Identify the facts
|
From the operational systems, data is extensively manipulated
and then moved to the atomic data warehouse with the help of various ETLs
tools.
|
Data is copied from operational source systems to a staging
area. In the staging area, the data is scrubbed and suitable for end-user
queries. From the staging area, data is loaded into data marts. The data
marts are the source of data for user queries.
|
Inmon suggests using enterprise-wide data models if possible
to save development time. It proposes three levels of data modeling to support
a spiral methodology, in that all user views are consistent with the
corporate model.
ERD Level (Entity Relationship
Diagrams) is used to explore and refine entities, their attributes, and the
relationships among the entities.
The second (Mid-Level) data model, establishes
the DIS (Data Item set) for each department.
An ERD created in the first-level data model is the basis for
a DIS in the second-level data model.
The final physical
model is created from the mid-level data model merely by extending the
mid-level data model to include keys and physical characteristics of the
model.
|
A database engine can make very strong assumptions about first
constraining the heavily indexed dimension tables, and then attacking the
fact table all at once with the Cartesian product of the dimension table keys
satisfying the user’s constraints.
Dimensional modeling begins with tables rather than
entity-attribute data models such as ERDs. These tables are either fact
tables or dimension tables.
Fact tables contain metrics, while dimension tables contain
attributes of the metrics in the fact tables. Dimension tables routinely
contain repeating groups; this violates normalization rules. In contrast, the
dimension tables are likely to have only hundreds or thousands of rows, and
be only megabytes in total size. These tables contain all the attributes of
the data in the fact table in highly denormalized forms.
|
It explains various techniques for optimizing the performance
of the data warehouse at both the atomic and departmental levels.
|
Dimensional modeling is a data modeling approach that
capitalizes on the unique requirements of the data warehouse.
|
Once the three-level data model is complete, the data
warehouse development has started.
|
Keeping fact tables to a small number of rows and allowing
dimension tables to be highly denormalized are both essential. The resulting
data mart is highly accessible to the end user and provides reasonable query
response times.
|
Granularity is a measure of the detail of the data. If the
volume of data is massive, then the development team needs to consider
multiple levels of granularity for the data.
|
Declaring the grain is the process of deciding what level of
detail the data warehouse will contain. The lowest level of granularity is
called atomic, means it cannot be further sub-divided.
|
When there is enough information to do so, the development
team conducts a technical assessment to ensure that the data in the warehouse
will be accessible and well managed.
|
The data model bus stated that all data marts must use
standardized conformed dimensions. The basic requirements of conformed
dimensions are that keys, column names, attribute definitions, and attribute
values are consistent across business processes.
|
One of the salient aspects of a data-driven methodology is that
it builds on previous efforts and utilizing both code and processes that have
already been developed.
|
Each data mart is based on a single business process. The data
warehouse bus allows the sum of the data marts to truly be an integrated
whole—a data warehouse.
|
In banking, the Customer table generates a primary grouping of
data such as account (primary grouping). An Account may have several
manifestations, such as loan, savings, or trust (secondary grouping). Relations
show that one customer may have several different accounts.
|
In the retail example, Kimball chooses to include some
computed values as well as truly atomic values, making queries easy for the
end user and providing acceptable data warehouse performance.
|
The most prominent
similarities between Inmon’s and Kimball’s models are the use of time-stamped
data, and the extract, transform, and load (ETL) process. Although the
execution of these two elements differs between the two models, the data
attributes and query results are very similar.
Conclusion
Inmon’s evolutionary approach grows out of operational relational database technology and development methods. Inmon explains various techniques for optimizing the performance of the data warehouse at both the atomic and departmental levels. Kimball defines business processes quite broadly. To choose the first business process for the data warehouse project and declaring the grain is the process of deciding what level of detail the data warehouse will contain.
No comments:
Post a Comment