We know that data is a key business asset for any organisation, and it must always be kept secure and accessible to business users whenever it is required.
In the current era, two approaches are very popular for storing data for business insights, so here we differentiate them based on some technical characteristics.
The first is the Data Warehouse, a highly structured store of data that requires a significant amount of discovery, planning, data modeling, and development work before the data becomes available for analysis by business users.
The second is the Data Lake, a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed. We can say that a Data Lake is a more organic store of data, kept without regard for the perceived value or structure of the data.
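To make the difference concrete, here is a minimal Python sketch, assuming an in-memory SQLite table stands in for the warehouse and a plain list of raw strings stands in for the lake. The table `sales`, the helpers `load_into_warehouse`, `land_in_lake` and `read_amounts_from_lake`, and the sample records are all hypothetical. The warehouse declares its structure before loading (schema-on-write) and rejects records that do not fit it; the lake keeps every record as it arrived, and structure is applied only by whoever reads it (schema-on-read).

```python
import json
import sqlite3

# Hypothetical stand-ins: an in-memory SQLite table plays the warehouse,
# a plain Python list of raw strings plays the data lake.
warehouse = sqlite3.connect(":memory:")
# Schema-on-write: the structure is declared before any data is loaded.
warehouse.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
lake: list[str] = []


def load_into_warehouse(raw: str) -> bool:
    """Parse a record and insert it only if it fits the predeclared schema."""
    try:
        record = json.loads(raw)
        warehouse.execute(
            "INSERT INTO sales (order_id, amount) VALUES (:order_id, :amount)", record
        )
        return True
    except (ValueError, TypeError, sqlite3.Error):
        # Not JSON, not a JSON object, or missing a required column: rejected at write time.
        return False


def land_in_lake(raw: str) -> None:
    """Keep the record exactly as it arrived: no parsing, no validation."""
    lake.append(raw)


def read_amounts_from_lake() -> list[float]:
    """Schema-on-read: this reader decides at query time what structure it needs."""
    amounts = []
    for raw in lake:
        try:
            doc = json.loads(raw)
        except ValueError:
            continue  # e.g. free text: still retained in the lake, just not useful to this reader
        if isinstance(doc, dict) and "amount" in doc:
            amounts.append(float(doc["amount"]))
    return amounts


# Hypothetical incoming records: a complete order, an incomplete order, a free-text note.
incoming = [
    '{"order_id": 1, "amount": 19.5}',
    '{"order_id": 2}',
    "customer called about a late delivery",
]
accepted = [load_into_warehouse(raw) for raw in incoming]
for raw in incoming:
    land_in_lake(raw)

print(accepted)                  # [True, False, False]
print(len(lake))                 # 3
print(read_amounts_from_lake())  # [19.5]
```

The warehouse ends up with one trusted row, while the lake keeps all three records so that a different reader can later apply a different structure to them.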
Data lakes offer a big opportunity to store large amounts of data affordably, without having to decide upfront how it must be structured and used. They are typically used to complement traditional data warehouses, which are still better suited to highly trusted, tightly governed data such as financial figures, although the two repositories overlap in some areas.
Data Warehouses compared to Data Lakes: depending on the business requirements, a typical organization will need both a data warehouse and a data lake, as they serve different needs and use cases.
During the development of a traditional data warehouse, a considerable amount of time is spent analyzing data sources, understanding business processes, profiling data, and modeling data.
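Profiling, one of the activities above, can start as simply as counting missing and distinct values per column before any model is designed. The following Python sketch assumes the source rows are available as a list of dictionaries; the `profile` helper and the sample customer rows are hypothetical, and a real profiling pass would usually check more, such as value ranges and date formats.

```python
def profile(rows: list[dict]) -> dict[str, dict]:
    """Summarise each column: missing values, distinct values, and observed Python types."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]  # absent keys count as missing
        present = [v for v in values if v not in (None, "")]  # empty strings count as missing
        report[column] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return report


# Hypothetical extract from an operational customer table.
customers = [
    {"customer_id": 1, "country": "UK", "signup": "2023-01-04"},
    {"customer_id": 2, "country": "", "signup": "04/02/2023"},
    {"customer_id": 2, "country": "uk"},
]

for column, stats in profile(customers).items():
    print(column, stats)
```

On these sample rows the report shows only two distinct `customer_id` values across three rows (a duplicate key) and one missing value each in `country` and `signup`, the kind of finding that feeds back into the data model before anything is loaded.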
| Characteristics | Data Warehouse | Data Lake |
|---|---|---|
| Type of data stored | Structured data (most often in columns and rows in a relational database) from transactional systems, operational databases, and line-of-business applications | Any data structure and any format, including structured, semi-structured, and unstructured data from IoT devices, websites, mobile apps, social media, and corporate applications |
| Best way to ingest data | Batch processes | Streaming, micro-batch, or batch processes |
| Schema | Designed before the DW implementation (schema-on-write) | Structure defined at the time of analysis (schema-on-read) |
| Typical load pattern | ETL (Extract, Transform, then Load) | ELT (Extract, Load, then Transform when the data is used) |
| Price/Performance | Fastest query results using higher-cost storage | Query results getting faster using low-cost storage |
| Data quality | Highly curated data that serves as the central version of the truth | Any data, which may or may not be curated (i.e., raw data) |
| Users | Business analysts | Data scientists, data developers, and business analysts (using curated data) |
| Analytics pattern | Determine structure, acquire data, then analyze it; iterate back to change structure as needed. Used for batch reporting, BI, and visualizations | Acquire data, analyze it, then iterate to determine its final structured form. Used for machine learning, predictive analytics, data discovery, and profiling |
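The two load patterns in the table differ mainly in where the transformation happens. The following is a minimal Python sketch, assuming in-memory SQLite databases stand in for both stores; the table names, the `is_number` cleansing rule and the sample rows are hypothetical. ETL cleans the rows in the pipeline and loads only the result, while ELT loads the raw rows as-is and runs the same transformation inside the store afterwards.

```python
import sqlite3

# Hypothetical source rows from an operational system: amounts arrive as text,
# and one row is unusable.
SOURCE_ROWS = [("A-1", "19.50"), ("A-2", "5"), ("A-3", "n/a")]
QUERY = "SELECT order_id, amount FROM curated_orders ORDER BY order_id"


def is_number(text: str) -> bool:
    """Cleansing rule shared by both pipelines."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(db: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, then load only the curated result."""
    db.execute("CREATE TABLE curated_orders (order_id TEXT, amount REAL)")
    cleaned = [(oid, float(amt)) for oid, amt in SOURCE_ROWS if is_number(amt)]  # Transform
    db.executemany("INSERT INTO curated_orders VALUES (?, ?)", cleaned)  # Load


def elt(db: sqlite3.Connection) -> None:
    """ELT: load the raw rows untouched, then transform inside the store."""
    db.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
    db.executemany("INSERT INTO raw_orders VALUES (?, ?)", SOURCE_ROWS)  # Load as-is
    # Expose the same rule to SQL so the transformation runs inside the store.
    db.create_function("is_number", 1, is_number)
    db.execute(
        "CREATE TABLE curated_orders AS "
        "SELECT order_id, CAST(amount AS REAL) AS amount "
        "FROM raw_orders WHERE is_number(amount)"
    )


etl_db = sqlite3.connect(":memory:")
etl(etl_db)
elt_db = sqlite3.connect(":memory:")
elt(elt_db)

print(etl_db.execute(QUERY).fetchall())  # [('A-1', 19.5), ('A-2', 5.0)]
print(elt_db.execute(QUERY).fetchall())  # [('A-1', 19.5), ('A-2', 5.0)]
print(elt_db.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0])  # 3
```

Both pipelines end with the same two curated rows, but only the ELT database still holds the unusable `A-3` row in `raw_orders`, which is what allows a lake to derive a different structure from the same raw data later.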
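In contrast to a data warehouse, where part of the design work is deciding which data to include and which to leave out, the default expectation for a data lake is to acquire all of the data and retain all of it.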
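One way to honour "acquire all, retain all" is an append-only raw zone, where every payload is written once, exactly as received, into a folder per source and ingestion date. The Python sketch below assumes a local directory stands in for the lake's storage; the `raw/<source>/ingest_date=YYYY-MM-DD/` layout and the `land_raw` helper are hypothetical conventions chosen for illustration, not a standard.

```python
import tempfile
import uuid
from datetime import date
from pathlib import Path


def land_raw(root: Path, source: str, payload: bytes, ingest_date: date) -> Path:
    """Write a payload exactly as received into an append-only, date-partitioned raw zone."""
    # Hypothetical layout: <root>/raw/<source>/ingest_date=YYYY-MM-DD/<random-name>.bin
    partition = root / "raw" / source / f"ingest_date={ingest_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / f"{uuid.uuid4().hex}.bin"
    # Mode "xb" fails if the file already exists, so this helper never overwrites anything.
    with open(target, "xb") as fh:
        fh.write(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    day = date(2024, 1, 15)
    # The same event arriving twice is kept twice: no deduplication at ingestion time.
    first = land_raw(root, "web_clicks", b'{"page": "/home"}', day)
    second = land_raw(root, "web_clicks", b'{"page": "/home"}', day)
    print(first.parent.name)                  # ingest_date=2024-01-15
    print(len(list(first.parent.iterdir())))  # 2
```

Deduplication, cleansing and structuring are then left to downstream readers, in line with the lake's schema-on-read approach.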