Source systems are only available for extraction during
a specific time slot which is generally lesser than your overall data loading
time. It’s a good idea to extract and keep things at your end before you lose
the connection to the source systems.
You want to extract data based on some conditions which
require you to join two or more different systems together. E.g. you want to
only extract those customers who also exist in some other system. You will not
be able to perform a SQL query joining two tables from two physically different
databases.
Various source systems have different allotted timing
for data extraction.
Data warehouse’s data loading frequency does not match
with the refresh frequencies of the source systems
Extracted data from the same set of source systems are
going to be used in multiple places (data warehouse loading, ODS loading,
third-party applications etc.)
ETL process involves complex data transformations that
require extra space to temporarily stage the data
There is specific data reconciliation / debugging
requirement which warrants the use of staging area for pre, during or post load
data validations
No comments:
Post a Comment