Monday, 18 August 2014

Staging


Source systems are only available for extraction during a specific time slot which is generally lesser than your overall data loading time. It’s a good idea to extract and keep things at your end before you lose the connection to the source systems.
You want to extract data based on some conditions which require you to join two or more different systems together. E.g. you want to only extract those customers who also exist in some other system. You will not be able to perform a SQL query joining two tables from two physically different databases.
Various source systems have different allotted timing for data extraction.
Data warehouse’s data loading frequency does not match with the refresh frequencies of the source systems
Extracted data from the same set of source systems are going to be used in multiple places (data warehouse loading, ODS loading, third-party applications etc.)
ETL process involves complex data transformations that require extra space to temporarily stage the data
There is specific data reconciliation / debugging requirement which warrants the use of staging area for pre, during or post load data validations

No comments:

Post a Comment