Extract can be grouped into static extract and incremental extract. Static extract captures a snapshot of the source data at a point in time whereas incremental extract captures changes that have occurred since the last static extract.
Transform has three major steps i.e. data cleansing, data integration and other transformations ( such as derived values, replacement values and other aggregation calculations).
Loading: At this stage the transformed data is loaded into the data warehouse and index will be created. And loading can be at a refresh mode or update mode.
ETL is a short for Extract, Transform and Load.
It is a set of processes for Extraction, transforming and loading of data from various data sources such as OLTP databases, social media websites, flat files, spreadsheets, Access , excel etc.. into a Data warehouse.
The main ETL steps can be grouped into three sections.
- Extract: it is the first process that the tool connects to various data sources, extract the data from these data sources and then make the extracted data available to the next subsequent processing steps.
- Transform: Once the data is extracted, functions are applied to the extracted data. The functions enables the tool to integrate data ,validate data, transform data , create derived values from the extracted data through calculation and finally make it ready to load to target databases.
- Load: It is the final process where the transformed data is loaded to the target data warehouse system.
And some of the open source and commercial ETL tools include but not limited to :
- Oracle Data warehouse builder
- Oracle data integration(ODI)
- Pentaho data integration (Kettle)
- Talend
- SQL server integration service
- Informatica powercenter