Made by Pipeline Pioneers
Overcoming errors with the power of friendship
The **Totesys ETL Pipeline** is a data engineering solution that extracts, transforms, and loads data into an OLAP data warehouse for analytical purposes. The project incorporates AWS services to build a robust, automated pipeline and provides insights through **Tableau** dashboards.

Features:

- **Data Ingestion**: Extracts raw data from the Totesys database and ingests it into an AWS S3 ingestion bucket.
- **Data Transformation**: Processes raw data into a structured schema suitable for the data warehouse.
- **Data Loading**: Loads transformed data into fact and dimension tables in the data warehouse.
- **Automation**: Event-driven architecture that triggers processes using AWS Lambda and S3 events.
- **Monitoring and Logging**: AWS CloudWatch monitors the pipeline for operational visibility.
- **Visualization**: Tableau provides interactive dashboards to analyze the data.

AWS services:

- **S3**: Ingestion and processed buckets.
- **Lambda**: Python-based ETL scripts for data processing.
- **CloudWatch**: Monitoring and logging.
- **QuickSight**: BI tool for creating dashboards.

Pipeline workflow:

1. **Ingestion**:
   - Data is extracted from the Totesys database and placed in the S3 ingestion bucket.
   - The code is in the `src/extract_lambda` directory.
   - **Trigger**: Manual or scheduled job.
2. **Transformation**:
   - AWS Lambda processes data upon ingestion and transforms it into the defined schema.
   - Processed data is stored in Parquet format in the S3 processed bucket.
   - The code is in the `src/transform_lambda` directory.
3. **Loading**:
   - Transformed data is loaded into a prepared data warehouse at defined intervals.
   - **Trigger**: Event-driven or scheduled Lambda.
4. **Visualization**:
   - Tableau is used to generate dashboards.
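As a rough illustration of the event-driven transformation step described above, the sketch below shows how a Lambda handler might read the bucket and object key out of an S3 event notification and decide where the Parquet output should go. It is not the project's actual code: the function names `parse_s3_event` and `processed_key_for`, the bucket name, and the "same path with a `.parquet` extension" naming rule are all assumptions made for the example. The read, transform, and write steps (Boto3 and Pandas in this project) are left out so the sketch stays self-contained.

```python
import posixpath
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return a (bucket, key) pair for every S3 record in a Lambda event."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def processed_key_for(ingestion_key):
    """Hypothetical naming rule: keep the path, swap the extension for .parquet."""
    stem, _extension = posixpath.splitext(ingestion_key)
    return f"{stem}.parquet"


def lambda_handler(event, context):
    """Sketch of a transform handler: plan one output key per ingested object."""
    planned = []
    for bucket, key in parse_s3_event(event):
        # A real handler would download the object, reshape it into the
        # warehouse schema, and upload Parquet to the processed bucket here.
        planned.append(
            {"source_bucket": bucket, "source_key": key,
             "target_key": processed_key_for(key)}
        )
    return {"planned": planned}
```

Calling `lambda_handler` with a sample event whose record points at `sales_order/batch.json` would plan `sales_order/batch.parquet` as the output key. Keeping the event parsing in its own function makes it easy to unit test with Pytest without touching AWS.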
The Team
Tech Stack

We used Terraform, Python, Pandas, Pytest, AWS, Boto3, Moto, Tableau, PG8000, and PostgreSQL. We had used all of them during the bootcamp and were confident working with them.
Challenges Faced
Yes, many