Student Projects - Pipeline Pioneers Data Project

Made by Pipeline Pioneers

Overcoming errors with the power of friendship

The **Totesys ETL Pipeline** is a data engineering solution that extracts, transforms, and loads data into an OLAP data warehouse for analytical purposes. The project incorporates AWS services to build a robust, automated pipeline and provides insights through **Tableau** dashboards.

Key features:

- **Data Ingestion**: Extracts raw data from the Totesys database and ingests it into an AWS S3 ingestion bucket.
- **Data Transformation**: Processes raw data into a structured schema suitable for the data warehouse.
- **Data Loading**: Loads transformed data into fact and dimension tables in the data warehouse.
- **Automation**: Event-driven architecture that triggers processes using AWS Lambda and S3 events.
- **Monitoring and Logging**: AWS CloudWatch monitors the pipeline for operational visibility.
- **Visualization**: Tableau provides interactive dashboards to analyze the data.

AWS services:

- **S3**: Ingestion and processed buckets.
- **Lambda**: Python-based ETL scripts for data processing.
- **CloudWatch**: Monitoring and logging.
- **QuickSight**: BI tool for creating dashboards.

Pipeline workflow:

1. **Ingestion**:
   - Data is extracted from the Totesys database and placed in the S3 ingestion bucket.
   - Find the file in the src/extract_lambda directory.
   - **Trigger**: Manual or scheduled job.
2. **Transformation**:
   - AWS Lambda processes data upon ingestion and transforms it into the defined schema.
   - Processed data is stored in Parquet format in the S3 processed bucket.
   - Find the file in the src/transform_lambda directory.
3. **Loading**:
   - Transformed data is loaded into a prepared data warehouse at defined intervals.
   - **Trigger**: Event-driven or scheduled Lambda.
4. **Visualization**:
   - Tableau is used to generate dashboards.
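As a rough illustration of the event-driven transformation step, here is a minimal sketch of how a Lambda handler might work out which newly ingested objects to process from an S3 event notification. It is not the project's actual code: the function names, the bucket name in the usage note, and the rule for naming Parquet keys are all assumptions made for the example, and the real read, transform, and write steps are left out.

```python
import posixpath
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (spaces arrive as "+").
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def processed_key_for(ingestion_key):
    """Map an ingestion key to a Parquet key (hypothetical naming scheme)."""
    stem, _extension = posixpath.splitext(ingestion_key)
    return f"{stem}.parquet"


def lambda_handler(event, context):
    """Hypothetical transform entry point: plan which objects to process."""
    plan = [
        {
            "source_bucket": bucket,
            "source_key": key,
            "target_key": processed_key_for(key),
        }
        for bucket, key in objects_from_s3_event(event)
    ]
    # A real handler would read each source object, transform it, and write
    # Parquet to the processed bucket here; this sketch only returns the plan.
    return {"planned": plan}
```

Calling `lambda_handler` with an event whose single record names a bucket such as `ingestion-bucket` and a key such as `sales_order/batch.json` would return a plan with the target key `sales_order/batch.parquet`. Keeping the event parsing separate from the transformation makes it easy to unit test with Pytest without touching AWS.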

The Team

Simon Kinder

No bio provided

Macshellah Zisengwe

No bio provided

Louise Concepcion

No bio provided

Jeremy Lam

No bio provided

Abbey Ola

No bio provided

Tech Stack

We used Terraform, Python, Pandas, Pytest, AWS, Boto3, Moto, Tableau, pg8000, and PostgreSQL. We had experience with these tools from the bootcamp and were confident using them.

Challenges Faced

Yes, many