Made by Team Ketts Lough
ETL Data Platform for ToteSys to AWS Data Warehouse
This project involved designing and implementing a data platform to extract, transform, and load (ETL) data from ToteSys (a PostgreSQL database) into a Data Warehouse hosted on AWS. The system runs on a 30-minute schedule and follows these steps:

Extraction: A Lambda function extracts data from ToteSys and stores it in the Ingestion S3 bucket (a sketch of this step follows below).
Transformation: Another Lambda function processes the latest data from Ingestion S3, converts it into a star schema, and saves the transformed data in the Processed S3 bucket.
Loading: A final Lambda function loads the processed data from Processed S3 into the Data Warehouse.
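To make the flow concrete, here is a minimal sketch of what the extraction step could look like. It is not the project's actual code: the bucket name, the sales_order table, the totesys-credentials secret name, and the last_run event field are all illustrative placeholders, and pg8000 is assumed as the PostgreSQL driver.

```python
# Minimal sketch of an extraction Lambda (hypothetical names throughout).
import json
from datetime import datetime, timezone

import boto3
import pg8000.native

INGESTION_BUCKET = "ingestion-bucket"  # placeholder name


def get_db_credentials() -> dict:
    # Credentials live in Secrets Manager; the secret name is a placeholder.
    secret = boto3.client("secretsmanager").get_secret_value(
        SecretId="totesys-credentials"
    )
    return json.loads(secret["SecretString"])


def lambda_handler(event, context):
    creds = get_db_credentials()
    conn = pg8000.native.Connection(
        user=creds["user"], password=creds["password"],
        host=creds["host"], database=creds["database"],
    )
    s3 = boto3.client("s3")

    # Pull only rows changed since the previous run (incremental extraction).
    rows = conn.run(
        "SELECT * FROM sales_order WHERE last_updated > :ts",
        ts=event.get("last_run", "1970-01-01"),
    )
    columns = [col["name"] for col in conn.columns]
    records = [dict(zip(columns, row)) for row in rows]
    conn.close()

    # Write the batch as timestamped JSON into the Ingestion bucket.
    key = f"sales_order/{datetime.now(timezone.utc).isoformat()}.json"
    s3.put_object(Bucket=INGESTION_BUCKET, Key=key,
                  Body=json.dumps(records, default=str))
    return {"rows_ingested": len(records), "key": key}
```

The same pattern (query rows changed since the last run, write timestamped JSON to S3) would repeat for each ToteSys table, which is what lets the 30-minute schedule pick up only new or updated rows.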
The Team

Vojtech Nozicka
With a solid foundation in data management, analysis, and process development, I have cultivated a strong passion for pursuing a career in the tech industry. Alongside this, I hold a degree in Fine Art, which has shaped my creative and innovative approach to problem-solving and design. My journey led me to Northcoders Bootcamp, where my interest in data engineering blossomed. I'm now excited to combine my analytical skills with my creative background and dive deeper into the world of data and technology, eager to embark on a fulfilling full-time career in this dynamic field.

Matthew Reynolds
I'm Matthew Reynolds, 22, with a background in History and International Relations. Northcoders was my first coding experience, sparked by my interest in social media and emerging technology. I have enjoyed exploring Computer Science and Data Engineering during this Bootcamp and look forward to stepping into an up-and-coming sector.

Marcin Sodel
I love the challenge of turning an empty file into a functional program, finding the process both intellectually stimulating and rewarding. With guidance from Northcoders, I have built a solid foundation in programming and feel prepared to take on my first role in the industry.

Hussein Alsakkaf
I am an aspiring data engineer with a strong background in engineering and management, holding a Master's in Oil and Gas Management and a Bachelor's in Chemical Engineering. Passionate about transforming raw data into actionable insights, I've recently completed an intensive bootcamp, gaining hands-on experience in Python, SQL, and cloud technologies. I'm excited to tackle new challenges and contribute to innovative projects in the data engineering field.
Yanrong Zhang
No bio provided
Tech Stack

Tech Stack & Key Components: Python: v3.12 AWS Services: – S3: Storage for raw and transformed data. – Lambda: Serverless compute for ETL tasks. – CloudWatch: Monitoring and logging. – EventBridge: Triggers for scheduled execution. – Step Functions: Workflow orchestration. – IAM: Manages permissions for various services – Secrets Manager: Securely stores credentials Pandas: Used for data transformation. Data Formats: JSON (raw ingestion) and Parquet (processed storage). Infrastructure as Code (IaC): Terraform for provisioning AWS resources. CI/CD: Automated deployment and testing. This system ensures efficient and reliable data movement from ToteSys to the Data Warehouse, supporting analytics and reporting needs.
Challenges Faced
Aside from the expected challenges, such as limited time, handling large datasets and their continuous updates, and managing timestamps to filter data appropriately, one of the more unexpected ones was handling the date dimension table. Our initial approach was to generate dates dynamically, but because of the long processing time this required, we switched to a static date range to improve performance. This ensured that the Lambda created the date dimension only once and skipped recreating it if it already existed, as sketched below.
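Here is a minimal sketch of that static, build-once approach. The date range bounds, bucket, and key are assumed placeholders, and head_object is used as one way to test whether the dimension already exists.

```python
# Minimal sketch of the one-off date dimension build (hypothetical names).
import io

import boto3
import pandas as pd
from botocore.exceptions import ClientError

PROCESSED_BUCKET = "processed-bucket"          # placeholder
DIM_DATE_KEY = "dim_date/dim_date.parquet"     # placeholder


def ensure_dim_date(s3) -> None:
    try:
        # If the object exists, the dimension was built on a previous run.
        s3.head_object(Bucket=PROCESSED_BUCKET, Key=DIM_DATE_KEY)
        return
    except ClientError:
        pass  # not found; a production version would check the error code

    # Static range chosen up front (bounds are illustrative).
    dates = pd.date_range("2020-01-01", "2030-12-31", freq="D")
    dim_date = pd.DataFrame({
        "date_id": dates.date,
        "year": dates.year,
        "month": dates.month,
        "day": dates.day,
        "day_of_week": dates.dayofweek + 1,
        "day_name": dates.day_name(),
        "month_name": dates.month_name(),
        "quarter": dates.quarter,
    })

    buffer = io.BytesIO()
    dim_date.to_parquet(buffer, index=False)
    s3.put_object(Bucket=PROCESSED_BUCKET, Key=DIM_DATE_KEY,
                  Body=buffer.getvalue())
```

Generating the whole range in one pass and guarding it behind an existence check keeps the scheduled runs fast, since every subsequent invocation returns immediately.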