Made by Team Southport Pier
It’s more than a bag, it’s a feature!
This is an end-to-end ETL pipeline for a tote bag business. It pulls data from the business's operational database into a data warehouse for future analysis. In this project, three Lambda applications were created using psycopg2 and boto3, and deployed to AWS with Terraform and GitHub Actions CI/CD, with S3 buckets staging the data between steps. All three functions log to CloudWatch and send out failure alerts via email. (Sketches of each handler follow the Built With list below.)

Lambda 1: This function handles the extraction. It connects to the database using psycopg2, runs on a schedule, monitors for changes, and pushes data into an S3 bucket in CSV format.

Lambda 2: The transformation step cleans and reshapes the data into predefined schemas for warehousing. It uses pandas and boto3 and stores the cleaned output as parquet files in a separate S3 bucket.

Lambda 3: The load Lambda takes the processed data from the second S3 bucket and loads it into the data warehouse for future analysis.

Built With:
– Python
– Terraform
– PostgreSQL
– SQLAlchemy
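A minimal sketch of the extraction handler, under assumptions not stated in the write-up: a hypothetical sales_order table with a last_updated column, a placeholder ingestion-bucket, credentials passed via environment variables, and a 20-minute schedule.

```python
import csv
import io
import os

import boto3
import psycopg2


def handler(event, context):
    # Connect to the source database; credentials come from the
    # Lambda's environment (placeholder variable names).
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            # Only pull rows changed since the previous scheduled run
            # (assumes a 20-minute schedule and a last_updated column).
            cur.execute(
                "SELECT * FROM sales_order "
                "WHERE last_updated > now() - interval '20 minutes';"
            )
            rows = cur.fetchall()
            columns = [desc[0] for desc in cur.description]
    finally:
        conn.close()

    # Serialise to CSV in memory and push to the ingestion bucket.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)
    boto3.client("s3").put_object(
        Bucket="ingestion-bucket",
        Key="sales_order/latest.csv",
        Body=buffer.getvalue(),
    )
```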
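The transformation step could look roughly like this; bucket names, keys, and columns are illustrative, and writing parquet from pandas assumes pyarrow (or fastparquet) is bundled in the deployment package.

```python
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")


def handler(event, context):
    # Read the raw CSV written by the extraction Lambda.
    raw = s3.get_object(Bucket="ingestion-bucket", Key="sales_order/latest.csv")
    df = pd.read_csv(raw["Body"])

    # Reshape into the predefined warehouse schema: here, splitting a
    # created_at timestamp into date and time parts (placeholder columns).
    created = pd.to_datetime(df["created_at"])
    fact = df.drop(columns=["created_at"]).assign(
        created_date=created.dt.date,
        created_time=created.dt.time,
    )

    # Write the cleaned frame to the processed bucket as parquet.
    out = io.BytesIO()
    fact.to_parquet(out, index=False)
    s3.put_object(
        Bucket="processed-bucket",
        Key="fact_sales_order/latest.parquet",
        Body=out.getvalue(),
    )
```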
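Finally, a sketch of the load step, again with placeholder bucket, key, and table names; it uses psycopg2's execute_values for a bulk insert, which may differ from the approach actually taken in the project.

```python
import io
import os

import boto3
import pandas as pd
import psycopg2
from psycopg2.extras import execute_values


def handler(event, context):
    # Fetch the processed parquet file written by the transform Lambda.
    obj = boto3.client("s3").get_object(
        Bucket="processed-bucket", Key="fact_sales_order/latest.parquet"
    )
    df = pd.read_parquet(io.BytesIO(obj["Body"].read()))

    conn = psycopg2.connect(
        host=os.environ["WAREHOUSE_HOST"],
        dbname=os.environ["WAREHOUSE_NAME"],
        user=os.environ["WAREHOUSE_USER"],
        password=os.environ["WAREHOUSE_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            # Bulk-insert every row; .values.tolist() converts numpy
            # scalars to native Python types that psycopg2 can adapt.
            execute_values(
                cur,
                f"INSERT INTO fact_sales_order ({', '.join(df.columns)}) VALUES %s;",
                df.values.tolist(),
            )
        conn.commit()
    finally:
        conn.close()
```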
The Team
Saffron Morton
I’m Saffi and my background was in data analysis for behavioural research, hospitality, and support work for those with brain injuries and learning disabilities. I’m strongly interested in combining analytical thinking with practical application, and love to sharpen my critical thinking by solving puzzles. Throughout my life and career, I have developed empathy, effective communication, and a knack for tailoring solutions. Now, driven by a love for continuous learning, I’m excited to apply my skills in data engineering!
Patricia Plaza Rojas
Also known as Selva. Passionate biologist and Team Leader; a scientist currently transitioning into data engineering. I’m diving deep into Python, SQL, cloud architecture (AWS), and modern data workflows. With a strong foundation in scientific problem-solving, I’m now building skills in data pipelines, infrastructure as code, and warehouse design. Excited about turning complex data into insight, and always up for learning something new!
George Krokos
I’m George. I came into this bootcamp from a finance background, wanting to learn more about how data engineering works and to learn to code in Python and SQL. It has been a beautiful experience and I look forward to what the future brings!
Abigail Adjei
Hi, I’m Abby. I come from an admin background in hospitality and wanted to change careers, so I joined the bootcamp. Coding is a pleasure of mine, and this bootcamp has taught me a lot in terms of cloud infrastructure, pipelines, RESTful APIs and so much more. I look forward to what this experience is going to bring. A fun fact about me: I enjoy aerial yoga, and I am currently training to run both a 5k and a 10k next year.
Dale Barnes
As a student at Northcoders, I am currently pursuing a 13-week Cloud/Data Engineering Bootcamp based in Manchester. I have a strong background in Mechanical Engineering, with a Master’s degree from The University of Manchester, part of which I spent studying in Toulouse, France. I am passionate about learning new technologies and applying them to solve real-world problems, and I am eager to leverage my diverse experience and skills to contribute to innovative projects in the field of cloud and data engineering.
Tech Stack

We used Python, Terraform, PostgreSQL, Psycopg2, and SQLAlchemy. We coded mainly in Python, as that was the language we were taught on the course. We used Terraform so we could deploy our infrastructure as code for replicability. PostgreSQL was the platform the source database ran on. Finally, we used Psycopg2 and SQLAlchemy to connect to the database.
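For illustration, here is how the two connection routes might look side by side; the totesys database name, table, and credentials are placeholders, not taken from the project.

```python
import psycopg2
from sqlalchemy import create_engine, text

# Direct connection with Psycopg2.
conn = psycopg2.connect(
    host="localhost", dbname="totesys", user="user", password="secret"
)
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM sales_order;")
    print(cur.fetchone()[0])
conn.close()

# The same query through SQLAlchemy's engine interface.
engine = create_engine("postgresql+psycopg2://user:secret@localhost/totesys")
with engine.connect() as connection:
    print(connection.execute(text("SELECT count(*) FROM sales_order;")).scalar())
```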
Challenges Faced
Completing the pipeline was a huge learning experience with many challenges to face, from orchestrating the pipeline to dealing with branching issues and debugging. If we were to do things differently, we would be more flexible with the structure of the project and the tools we used, potentially choosing services like AWS Glue for better integration. If it were possible, we’d also reconsider the choice of database to ensure smoother compatibility. We learned a lot about how file formats and structures affect processing, as well as the impact of external dependencies. Having the knowledge we have now would definitely help us speed up the process and design a more robust pipeline from the get-go.