Made by Team Spitfire

A Data Engineering project

The aim of the project was to apply key skills picked up during the Northcoders bootcamp to real-world business requirements. We were tasked with helping a fictional company create a platform for managing their enterprise data. We implemented a pipeline to move and transform data from a live operational database to a more streamlined data warehouse, better suited for business analytics. We used a range of tools to achieve this, including:

* Amazon Web Services (AWS) as the cloud solution, including Lambda, CloudWatch and EventBridge to automate and monitor the process.
* Terraform and GitHub Actions to deliver infrastructure-as-code and implement CI/CD principles.
* Various Python applications, which orchestrated the extract, transform and load (ETL) processes.
* Tableau to build a business intelligence dashboard, giving us insight into our transformed data.
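To make the extract, transform and load stages more concrete, here is a minimal, hypothetical sketch in plain Python. It assumes an operational `sales_order` table with a `last_updated` column (so only rows changed since the previous run are extracted) and a star-schema style warehouse with a `dim_staff` dimension and a `fact_sales_order` fact table. All table names, column names and functions here are illustrative assumptions, not the project's actual schema or code, and the "warehouse" is just an in-memory dict standing in for the real database.

```python
from datetime import datetime

# NOTE: table and column names below are hypothetical, for illustration only.


def extract_changed(rows, since):
    """Extract: keep only rows updated after the previous run's timestamp."""
    return [row for row in rows if row["last_updated"] > since]


def transform(orders, staff):
    """Transform: build a staff dimension and a sales fact table.

    The staff name is denormalised into the dimension, and each fact row
    carries a computed total and a plain date, so analysts do not need to
    join back to the operational tables.
    """
    dim_staff = [
        {"staff_id": s["staff_id"], "full_name": f'{s["first_name"]} {s["last_name"]}'}
        for s in staff
    ]
    fact_sales = [
        {
            "sales_order_id": o["sales_order_id"],
            "staff_id": o["staff_id"],
            "units_sold": o["units_sold"],
            "total_value": round(o["units_sold"] * o["unit_price"], 2),
            "created_date": o["created_at"].date().isoformat(),
        }
        for o in orders
    ]
    return dim_staff, fact_sales


def load(warehouse, dim_staff, fact_sales):
    """Load: append into an in-memory stand-in for the warehouse tables."""
    warehouse.setdefault("dim_staff", []).extend(dim_staff)
    warehouse.setdefault("fact_sales_order", []).extend(fact_sales)
    return warehouse


# Example run with made-up data: only order 1 changed since the last run.
last_run = datetime(2024, 1, 1)
orders = [
    {"sales_order_id": 1, "staff_id": 10, "units_sold": 3, "unit_price": 2.5,
     "created_at": datetime(2024, 1, 2, 9, 30), "last_updated": datetime(2024, 1, 2, 9, 30)},
    {"sales_order_id": 2, "staff_id": 10, "units_sold": 1, "unit_price": 4.0,
     "created_at": datetime(2023, 12, 30), "last_updated": datetime(2023, 12, 30)},
]
staff = [{"staff_id": 10, "first_name": "Jane", "last_name": "Doe"}]

new_orders = extract_changed(orders, since=last_run)
dims, facts = transform(new_orders, staff)
warehouse = load({}, dims, facts)
```

Splitting the pipeline into three small functions like this keeps each stage independently testable, which also suits running each stage as its own scheduled job.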

The Team

Bryony Jones

Bryony’s background is in biology, education, and science communication. They enjoy explaining complicated concepts to diverse audiences and work best when they can work with both numbers and people!

Neema

Curious, creative, and driven. I turn complex data into clear stories and elegant solutions.

Richard Armstrong-Wood

Interested in software, research, and design, and fascinated by how data can positively influence technological development. Drawing on my background in mechanical and rehabilitation engineering, I love to make, problem-solve, and prototype.

Bruno Sterza Baggio

With a background in psychology and education, Bruno has a deep curiosity about how things work and a passion for collaborative learning.

Stefanie Watson

I’m a data-driven GIS specialist with a scientific background, passionate about all things data. I focus on spatial data quality, automation, and innovative problem-solving.

Tech Stack

We used Python, PostgreSQL, pandas, AWS (S3, Lambda, EventBridge, Step Functions, Lambda layers, IAM, CloudWatch), Terraform, GitHub, GitHub Actions, Pytest, Coverage, Flake8, Black, MyPy and VSCode. We wanted to use popular, widespread tools that we would encounter day to day as data engineers, and we looked for tools that integrated seamlessly with each other to make piecing the project together as smooth as possible. GitHub kept the project collaborative and version-controlled. Using Terraform let us define our Amazon Web Services (AWS) infrastructure as code from the outset, which gave us the flexibility to adapt our design as we encountered technical challenges. Lambda functions and S3 buckets allowed the entire project to be cloud-based.
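As an illustration of how a scheduled Lambda might land extracted data in S3, here is a hedged sketch of the core logic only. It assumes an EventBridge schedule triggers the function (scheduled events carry an ISO 8601 `time` field), and that the database query and the S3 write are passed in as plain callables so the logic can be unit-tested with Pytest without touching AWS. The names `run_extract`, `make_object_key`, `fetch_rows` and `put_object`, and the time-partitioned key layout, are hypothetical, not taken from the project. In a deployed version, the `lambda_handler(event, context)` entry point could, for example, pass `lambda key, body: s3.put_object(Bucket=bucket_name, Key=key, Body=body)` using a boto3 S3 client.

```python
import json
from datetime import datetime, timezone


def make_object_key(table_name, run_time):
    """Build a time-partitioned key (hypothetical layout), e.g. 'staff/2024/01/02/093000.json'."""
    return f"{table_name}/{run_time:%Y/%m/%d/%H%M%S}.json"


def run_extract(event, tables, fetch_rows, put_object):
    """Core of a hypothetical scheduled extract Lambda.

    event      -- EventBridge payload; scheduled events include an ISO 8601 "time"
    tables     -- names of the operational tables to copy
    fetch_rows -- callable(table_name) -> list of dicts (e.g. wrapping a PostgreSQL query)
    put_object -- callable(key, body) that writes a JSON string to the ingestion bucket
    """
    raw_time = event.get("time")
    if raw_time:
        # Replace a trailing "Z" so fromisoformat also works on Python < 3.11.
        run_time = datetime.fromisoformat(raw_time.replace("Z", "+00:00"))
    else:
        run_time = datetime.now(timezone.utc)

    written = []
    for table in tables:
        rows = fetch_rows(table)
        if not rows:
            continue  # nothing new for this table on this run
        key = make_object_key(table, run_time)
        # default=str serialises values such as datetimes or decimals as strings.
        put_object(key, json.dumps(rows, default=str))
        written.append(key)
    return {"run_time": run_time.isoformat(), "written": written}


# Example with in-memory fakes standing in for PostgreSQL and S3.
bucket = {}
fake_rows = {"staff": [{"staff_id": 10}], "sales_order": []}
result = run_extract(
    {"time": "2024-01-02T09:30:00Z"},
    tables=["staff", "sales_order"],
    fetch_rows=fake_rows.get,
    put_object=bucket.__setitem__,
)
```

Injecting the I/O functions rather than creating clients inside the logic is one common way to keep Lambda code easy to test locally; the thin handler that wires in real clients can then stay very small.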

Challenges Faced

Learning the ropes of an agile project while using GitHub source control for the first time was quite a challenge! However, by the last week, we were a well-oiled coding machine.