Student Projects – Banshee project phase

Made by Banshee

Debug hard. Deploy fast.

This project is an Extract-Transform-Load (ETL) pipeline that transports and restructures raw, unformatted data into Dimension and Fact tables, in accordance with several predetermined Star Schemas. It is built on Cloud Engineering, Data Engineering and Infrastructure as Code principles: Amazon Web Services (AWS) resources are provisioned with Terraform, alongside fully tested and reviewed Python scripts. Throughout the project we practised Continuous Integration and Continuous Delivery (CI/CD) together with Test-Driven Development (TDD) to maximise the effectiveness and validity of the code as written and deployed. The structure of the infrastructure is represented in the above diagram. A scheduler triggers an AWS Step Functions state machine every 20 minutes, which in turn runs the three Lambda functions it encapsulates, one after another.
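As an illustration of the orchestration described above, here is a minimal Python sketch that builds a state machine definition chaining three Lambda tasks in sequence, together with the 20-minute schedule expression. It is not the project's actual configuration (the project provisions its resources with Terraform). The Lambda ARNs, the region, the account ID, the `Step1`..`Step3` state names and the `build_state_machine_definition` helper are all hypothetical and chosen only for the example.

```python
import json

# Hypothetical Lambda ARNs: the real function names, region and account
# are not given in the text above.
LAMBDA_ARNS = [
    "arn:aws:lambda:eu-west-2:123456789012:function:extract",
    "arn:aws:lambda:eu-west-2:123456789012:function:transform",
    "arn:aws:lambda:eu-west-2:123456789012:function:load",
]

# EventBridge rate expression for "every 20 minutes".
SCHEDULE_EXPRESSION = "rate(20 minutes)"


def build_state_machine_definition(lambda_arns):
    """Return an Amazon States Language definition (as a dict) with one
    Task state per Lambda, chained so that they run strictly in sequence."""
    if not lambda_arns:
        raise ValueError("at least one Lambda ARN is required")

    state_names = [f"Step{i + 1}" for i in range(len(lambda_arns))]
    states = {}
    for i, (name, arn) in enumerate(zip(state_names, lambda_arns)):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(state_names):
            # Hand over to the next Lambda once this one succeeds.
            state["Next"] = state_names[i + 1]
        else:
            # The last Lambda ends the execution.
            state["End"] = True
        states[name] = state

    return {"StartAt": state_names[0], "States": states}


definition = build_state_machine_definition(LAMBDA_ARNS)
print(json.dumps(definition, indent=2))
```

Because each state names its successor with `Next`, a later Lambda only starts once the previous one has finished, which is the sequential behaviour the paragraph describes.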

The Team

Meral Hewitt

MEng in Architecture and Environmental Design from the University of Nottingham. Recent graduate pursuing a career in Data Analytics/Science.

Shea Macfarlane

Recent graduate with an MA in Linguistics and interests in Human-Computer Interaction, Computational Linguistics and LLMs.

Ahmad Fadhli

MSc CEng MIMechE, currently employed by Rolls-Royce plc, Engine Health Monitoring.

Mihai Misai

Certified Personal Trainer and Business Manager transitioning into Software Development.

Anna Fedyna

BSc Applied Mathematics

Carlo Danieli

MSc Aerospace Engineering CEng MIMechE, currently employed by Rolls-Royce plc, Engine Health Monitoring.

Tech Stack

We used: Python (TDD with pytest), AWS (Lambda, RDS, CloudWatch, IAM, Step Functions, EventBridge), Terraform, CI/CD (GitHub Actions, GitHub Secrets), Postgres and SQL. Using these technologies allowed us to gain relevant project experience, as they are commonly used in industry. They are also the tools we were collectively most familiar with, which let us work efficiently from day one, an important factor given our limited time frame.

Challenges Faced

Throughout the development of our ETL pipeline, we encountered several technical challenges that required strategic problem-solving and strong teamwork to overcome. These included configuring the state machine, handling failed executions without losing data, and splitting tasks.

Check out the finished project at https://github.com/mihaimisai/de-project