
Lawnmower Museum

Made by Team Lawnmower Museum

Interested in neither lawnmowers nor museums

Our project builds an automated data pipeline on AWS that takes data from a database and applies an Extract, Transform, Load (ETL) process to it. The code for the AWS Lambdas is stored in a code bucket rather than locally, and database credentials are managed with AWS Secrets Manager. In the Extract step, an AWS Lambda extracts data from a Postgres database every minute and stores it in our S3 ingestion bucket. Completion of this step triggers our AWS Step Functions workflow to run the Transform step, where a second Lambda takes the data from the ingestion bucket and transforms it into a format suitable for our data warehouse, storing the result in a second S3 bucket. Finally, in the Load step, we take data from this second bucket and load it into our data warehouse, ready to be used for data visualisation. The entire process is deployed through a CI/CD pipeline, and AWS CloudWatch is used throughout to monitor the pipeline and send email alerts if it breaks.
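As a rough illustration of the Extract step, the sketch below shows what a minimal extraction Lambda could look like, assuming pg8000 as the Postgres client; the secret name, bucket name, and table list are hypothetical and not taken from the project.

    # A minimal sketch of an extraction Lambda, not the project's actual code.
    # The secret name, bucket name, and table list below are assumptions.
    import json
    from datetime import datetime, timezone

    import boto3
    import pg8000.native

    TABLES = ["sales_order", "staff", "currency"]  # illustrative table list


    def get_db_credentials(secret_name="totesys-db-credentials"):
        """Fetch database credentials from AWS Secrets Manager."""
        client = boto3.client("secretsmanager")
        secret = client.get_secret_value(SecretId=secret_name)
        return json.loads(secret["SecretString"])


    def lambda_handler(event, context):
        creds = get_db_credentials()
        conn = pg8000.native.Connection(
            user=creds["user"],
            password=creds["password"],
            host=creds["host"],
            port=int(creds["port"]),
            database=creds["database"],
        )
        s3 = boto3.client("s3")
        run_time = datetime.now(timezone.utc).isoformat()

        # Dump each table to the ingestion bucket as one JSON file per run.
        for table in TABLES:
            rows = conn.run(f"SELECT * FROM {table};")
            columns = [col["name"] for col in conn.columns]
            records = [dict(zip(columns, row)) for row in rows]
            s3.put_object(
                Bucket="lawnmower-ingestion-bucket",  # assumed bucket name
                Key=f"{table}/{run_time}.json",
                Body=json.dumps(records, default=str),
            )

        conn.close()
        return {"tables_extracted": len(TABLES)}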

The Team

Callum Duguid

I’m transitioning into data engineering to apply a structured, analytical approach to solving real-world problems with data. I bring experience from an academic career at the intersection of science and the humanities, managing complex projects and communicating technical concepts across disciplines. I’m drawn to data engineering for its blend of logic, structure, and impact, especially the chance to design systems that support sound decision making.

Marc Shanmugaratnam

I’m a Physics graduate who has spent some time working in retail. I’ve taken the opportunity to join this course to transition into data engineering/tech. I have learnt many useful things during this course, including industry-standard practices and how programmers work as part of a team. During the project phase, I learnt how to tackle some of the problems you might face as part of a data engineering team. Solving those involved both interpersonal and technical skills. I plan on doing further learning after this as I really enjoyed the technical aspects of data engineering, particularly Terraform and CI/CD.

Marta Kaczan

I have always enjoyed problem solving, analytical puzzles and maths, and I am looking forward to combining my skills in these areas with the programming knowledge gained during the Northcoders Data Engineering Bootcamp. I studied Mathematics for my undergraduate Master’s degree and then spent a decade in the asset management industry as an investment risk analyst. I am excited to take on new challenges in the technology industry.

Taimoor Khawaja

I earned my Master’s in Computer Networks in 2012, but my career path initially took a different direction. I entered the restaurant industry, where I eventually became a successful franchise owner. Managing a business gave me hands-on experience in leadership, decision-making, and adaptability. After selling the business, I relocated to the UK and decided to return to my roots in IT. I’m currently enrolled in the Northcoders bootcamp, where I’m sharpening my skills in data engineering through hands-on learning. I’m excited to continue growing in this field and build a long-term career in tech.

Cristine-Roxana Niculae

I am a dedicated and flexible professional with a solid background in computer science and mathematics, as well as a wealth of teaching and logistics experience. I am pursuing a hands-on bootcamp, “Cloud Engineering with Python”, which focuses on Python and contemporary data tools, in order to shift into data engineering. I have a strong work ethic, analytical thinking, and problem-solving abilities, and I am eager to use my technical and managerial skills in a position that relies heavily on data. My contact details: [email protected]

F Nahisah M Nasleem

I’m a Computer Science student and aspiring engineer with a passion for both data and software development. Currently completing the Northcoders Data Engineering Bootcamp, I’ve gained hands-on experience with Python, AWS, Terraform, and CI/CD pipelines, while continuing to build on my solid foundation in full-stack software development using technologies like JavaScript, Django, and React. I enjoy designing scalable systems and I’m excited to bring both technical and creative problem-solving skills to a role in tech.

Tech Stack

We used Python, SQL, AWS services, Terraform, GitHub Actions, and Tableau. We chose these technologies because they support a modern data engineering and analytics workflow and are widely used in industry. Together, they cover every stage of the pipeline, from initial data extraction and transformation to visualising business intelligence. For example, we used AWS Step Functions to orchestrate our Lambdas and set up monitoring with CloudWatch and SNS. We also used S3 buckets to store files and code throughout the pipeline. All of this was provisioned with infrastructure as code (Terraform). We further automated our pipeline using CI/CD with GitHub Actions, which ensured that all committed code was tested, security-checked, linted, and formatted. We chose Tableau because it is an industry-standard business intelligence tool that offers a broad range of functionality for visualising data and business insights.
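The team’s alerting is configured through CloudWatch; purely as a loose illustration of how an email alert can be raised, the sketch below publishes a message to an SNS topic whose subscribers receive email. The topic ARN and helper function are hypothetical and not the project’s actual setup.

    # Illustrative only: publish a pipeline-failure alert to an SNS topic.
    # The topic ARN below is a placeholder, not the project's real topic.
    import boto3

    ALERT_TOPIC_ARN = "arn:aws:sns:eu-west-2:123456789012:pipeline-alerts"


    def send_failure_alert(step_name, error_message):
        """Publish a short alert to SNS; email subscribers then receive it."""
        sns = boto3.client("sns")
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"ETL pipeline failure in {step_name} step",
            Message=f"The {step_name} step failed with: {error_message}",
        )


    # Example usage inside a Lambda handler:
    # try:
    #     transform_data()
    # except Exception as exc:
    #     send_failure_alert("transform", str(exc))
    #     raise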

Challenges Faced

Throughout the project, we faced several technical and architectural challenges that required careful decisions and iterative problem-solving.

Integration Complexity

Integrating AWS services like S3, Lambda, RDS, and Secrets Manager alongside a CI/CD pipeline was one of the biggest challenges. Setting up IAM roles with the right permissions and making sure services could interact securely took significant time and debugging.

Terraform & CI/CD Setup

A key challenge was ensuring the S3 bucket used for storing Lambda code was created before defining the Lambda function itself. We solved this using conditional logic and by managing dependencies between resources.

Data Quality & Consistency

The ToteSys source data had several formatting issues, especially at the transformation stage, where string values coming from pandas required PostgreSQL’s $$…$$ (dollar-quoting) syntax for proper insertion. We also needed to ensure that only updated rows were added to the data warehouse; this was handled using temporary staging tables (see the sketch at the end of this section).

Possible Improvements

Had we had more time, we would’ve liked to configure our Lambdas to update their source code in place as it changes; the current setup destroys and recreates the Lambdas with the new code. We would also have liked to call an exchange rate API so that our visualisation insights would be more accurate.
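As an illustration of the staging-table approach described above, here is a minimal sketch assuming a pg8000 connection to the warehouse, a pandas DataFrame of transformed rows, and a hypothetical dim_staff table; the table name, column names, and dollar-quoting helper are illustrative, not the project’s actual code.

    # Minimal sketch: load only changed rows via a temporary staging table.
    # Table and column names are illustrative assumptions, not project code.
    import pandas as pd
    import pg8000.native


    def dollar_quote(value):
        """Wrap a string in PostgreSQL dollar-quoting for safe insertion.

        A custom tag is used to avoid clashes if the value contains '$$'.
        """
        if pd.isna(value):
            return "NULL"
        return f"$dq${value}$dq$"


    def load_dim_staff(conn: pg8000.native.Connection, df: pd.DataFrame):
        # 1. Create a temporary staging table with the same columns as the
        #    target; it disappears when the database session ends.
        conn.run("CREATE TEMP TABLE staff_staging (LIKE dim_staff);")

        # 2. Insert the transformed rows into the staging table.
        for row in df.itertuples(index=False):
            conn.run(
                "INSERT INTO staff_staging "
                "(staff_id, first_name, last_name, email_address) "
                f"VALUES ({int(row.staff_id)}, {dollar_quote(row.first_name)}, "
                f"{dollar_quote(row.last_name)}, {dollar_quote(row.email_address)});"
            )

        # 3. Upsert from staging into the warehouse table so existing rows are
        #    updated rather than duplicated (assumes staff_id is the primary key).
        conn.run(
            "INSERT INTO dim_staff (staff_id, first_name, last_name, email_address) "
            "SELECT staff_id, first_name, last_name, email_address FROM staff_staging "
            "ON CONFLICT (staff_id) DO UPDATE SET "
            "first_name = EXCLUDED.first_name, "
            "last_name = EXCLUDED.last_name, "
            "email_address = EXCLUDED.email_address;"
        )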