Made by team-01-data-squid
For ink-redible insights, dive into data with Data Squid
As part of this three-week project, we developed a robust ETL (Extract, Transform, Load) data pipeline. Built by Carlos Byrne, Liam Biggar, Nicolas Tolksdorf, Shay Doherty, Girish Joshi, and Ethan Labouchardiere, the pipeline extracts data from an operational database (totesys) and loads it into an AWS-based data lake and data warehouse. Our architecture leverages AWS services: S3 for data storage, Lambda functions for data processing, EventBridge for orchestration, and CloudWatch for monitoring. We implemented a CI/CD pipeline using GitHub Actions and managed our infrastructure as code with Terraform. The project focuses on creating a scalable, automated data platform to support analytical reporting and business intelligence, with a Sales star schema as the initial Minimum Viable Product (MVP).
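As a rough illustration of the extract stage, the sketch below shows how a Lambda along these lines might read tables from totesys and land them in the S3 data lake as JSON. It is a sketch only: the bucket name, secret name, table list, and helper function are assumptions for illustration, not the project's actual code.

```python
# Hypothetical extract Lambda sketch. The bucket name, secret name, and
# table list are illustrative assumptions, not the project's real values.
import json

import boto3
import pg8000.native

s3 = boto3.client("s3")
INGESTION_BUCKET = "data-squid-ingestion"  # assumed bucket name


def get_connection():
    """Fetch totesys credentials from Secrets Manager and open a connection."""
    secret = boto3.client("secretsmanager").get_secret_value(
        SecretId="totesys-credentials"  # assumed secret name
    )
    creds = json.loads(secret["SecretString"])
    return pg8000.native.Connection(
        user=creds["user"],
        password=creds["password"],
        host=creds["host"],
        port=int(creds["port"]),
        database=creds["database"],
    )


def lambda_handler(event, context):
    conn = get_connection()
    try:
        # A subset of tables for illustration; the names are hardcoded,
        # so the f-string below is not exposed to injection.
        for table in ["sales_order", "staff", "currency"]:
            rows = conn.run(f"SELECT * FROM {table};")
            columns = [col["name"] for col in conn.columns]
            records = [dict(zip(columns, row)) for row in rows]
            s3.put_object(
                Bucket=INGESTION_BUCKET,
                Key=f"raw/{table}.json",
                # default=str serialises datetime and Decimal values as strings.
                Body=json.dumps(records, default=str),
            )
    finally:
        conn.close()
```

In a setup like this, an EventBridge schedule would invoke the handler, with CloudWatch capturing its logs and SNS alerting on failures.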
The Team
Ethan Labouchardiere
Bachelor of Mechanical Engineering graduate and British Army Royal Engineer reservist, with a growing data engineering and cloud computing skillset. Since graduation, I have focused on personal development within the Army Reserve and am continuously expanding my knowledge in key areas such as Python, SQL, cloud platforms (particularly AWS), data pipelines, and Infrastructure as Code (IaC) using Terraform. I’m also familiar with MATLAB, Git version control, APIs, and server management. Additionally, I’ve gained valuable life experience in a public-facing security role and travelled across Southeast Asia. I am now looking to join an innovative, forward-thinking company where I can apply my technical skills and diverse experience to contribute to the growth of the wider UK economy.
Girish Joshi
With 18 years of experience in project management, product development, and technical leadership in the food manufacturing industry, I have a proven track record of driving innovation and delivering results. Over the past 18 months, I’ve shifted my focus towards Data Science and Machine Learning, completing a rigorous 9-week boot camp where I improved my Python, SQL, and data analysis skills. In my recent Data Engineering internship at Genassis, I worked on developing and maintaining data pipelines, enhancing AI model performance through data integration and Retrieval-Augmented Generation (RAG) capabilities. This experience solidified my ability to manage complex data projects and troubleshoot effectively to meet deadlines. Alongside my technical skills, I’ve also built a successful YouTube channel, Masala Chai, growing it to over 100K subscribers through data-driven content strategies. I’m eager to continue learning and am particularly interested in roles where I can bridge my background in project management with my new data science skills to create meaningful solutions. Let’s connect if you’re looking for an adaptable, determined professional who is ready to contribute to your team.
Nicolas Tolksdorf
Nicolas Tolksdorf is a skilled data engineer and software developer with hands-on experience in building, testing, and maintaining data warehouses. Nicolas has worked on a data platform tailored to an education company in London, leveraging his software development skills, collaborative mindset, and data engineering expertise to drive impactful outcomes.
Tech Stack

Core Scripting & Data Handling:
– Python – core scripting language; powers the ETL (Extract, Transform, Load) functions
– PG8000 – secure PostgreSQL interaction with the totesys database
– SQL – querying for data extraction

Serverless & Storage:
– AWS Lambda – executes the Python ETL tasks
– AWS S3 – data lake storing raw & processed data
– Parquet – optimised columnar storage for processed data before warehouse loading

Data Processing:
– JSON – structured data exchange between pipeline stages
– Pandas – data manipulation & transformation
– AWS RDS – managed relational database

Orchestration & Automation:
– AWS EventBridge – schedules & triggers event-driven task execution

Monitoring & Alerts:
– AWS CloudWatch – logging & monitoring of the pipeline
– AWS SNS – email notifications & error alerts on failures

Security & Integration:
– AWS Secrets Manager – secure storage for sensitive credentials
– Boto3 – connects Python to AWS services

Deployment & Infrastructure:
– Terraform – defines & manages AWS infrastructure as code
– GitHub Actions – automates the CI/CD (Continuous Integration and Deployment) pipeline
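To show how a few of these pieces fit together, here is a minimal sketch of a transform step, assuming hypothetical bucket names, object keys, and a simplified dim_currency shape rather than the project's actual schema. It reads a raw JSON extract from S3, reshapes it with Pandas, and writes Parquet back to the processed zone:

```python
# Hypothetical transform sketch: bucket names, keys, and the dim_currency
# columns are assumptions for illustration, not the project's actual schema.
import io
import json

import boto3
import pandas as pd

s3 = boto3.client("s3")


def transform_currency(raw_bucket: str, processed_bucket: str) -> None:
    # Read the raw JSON extract produced by the ingestion stage.
    obj = s3.get_object(Bucket=raw_bucket, Key="raw/currency.json")
    df = pd.DataFrame(json.loads(obj["Body"].read()))

    # Keep only the columns the star-schema dimension needs.
    dim_currency = df[["currency_id", "currency_code"]].copy()

    # Serialise to Parquet in memory (requires pyarrow or fastparquet)
    # and upload to the processed zone.
    buffer = io.BytesIO()
    dim_currency.to_parquet(buffer, index=False)
    s3.put_object(
        Bucket=processed_bucket,
        Key="processed/dim_currency.parquet",
        Body=buffer.getvalue(),
    )
```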
Challenges Faced
– Transforming datetime columns into the correct format (see the sketch below)
– Package size limits for Lambda layers
– IAM role testing and permissions
– CI/CD deployment
– Getting test coverage above 90%
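The datetime challenge above is a typical one when a star schema wants separate date and time fields; a minimal sketch of the kind of normalisation involved (column names assumed for illustration) might look like:

```python
# Minimal sketch of normalising a raw timestamp column before the warehouse
# load; the column name used here is an assumption for illustration.
import pandas as pd


def split_datetime(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Coerce a raw string column to datetime, then split it into the
    separate date and time fields a star schema typically expects."""
    df[column] = pd.to_datetime(df[column], errors="coerce")
    df[f"{column}_date"] = df[column].dt.date
    df[f"{column}_time"] = df[column].dt.time
    return df


# Example: a sales order extract carrying a created_at timestamp.
orders = pd.DataFrame({"created_at": ["2024-11-03 14:20:52.186000"]})
orders = split_datetime(orders, "created_at")
```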