Student Projects – Team Girley project phase

Made by Team Girley

From Raw to Refined

Our team developed a scalable data pipeline to automate the extraction, transformation, and loading (ETL) of data from an operational database into an AWS-hosted data lake and data warehouse. The project was designed to enable efficient data processing, improve data accessibility, and provide structured insights for analytics and reporting.

Key features of the project:

Data Ingestion: Extracted data from a PostgreSQL database using an AWS Lambda function, with credentials secured via GitHub Secrets. Used pg8000 for query execution and parameterized queries for dynamic data extraction (a sketch of this step follows below).

Data Transformation & Storage: Stored raw data in Amazon S3, processed transformations with an AWS Lambda function, and loaded the structured data back into Amazon S3 using a star-schema model (also sketched below).

Automation & Deployment: Used Terraform to provision AWS infrastructure, including S3, IAM, Lambda, and Step Functions. Integrated CI/CD with GitHub Actions for automated deployment.

Monitoring & Error Handling: Configured CloudWatch Logs to track execution and failures, implemented structured logging with timestamps and error messages, and set up SNS alerts for critical failures.

Branching & Documentation: Followed GitHub Flow for development and created a Wiki for project documentation.

Security & Compliance: GitHub Secrets for secure storage of credentials, with restricted access in GitHub Actions; IAM roles and policies provisioned via Terraform, CloudTrail logging, and Security Groups for access control; Terraform remote state held in a versioned S3 bucket, with .tfvars files kept out of version control; branch protection rules and a .gitignore for sensitive files.

This project provided hands-on experience with AWS data engineering, infrastructure as code, automation, and cloud security. It also involved working in a team, applying Agile methodologies, and ensuring best practices for logging, monitoring, and optimization.
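As an illustration of the ingestion step, here is a minimal sketch of what such an extraction Lambda can look like. It is not the team's actual code: the table name (staff), the INGESTION_BUCKET variable, and the last_updated cutoff column are hypothetical assumptions, and credentials are read from environment variables rather than hardcoded.

```python
import json
import os
from datetime import datetime, timezone

import boto3
import pg8000.native

def lambda_handler(event, context):
    # Credentials are injected as environment variables (never hardcoded)
    conn = pg8000.native.Connection(
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        host=os.environ["DB_HOST"],
        database=os.environ["DB_NAME"],
    )
    try:
        # Parameterized query: the cutoff is bound as :cutoff,
        # never interpolated into the SQL string
        cutoff = event.get("last_run", "1970-01-01T00:00:00")
        rows = conn.run(
            "SELECT * FROM staff WHERE last_updated > :cutoff",
            cutoff=cutoff,
        )
        columns = [c["name"] for c in conn.columns]
        records = [dict(zip(columns, row)) for row in rows]
    finally:
        conn.close()

    # Land the raw extract in the ingestion zone of the data lake
    key = f"raw/staff/{datetime.now(timezone.utc).isoformat()}.json"
    boto3.client("s3").put_object(
        Bucket=os.environ["INGESTION_BUCKET"],
        Key=key,
        Body=json.dumps(records, default=str),
    )
    return {"rows_extracted": len(records), "s3_key": key}
```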

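The transformation step then reshapes those raw extracts into star-schema tables. Continuing the sketch above under the same assumptions (a hypothetical dim_staff dimension and a PROCESSED_BUCKET environment variable), one way the transform Lambda could look:

```python
import csv
import io
import json
import os

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Read one raw extract produced by the ingestion Lambda
    raw = s3.get_object(
        Bucket=os.environ["INGESTION_BUCKET"], Key=event["s3_key"]
    )
    records = json.loads(raw["Body"].read())

    # Project the raw rows onto the columns of the dimension table,
    # dropping any fields the star schema does not need
    dim_columns = ["staff_id", "first_name", "last_name", "department"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=dim_columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)

    # Write the structured output to the processed zone of the data lake
    out_key = "processed/dim_staff/" + event["s3_key"].split("/")[-1].replace(
        ".json", ".csv"
    )
    s3.put_object(
        Bucket=os.environ["PROCESSED_BUCKET"],
        Key=out_key,
        Body=buffer.getvalue(),
    )
    return {"processed_key": out_key}
```

A Step Functions workflow can then chain the two states, passing the s3_key output of the extraction state as the input of the transform, which keeps the two Lambdas decoupled.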
The Team

Sathiyavathi Anandkumar

With a strong academic background in Information Technology, extensive experience in academia and research, and a deep passion for data analysis, I am making a career transition to pursue opportunities as a Data Analyst. I am currently in the final phase of a Data Technician Bootcamp at JustIT Training, consolidating my previous knowledge and gaining the additional technical skills essential for a successful transition. I am also working on several data analysis projects using Excel, Tableau, Power BI, Python, Azure, SQL, and R. I aspire to secure a challenging role as a Data Analyst, where I can apply my critical and creative thinking skills and passion for data-driven decision making to contribute to business growth and success.

Tom Ashford

I am a business leader and entrepreneur with over a decade of experience in business development, operations, and customer experience across the UK and Southeast Asia. Having recently returned to the UK after eight years in the Philippines, where I co-founded an award-winning restaurant and yoga studio, I am now transitioning into coding, data engineering, and software development. With a proven track record in leadership, operational efficiency, and client relationships, I am eager to apply my technical and problem-solving skills to create impactful solutions in the tech industry.

Prince Olubari

Electrical Design Engineer | Tech Enthusiast in Data Engineering and the Cloud | Maths Educator | AWS Certified

Muhammad Alom

Certified in AWS Cloud, Microsoft Azure, IBM Data Science, IBM AI Development, and IBM Data Engineering, I bring a strong technical foundation in cloud computing, data science, AI, and data engineering. My skills include:

Cloud Infrastructure (AWS & Azure): Proficient in deploying and managing virtual machines, storage, and network configurations for scalable, secure environments.

Data Science & Machine Learning: Experienced in Python-based data analysis, predictive modeling, and visualization to drive data-informed decisions.

AI Development: Skilled in building and fine-tuning machine learning models, NLP, and computer vision applications using IBM Watson and TensorFlow.

Data Engineering: Knowledgeable in ETL, data pipeline creation, and database management for efficient, data-driven systems.

With a background in project management and teaching, I excel in collaborative, innovative environments and am eager to bring my expertise to impactful projects in tech.

Zidan Wang

Junior Data Engineer | Python | SQL | AWS Cloud | Freelance Simultaneous Interpreter

Pablo Caldas

Adaptable and resourceful, I am transitioning into data engineering with a strong passion for solving intricate problems and streamlining systems. My nearly five years of Salesforce QA experience have honed my analytical skills and attention to detail, which I now apply to mastering modern data engineering practices. Having recently completed the Northcoders Data Engineering Bootcamp, I have gained real-world experience and proficient knowledge of Python, cloud technologies, and database management. With a hands-on approach and a collaborative mindset, my goal is to contribute effectively to data-driven solutions in the tech industry.

Tech Stack

Tech Stack for this group

AWS Lambda: Used for data extraction, transformation, and loading automation.

PostgreSQL: The operational database from which data was extracted.

AWS S3: Storage for raw and processed data in the data lake.

AWS IAM: Secure access management and role-based access control.

Terraform: Provisioning and managing AWS infrastructure, ensuring consistency and version control.

AWS Step Functions: Automated workflows for data processing and movement.

GitHub Actions: CI/CD automation, enabling streamlined deployment and testing.

AWS CloudWatch: Monitoring, logging, and tracking system health and performance.

SNS: Alerting in case of critical errors (an illustrative sketch follows at the end of this section).

pg8000: Querying PostgreSQL with parameterized queries in Python.

AWS Lambda was chosen for its serverless nature, scalability, and ease of integration with other AWS services like S3 and Step Functions. PostgreSQL was the data source due to its robustness, relational nature, and compatibility with our data processing needs. AWS S3 provided a cost-effective and scalable storage solution for raw and processed data. AWS IAM ensured security and enforced the principle of least privilege across the team. Terraform was selected for its Infrastructure-as-Code capabilities, enabling automated and repeatable provisioning of AWS resources. AWS Step Functions let us automate workflows in a visual, easy-to-manage way, ensuring that data was processed and moved correctly. GitHub Actions automated our deployment pipelines, making it easier to continuously integrate and deploy changes. AWS CloudWatch offered real-time logging and monitoring, giving us visibility into the Lambda functions' execution and allowing quick responses to failures. SNS provided a reliable mechanism for sending notifications on critical issues, keeping the team informed. pg8000 was chosen for interacting with PostgreSQL in a lightweight, Pythonic way, supporting dynamic queries.
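To illustrate how the CloudWatch and SNS pieces fit together, here is a hedged sketch of a logging-and-alerting wrapper. The ALERT_TOPIC_ARN variable, the decorator, and the message format are illustrative assumptions rather than the team's exact implementation; the Lambda runtime forwards standard-library log output to CloudWatch Logs automatically.

```python
import logging
import os

import boto3

# The Lambda runtime ships logger output to CloudWatch Logs,
# with a timestamp and request ID attached to every record
logger = logging.getLogger()
logger.setLevel(logging.INFO)

sns = boto3.client("sns")

def notify_on_failure(handler):
    """Wrap a Lambda handler so critical failures publish an SNS alert."""
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception as exc:
            logger.error("Pipeline step failed: %s", exc, exc_info=True)
            sns.publish(
                TopicArn=os.environ["ALERT_TOPIC_ARN"],
                Subject="ETL pipeline failure",
                Message=f"{context.function_name} failed: {exc}",
            )
            raise  # re-raise so Step Functions also records the failure
    return wrapper

@notify_on_failure
def lambda_handler(event, context):
    logger.info("Step started with event: %s", event)
    ...  # extraction / transformation work goes here
```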

Challenges Faced

During the course of this project, we encountered a couple of significant challenges:

Database Credentials Exposure: We discovered that our database credentials had accidentally been exposed on GitHub. To resolve this, we moved sensitive data into GitHub Secrets, ensuring that credentials are never hardcoded in the repository, and used a .gitignore file to prevent sensitive files from being committed.

Lambda Layers Implementation: We faced some difficulty setting up and using Lambda layers; there was confusion around the process and the right way to install dependencies within layers. We spent a considerable amount of time debugging and researching solutions before successfully implementing them, and this learning curve slowed our progress (see the packaging sketch below).

Despite these challenges, we overcame them through teamwork, persistence, and further research. The project provided valuable hands-on experience with AWS services, Terraform, and CI/CD practices. Working as a team, we applied Agile methodologies and focused throughout on best practices for logging, monitoring, and automation. It was a challenging yet rewarding project, and we gained a deeper understanding of cloud infrastructure and data engineering.
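For anyone hitting the same Lambda layers problem, the usual stumbling block is the directory layout: Python dependencies must sit under a python/ folder at the root of the layer zip. A minimal packaging sketch follows; the layer name and paths are illustrative, not our exact build script.

```python
import shutil
import subprocess
import sys

# Install the dependency into the layout Lambda layers expect:
# pg8000_layer.zip
# └── python/
#     └── pg8000/ ...
subprocess.run(
    [sys.executable, "-m", "pip", "install", "pg8000", "--target", "layer/python"],
    check=True,
)

# Zip the contents of layer/ so that python/ sits at the archive root
shutil.make_archive("pg8000_layer", "zip", root_dir="layer")
```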