Data Science Projects

HydroPoint In-House Data Science Projects

As Chief Data Scientist at HydroPoint, I led the full data science lifecycle, from defining business objectives to deploying AI-driven models. My role combined data science, business strategy, and product innovation, ensuring data-driven insights translated into measurable business impact.

Key Responsibilities:

Built and led the Data Science team, hiring top talent and fostering continuous learning.
Developed a unified data platform, integrating business, device, and customer data for decision-making.
Drove AI/ML initiatives, defining predictive modeling needs across supply chain, HR, and analytics.
Aligned business and data science, ensuring stakeholders effectively applied predictive insights.
Supported MLOps and model optimization, hiring data science postgraduates for performance tuning.
Partnered with engineering and product management to enhance data infrastructure and AI-driven features.

By merging AI analytics with business strategy, my team played a pivotal role in HydroPoint’s growth and innovation.

MSc Data Science Projects

The following projects are a selection of my data science work, coded in Python as part of my Master of Science (MSc) in AI/ML at Liverpool John Moores University (LJMU). Beyond the coursework, I also explored tools such as Streamlit, PyCaret, Penguin, and Shapash to enhance data visualization and machine learning workflows.

These projects demonstrate:
✔ Proficiency in Python programming for data science applications.
✔ Deep understanding of machine learning algorithms and their real-world applications.

Moving forward, I plan to apply these techniques to industry-specific challenges in smart grids, meter data management (MDM), drinking water systems, wastewater treatment plants, and clean energy solutions. My goal is to build a community of professionals who can benefit from my code, using it as a foundation for their own projects and innovations.

Rossmann Retail Sales – Time Series Analysis and Prediction

Business Goal:
This project analyzed sales data from a European retail drugstore chain with nine locations. The objective was to perform exploratory data analysis (EDA), identify the key variables impacting sales, and assess the long-run relationship between sales and customer counts through cointegration analysis.

To develop accurate sales predictions, we implemented multiple time series forecasting models, including:
✔ VAR (Vector Autoregression) – Capturing relationships between multiple time-dependent variables.
✔ VARMAX (Vector Autoregressive Moving Average with Exogenous Regressors) – Incorporating external factors to improve forecasting.
✔ SARIMAX (Seasonal Autoregressive Integrated Moving Average with Exogenous Regressors) – Handling seasonality and external influences for more precise predictions.

The performance of these models was compared to determine the most effective forecasting approach for retail sales optimization.
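To make the VAR idea above concrete, here is a minimal, dependency-free sketch of a first-order VAR fitted by ordinary least squares. It is not the project's code: the function names (solve_linear, fit_var1, forecast_var1) and the two-variable layout [sales, customers] are assumptions for illustration, and a real analysis would more likely use a library such as statsmodels, which also handles lag selection and diagnostics.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_var1(series):
    """Fit a VAR(1) by least squares.

    series: observations in time order, e.g. [[sales, customers], ...]
    (hypothetical layout). Returns one row per variable:
    [intercept, coefficient on each lagged variable...].
    """
    lagged = [[1.0] + list(obs) for obs in series[:-1]]  # regressors at t-1
    k = len(lagged[0])
    # Normal equations: (X'X) beta = X'y, shared X for every variable.
    xtx = [[sum(row[i] * row[j] for row in lagged) for j in range(k)]
           for i in range(k)]
    coefs = []
    for var in range(len(series[0])):
        target = [obs[var] for obs in series[1:]]  # values at t
        xty = [sum(row[i] * t for row, t in zip(lagged, target))
               for i in range(k)]
        coefs.append(solve_linear(xtx, xty))
    return coefs


def forecast_var1(coefs, last_obs):
    """One-step-ahead forecast from the most recent observation."""
    x = [1.0] + list(last_obs)
    return [sum(c * v for c, v in zip(row, x)) for row in coefs]
```

Each variable is regressed on the previous values of all variables, which is what lets the model capture how sales and customer counts move together over time.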

🔗 View the full project and code here: GitHub Repository

SuperCabs Profit Optimization using Deep Q Network

Reinforcement Learning

Business Goal:
Cab drivers, like most professionals, are motivated by steady income growth. This project aimed to develop a reinforcement learning (RL)-based algorithm that enhances cab drivers’ decision-making processes, helping them maximize profits through optimized ride selection and routing.

Using a Deep Q Network (DQN), the model learns from past trip data to:
✔ Identify the most profitable locations and time slots for pickups.
✔ Optimize route selection to reduce idle time and fuel costs.
✔ Improve overall efficiency and earnings for drivers by making data-driven decisions in real time.

The project applies reinforcement learning techniques to create a more sustainable and profitable approach for ride-hailing businesses.
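Two building blocks that DQN training typically relies on are an experience replay buffer and the temporal-difference (TD) target used as the regression label for the network. The sketch below shows both in plain Python. It is an illustration under stated assumptions, not the project's implementation: q_values_fn is a hypothetical stand-in for the neural network's forward pass, and the state and action encodings are left open.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled at random for training."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, q_values_fn, gamma=0.95):
    """Compute r + gamma * max_a' Q(s', a') for each transition.

    q_values_fn(state) is assumed to return one Q-value per action
    (in a real DQN this would be the target network's prediction).
    Terminal transitions use the reward alone.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(q_values_fn(next_state)))
    return targets
```

In a full training loop, the network would be fitted so that its Q-value for each sampled (state, action) pair moves toward the corresponding target.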

🔗 View the full project and code here: GitHub Repository

Building an RL Agent to Play Tic-Tac-Toe

Business Goal:
This project focused on developing a reinforcement learning (RL) agent using Q-learning to play Numerical Tic-Tac-Toe with odd numbers. The goal was to train the agent to strategically place odd numbers while competing against an environment that plays randomly using even numbers.

Key aspects of the project include:
✔ Implementing Q-learning, a model-free RL technique, to enable the agent to learn optimal moves over time.
✔ Designing the game environment where the agent competes against an opponent placing even numbers randomly.
✔ Training the agent to maximize its chances of winning by recognizing patterns and adapting its strategy.

This project demonstrates how reinforcement learning can be applied to simple game environments, building a foundation for more complex AI-driven decision-making systems.
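The core of the approach described above can be sketched as a tabular Q-learning update over (board, move) pairs. This is a minimal illustration, not the project's code: the board encoding (a 9-element tuple with 0 for empty cells), the reward value, and the function names are all assumptions.

```python
import random
from collections import defaultdict

ODD_NUMBERS = (1, 3, 5, 7, 9)  # numbers available to the agent


def agent_actions(board):
    """List every (cell_index, odd_number) move still available.

    Empty cells hold 0, and each number can be played only once per game.
    """
    used = set(board)
    empty_cells = [i for i, value in enumerate(board) if value == 0]
    free_numbers = [n for n in ODD_NUMBERS if n not in used]
    return [(cell, n) for cell in empty_cells for n in free_numbers]


def choose_action(q_table, board, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take the best known move."""
    actions = agent_actions(board)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(board, a)])


def q_update(q_table, board, action, reward, next_board, done,
             alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    if done:
        best_next = 0.0  # no future value after the game ends
    else:
        best_next = max(
            (q_table[(next_board, a)] for a in agent_actions(next_board)),
            default=0.0,
        )
    target = reward + gamma * best_next
    q_table[(board, action)] += alpha * (target - q_table[(board, action)])
    return q_table[(board, action)]
```

A training loop would start from q_table = defaultdict(float) and an empty board (0,) * 9, then alternate choose_action, the environment's random even-number reply, and q_update until the game ends.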

🔗 View the full project and code here: GitHub Repository

Gesture Recognition using CNN / RNN

Business Goal:
This project aimed to develop a gesture-based control system for smart TVs, allowing users to operate the TV without a remote. Using computer vision and deep learning, the system continuously monitored gestures via a webcam, recognizing five distinct hand movements to execute specific commands.

The implemented gestures and their corresponding actions:
✔ Thumbs up – Increase volume
✔ Thumbs down – Decrease volume
✔ Left swipe – Jump backward 10 seconds
✔ Right swipe – Jump forward 10 seconds
✔ Stop – Pause the movie

By leveraging Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), the model processed real-time video feeds to classify gestures accurately, enhancing user experience and accessibility for smart TV interactions.

🔗 View the full project and code here: GitHub Repository

NER and CRF Techniques for Medical Data Parsing and Prediction

Business Goal:
This project focused on loading healthcare data, classifying diseases, and interpreting medical jargon using Named Entity Recognition (NER) and Conditional Random Fields (CRF). The objective was to accurately categorize medical terms into relevant topics and predict appropriate therapies for specific diseases based on textual analysis.

Key aspects of the project:
✔ Processed and structured complex healthcare data for disease classification
✔ Applied NER to extract key medical entities from unstructured text
✔ Used CRF models to classify terms into relevant medical categories
✔ Developed predictive models to recommend therapies for diagnosed conditions

The implementation of NER and CRF enabled more efficient medical data processing, improving the accuracy of disease classification and treatment recommendations.
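A CRF tagger scores each token using features of the token itself and of its neighbors. The sketch below shows one common way to build such per-token feature dictionaries in plain Python. The specific features and function names are illustrative assumptions rather than the project's actual feature set; libraries such as sklearn-crfsuite accept input in this general shape (one list of feature dicts per sentence).

```python
def word_features(tokens, i):
    """Build a feature dict for the token at position i in a sentence."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),   # crude cue for word endings
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True          # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True          # end of sentence
    return features


def sentence_features(tokens):
    """Feature dicts for every token in one tokenized sentence."""
    return [word_features(tokens, i) for i in range(len(tokens))]
```

For example, sentence_features(["Aspirin", "treats", "headache"]) yields three dicts, and the middle one records both "aspirin" as the previous word and "headache" as the next word, which is the context a CRF uses to label entities.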

🔗 View the full project and code here: GitHub Repository

Wine Classifier using PyCaret and Streamlit

Business Goal:
This project focused on building a wine quality classifier using PyCaret and Streamlit to develop and deploy a data science web application. The objective was to create a streamlined, interactive tool for evaluating wine quality based on key attributes.

Key aspects of the project:
✔ Utilized PyCaret to automate machine learning model selection and tuning
✔ Built a web-based classifier using Streamlit for real-time user interaction
✔ Processed wine dataset features to predict quality ratings efficiently
✔ Developed a fully functional and deployable app for classification

By integrating PyCaret’s low-code machine learning capabilities with Streamlit’s interactive UI, this project demonstrated how to build and deploy machine learning models with minimal effort.

🔗 View the full project and code here: GitHub Repository

Australian House Price Prediction using Linear, Ridge, and Lasso Regression

Business Goal:
This project involved building a predictive model for Australian house prices using linear regression, along with Ridge and Lasso regression techniques. The objective was to determine how different predictor variables impact sale prices and to identify high-yield areas for better investment decisions.

Key aspects of the project:
✔ Developed a linear regression model to establish baseline predictions
✔ Applied Ridge and Lasso regression to refine coefficients and reduce overfitting
✔ Analyzed how sale prices vary with different features to provide actionable insights
✔ Helped management understand pricing dynamics and focus on the most profitable areas

By leveraging multiple regression techniques, this project provided a data-driven approach to real estate valuation and investment strategy.
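The difference between the Ridge and Lasso penalties can be illustrated with a small coordinate-descent solver. This is a sketch under simplifying assumptions (no intercept, plain Python lists, made-up function names), not the project's code; in practice a library such as scikit-learn would be the usual choice, and its penalty scaling differs from the objective used here.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent(X, y, l1=0.0, l2=0.0, n_iter=50):
    """Minimize 0.5*||y - Xb||^2 + l1*||b||_1 + 0.5*l2*||b||^2.

    l1 > 0 with l2 = 0 gives Lasso, l2 > 0 with l1 = 0 gives Ridge,
    and both zero gives ordinary least squares (no intercept term).
    """
    n_features = len(X[0])
    coef = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(
                    row[k] * coef[k] for k in range(n_features) if k != j
                )
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            coef[j] = soft_threshold(rho, l1) / (z + l2)
    return coef
```

On toy data where one feature matters little, the Lasso penalty sets that coefficient exactly to zero (feature selection), while the Ridge penalty only shrinks every coefficient toward zero. That contrast is the reason to compare both against the plain linear baseline.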

🔗 View the full project and code here: GitHub Repository

Telecom Churn Prediction using Lasso and Ridge Regression

Business Goal:
This project was a group case study focused on analyzing telecom company data to identify key factors contributing to customer churn. The goal was to develop predictive models using Lasso and Ridge regression to understand churn drivers and provide actionable insights for customer retention.

Key aspects of the project:
✔ Conducted data understanding and exploratory analysis to identify churn patterns
✔ Used univariate, bivariate, and multivariate analysis for deeper insights
✔ Performed data cleaning and preparation to ensure model accuracy
✔ Built and evaluated predictive models using Lasso and Ridge regression
✔ Presented conclusions and recommendations to improve retention strategies

By leveraging advanced regression techniques, this project helped highlight the key predictors of churn, enabling better decision-making for telecom service providers.

🔗 View the full project and code here: GitHub Repository