Data Science Projects

HydroPoint In-House Projects:

As Chief Data Scientist at HydroPoint, I oversee the entire data science lifecycle, from defining business objectives and identifying data sources (property, water agency, climate, etc.) to designing and implementing data analysis methods and models. I mentor a team of data scientists: I equip them with emerging AI tools, provide training and upskilling, and guide them in designing and running product experiments and customer-focused interviews that reveal which features and products customers are willing to pay for. My team performs complex analyses to drive company and product strategy and deploys online and offline models to production. I also act as a liaison between the data science team and the business units, aligning the data strategy with business goals and vision.

My team is also responsible for building a single-source-of-truth platform that stores all business, device, and customer data, and for using that data to drive decisions within the company, with the aim of increasing revenue from the software and hardware products that the insights point to. We also have a smaller Hydro Analytics team, in which a group of data analysts works on specific customer sales issues and generates reports (Pre-Analysis for a customer's entire property portfolio, and HydroPoint Performance Management, a managed service providing ESG and water savings reports), interfacing directly with product and engineering stakeholders. My group brings together people from diverse backgrounds and perspectives, trained in fields as wide-ranging as data science, Python / Django / Node.js programming, statistics, and operations research. We are mathematical decision scientists for the product development organization and play an active, collaborative role in building and improving the product line and service offerings. Specific responsibilities are:

  1. Hire and Build the Data Science team.
  2. Meet with Business and document all the data needs to build the single platform database containing all business/device/customer data.
  3. Meet with customers and product management to define AI/ML needs in the Supply Chain, HR, etc.
  4. Provide modeling and prototyping support so that the business understands how predictions and prescriptive suggestions work.
  5. Provide and support the MLOps team. Hire data science post-graduates to continuously tweak the performance of predictions.
  6. Work with ENG to support and build the internal data platform and maintain all the data needed for insights.
  7. Work with Product Management to define new products to ensure increased revenue and customer growth.

MSc. Data Science Projects:

This is a collection of some of the data science projects I coded in Python for my M.Sc. in AI/ML from LJMU. In addition, I have done some projects on topics that fascinated me and, along the way, learned new technologies such as Streamlit, PyCaret, Penguin, and Shapash for data visualization. This is an attempt to demonstrate the following:

(a) My proficiency in writing Python code
(b) My understanding of the various algorithms in data science and my ability to apply them to solve real-life problems

As a next step, I will be looking to apply these techniques to problems in my areas of domain expertise, i.e. smart grid, MDM, drinking water systems, wastewater treatment plants, and other applications in the clean energy space. I would like to build an audience that can benefit from my code and use it as starter code for their own projects.

Rossmann Retail Sales – Time Series Analysis and Prediction

Retail Drug Sales Analysis and Prediction

Business Goal:

The task was to analyze the sales of a retail drugstore chain’s nine (9) stores in Europe: perform detailed EDA, understand the impact of the variables on sales, and check for co-integration between sales and customers. VAR, VARMAX, and SARIMAX models are then used to predict sales and compare their accuracy.
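
To make the VAR idea concrete, here is a minimal sketch of a first-order VAR fitted by ordinary least squares on synthetic data. It is not the project's code: the two columns merely stand in for daily sales and customers, and the function names (`simulate_var1`, `fit_var1`, `forecast_var1`), the coefficient values, and the noise level are all assumptions made for illustration. In practice a library such as statsmodels, which provides VAR, VARMAX, and SARIMAX models, would normally be used rather than hand-rolled least squares.

```python
import numpy as np


def simulate_var1(A, c, n_steps, noise_std, seed=0):
    """Simulate y_t = c + A @ y_{t-1} + noise (synthetic stand-in data)."""
    rng = np.random.default_rng(seed)
    k = len(c)
    y = np.zeros((n_steps, k))
    for t in range(1, n_steps):
        y[t] = c + A @ y[t - 1] + rng.normal(0.0, noise_std, size=k)
    return y


def fit_var1(y):
    """Estimate the intercept c and coefficient matrix A of a VAR(1) by OLS."""
    # Regress each y_t on [1, y_{t-1}]; one least-squares solve covers all series.
    X = np.hstack([np.ones((len(y) - 1, 1)), y[:-1]])
    Y = y[1:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (k + 1, k)
    c_hat = B[0]
    A_hat = B[1:].T  # transpose so that y_t = c_hat + A_hat @ y_{t-1}
    return c_hat, A_hat


def forecast_var1(c, A, last_obs, horizon):
    """Iterate the fitted recursion forward to get multi-step forecasts."""
    preds = []
    current = np.asarray(last_obs, dtype=float)
    for _ in range(horizon):
        current = c + A @ current
        preds.append(current)
    return np.array(preds)


# Hypothetical dynamics: column 0 plays the role of sales, column 1 of customers.
A_true = np.array([[0.5, 0.2], [0.1, 0.6]])
c_true = np.array([1.0, 2.0])
series = simulate_var1(A_true, c_true, n_steps=2000, noise_std=0.1)

c_hat, A_hat = fit_var1(series)
forecast = forecast_var1(c_hat, A_hat, series[-1], horizon=7)
print("Estimated coefficient matrix:")
print(np.round(A_hat, 2))
```

The off-diagonal entries of the estimated matrix are what make this a vector model: they measure how yesterday's value of one series helps predict today's value of the other.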

All code can be viewed at: here

SuperCabs Profit Optimization using Deep Q Network (Reinforcement Learning)

Business Goal:

Cab drivers, like most people, are incentivized by healthy growth in income. The goal of this project is to build an RL-based algorithm that can help cab drivers maximize their profits by improving their decision-making process in the field.
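
As a rough illustration of the building blocks behind a Deep Q Network, the sketch below shows a replay buffer, epsilon-greedy action selection, and the temporal-difference targets the network is trained to match. It assumes the Q-values are produced elsewhere by a neural network, and every name here (`ReplayBuffer`, `epsilon_greedy`, `td_targets`) as well as the discount factor is hypothetical rather than taken from the project.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size memory of past transitions, sampled at random for training."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q_values, dones, gamma=0.95):
    """Targets r + gamma * max_a' Q(s', a'), with no bootstrap on terminal steps."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    best_next = np.max(np.asarray(next_q_values, dtype=float), axis=1)
    return rewards + gamma * best_next * (1.0 - dones)


# Toy batch of two transitions with two possible actions each.
example_targets = td_targets(
    rewards=[1.0, 2.0],
    next_q_values=[[0.0, 4.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.5,
)
print(example_targets)
```

In a full agent, the network's predicted Q-value for the action actually taken is regressed toward these targets on each sampled mini-batch.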

All code can be viewed at: here

Build RL Agent to Play Tic Tac Toe

Business Goal:

Build an RL agent (using Q-learning) that learns to play Numerical Tic-Tac-Toe with odd numbers. The environment plays randomly against the agent, i.e. its strategy is to place an even number at random in an empty cell.
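
The setup can be sketched with tabular Q-learning in plain Python. This is a minimal sketch, not the project's implementation: it assumes the usual Numerical Tic-Tac-Toe rule that a player wins by completing a row, column, or diagonal that sums to 15, and the reward values (+10 win, -10 loss, 0 draw, -1 per move), the hyperparameters, and all function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
ODD = (1, 3, 5, 7, 9)   # numbers available to the agent
EVEN = (2, 4, 6, 8)     # numbers available to the environment


def is_winning(board):
    """True if any fully filled line sums to 15 (0 marks an empty cell)."""
    return any(all(board[i] for i in line) and sum(board[i] for i in line) == 15
               for line in LINES)


def allowed_moves(board, numbers):
    """All (position, number) pairs using an empty cell and an unused number."""
    used = set(board)
    return [(pos, n) for pos in range(9) if board[pos] == 0
            for n in numbers if n not in used]


def step(board, action):
    """Apply the agent's move, then a random environment move."""
    board = list(board)
    pos, num = action
    board[pos] = num
    if is_winning(board):
        return tuple(board), 10, True       # assumed reward for a win
    if 0 not in board:
        return tuple(board), 0, True        # board full: draw
    pos, num = random.choice(allowed_moves(board, EVEN))
    board[pos] = num
    if is_winning(board):
        return tuple(board), -10, True      # assumed penalty for a loss
    return tuple(board), -1, False          # assumed small cost per move


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    random.seed(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = (0,) * 9, False
        while not done:
            actions = allowed_moves(state, ODD)
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(
                Q[(next_state, a)] for a in allowed_moves(next_state, ODD))
            # Q-learning update: move the estimate toward the bootstrapped target.
            Q[(state, action)] += alpha * (reward + gamma * future - Q[(state, action)])
            state = next_state
    return Q


q_table = train()
print("Learned state-action values:", len(q_table))
```

Because the opponent's random move is folded into `step`, the agent sees it as part of the environment's dynamics, which is what lets plain single-agent Q-learning apply here.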

All code can be viewed at: here

Gesture Recognition using CNN / RNN

Business Goal:

We are tasked with developing a cool feature for a smart TV that recognizes five different gestures performed by the user, letting users control the TV without a remote.

The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command:

  • Thumbs up: Increase the volume
  • Thumbs down: Decrease the volume
  • Left swipe: ‘Jump’ backwards 10 seconds
  • Right swipe: ‘Jump’ forward 10 seconds
  • Stop: Pause the movie
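
The last step, turning a model's output into one of the five commands above, might look like the following sketch. It assumes the CNN/RNN model emits one raw score per gesture class; the class order, the command identifiers, and the confidence threshold are all hypothetical choices, not details from the project.

```python
import math

# Assumed class order of the model's output layer (hypothetical).
GESTURES = ["thumbs_up", "thumbs_down", "left_swipe", "right_swipe", "stop"]

# Gesture-to-command mapping from the list above; command names are illustrative.
COMMANDS = {
    "thumbs_up": "volume_up",
    "thumbs_down": "volume_down",
    "left_swipe": "seek_back_10s",
    "right_swipe": "seek_forward_10s",
    "stop": "pause",
}


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gesture_to_command(class_scores, min_confidence=0.6):
    """Return the command for the most likely gesture, or None if unsure."""
    if len(class_scores) != len(GESTURES):
        raise ValueError("expected one score per gesture class")
    probs = softmax(class_scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None  # ignore low-confidence predictions instead of acting on them
    return COMMANDS[GESTURES[best]]


print(gesture_to_command([0.0, 0.0, 0.0, 0.0, 5.0]))
```

Returning `None` below the threshold is one way to avoid triggering a command when the user is not gesturing at all; whether that is appropriate depends on how the model was trained.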

All code can be viewed at: here

NER and CRF techniques for Medical Data Parsing and Prediction

Business Goal:

We are tasked with loading healthcare data and disease classifications, understanding the medical jargon, classifying the terms into medical topics using NER and CRF, and then predicting the therapy for a disease.
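
A CRF tagger works on per-token feature dictionaries, so a typical first step is a feature extractor like the sketch below. The specific features (lowercased word, suffix, neighbouring words, sentence boundaries) and the example sentence are assumptions for illustration, not the project's actual feature set. Lists of such dictionaries, one list per sentence, are the input format accepted by CRF libraries such as sklearn-crfsuite.

```python
def word_features(sentence, i):
    """Build a feature dictionary for the token at position i of a sentence."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Context features let the CRF use neighbouring words when tagging.
    if i > 0:
        features["prev.lower"] = sentence[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        features["next.lower"] = sentence[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """Feature dictionaries for every token in a tokenized sentence."""
    return [word_features(sentence, i) for i in range(len(sentence))]


# Hypothetical tokenized sentence, used only to show the output format.
example = ["Aspirin", "relieves", "headache"]
features = sentence_features(example)
for token, feats in zip(example, features):
    print(token, sorted(feats))
```

Paired with one label per token (for example disease, treatment, or other), these features are what the CRF learns from; the tagged disease and treatment spans can then be linked to suggest a therapy for a given disease.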

All code can be viewed at: here

Wine Classifier using PyCaret and Streamlit

Business Goal:

I will be using two new libraries, PyCaret and Streamlit, to create this data science web app: we build a wine quality classifier and then use Streamlit to create and deploy it.

All code can be viewed at: here

Australian House Price Prediction using Linear, Ridge and Lasso Regression

Business Goal:


We were required to build a Linear Regression model and use Ridge and Lasso Regression to come up with the final coefficients, so that house prices can be predicted. Management wants to know exactly how the Sale Price varies with the predictor variables, and to understand the pricing dynamics so that they can focus on the high-yield areas.
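
To show how the two penalties shape the final coefficients, here is a small NumPy sketch on synthetic data: ridge in closed form and lasso by coordinate descent. The data, the true coefficients, and the penalty strengths are invented for illustration and have nothing to do with the house price dataset; a real project would more likely rely on a library such as scikit-learn.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution (X'X + alpha*I)^-1 X'y; assumes centered data."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at exactly zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Synthetic data: three standardized predictors, the second one irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_w = np.array([3.0, 0.0, -2.0])
y = X @ true_w + rng.normal(0.0, 0.5, size=200)
y = y - y.mean()

w_ols = ridge_fit(X, y, alpha=0.0)      # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, alpha=100.0)  # shrinks all coefficients toward zero
w_lasso = lasso_fit(X, y, alpha=0.5)    # can set coefficients exactly to zero

print("OLS:  ", np.round(w_ols, 2))
print("Ridge:", np.round(w_ridge, 2))
print("Lasso:", np.round(w_lasso, 2))
```

The practical difference for a question like "which predictors drive price" is that ridge keeps every variable with a smaller weight, while lasso can drop weak predictors entirely, which makes the surviving coefficients easier to present to management.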

All code can be viewed at: here

Telecom Churn Prediction using Lasso and Ridge Regression

Business Goal:

This project was a group case study to analyze data from a Telecom company and understand what contributes to churn.
Below are the high-level steps that were followed:

  1. Data Understanding
  2. Data visualization and analysis (univariate/bivariate/multivariate)
  3. Data cleaning & Preparation
  4. Model Building & Evaluation
  5. Presenting Conclusion

All code can be viewed at: here