COSMIC COMMAND INTERFACE · MODULE 03

Machine Learning Lab

Real-world ML systems across supervised and unsupervised learning, pulled into a single command deck.


Supervised Learning

Models trained on labeled data to make predictions and classifications in domains like agriculture, healthcare, and finance.

Supervised · Classification · Agriculture · Random Forest

SmartHarvest – Crop Recommendation Engine

End‑to‑end ML system recommending the most suitable crop using soil nutrients (NPK), climate, and soil properties.

Stack: Python · Scikit‑learn · Pandas · Streamlit
Highlights: ~95% accuracy with feature‑engineered agronomic variables.

Supervised · Classification · Healthcare · KNN

Mr. Cardio Disease Astrologer

KNN‑based heart disease prediction using indicators like cholesterol, blood pressure, and heart rate.

Stack: Python · Scikit‑learn · Streamlit
Highlights: Interactive UI for risk estimation with explainable inputs.
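For illustration only, here is a plain-Python sketch of the KNN idea rather than the project's scikit-learn code. Features are standardized before measuring distance, since unscaled cholesterol values would otherwise swamp heart rate, and the k nearest training rows vote on the label. The columns, values, and labels are hypothetical.

```python
import math
from collections import Counter


def fit_scaler(rows):
    """Per-feature (mean, std) learned from the training rows."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return stats


def scale(row, stats):
    return [(v - mean) / std for v, (mean, std) in zip(row, stats)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest standardized training rows (Euclidean distance)."""
    stats = fit_scaler(train_x)
    scaled_train = [scale(row, stats) for row in train_x]
    q = scale(query, stats)
    nearest = sorted(zip(scaled_train, train_y), key=lambda pair: math.dist(pair[0], q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical rows: [cholesterol, systolic BP, max heart rate]; 1 = disease, 0 = no disease
X = [[180, 120, 170], [190, 118, 165], [200, 125, 160],
     [280, 160, 120], [300, 150, 115], [270, 165, 125]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [290, 155, 118]))
```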

Supervised · Classification · Finance · Naïve Bayes

Lending Logic – Loan Approval System

Gaussian Naïve Bayes model predicting loan approval probability using engineered financial features.

Stack: Python · Scikit‑learn · Streamlit
Highlights: ~92% accuracy with focus on reducing false positives.
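The Gaussian Naïve Bayes idea can be sketched in plain Python as below (an illustration, not the project's implementation): each class gets a prior plus a per-feature mean and variance, and a new applicant is scored by summing log-densities as if the features were independent. The two features and all values are made up.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Store each class's prior plus per-feature mean and variance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A tiny epsilon keeps every variance strictly positive
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_proba(model, row):
    """Posterior P(class | row), treating features as independent given the class."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = score
    top = max(log_scores.values())  # shift by the max before exp for numerical stability
    unnorm = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}


# Hypothetical applicants: [annual income in thousands, debt-to-income ratio]; 1 = approved
X = [[80, 0.20], [95, 0.15], [70, 0.25], [30, 0.60], [25, 0.70], [40, 0.55]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(predict_proba(model, [85, 0.20]))
```

Requiring P(approved) to clear a threshold higher than 0.5 is one generic way to cut false approvals; the entry does not say which technique the project used.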

Supervised · Classification · Healthcare · SVM

CardioPredict – SVM Heart Risk Model

Support Vector Machine model predicting heart disease risk from clinical features with a Streamlit front-end.

Stack: Python · Scikit‑learn · Streamlit
Highlights: Margin‑based classifier tuned for medical risk prediction.

Supervised · Classification · Marketing · KNN

Mobile Market Segmenter

KNN model classifying mobile users into market segments using usage behaviour and demographic patterns.

Stack: Python · Scikit‑learn · Streamlit
Highlights: Simple, interpretable clustering-style segmentation via supervised labels.

Supervised · Classification · Behavior · Logistic Regression

Personality Type Predictor

Logistic Regression model mapping behavioural traits to personality categories with an interactive UI.

Stack: Python · Scikit‑learn · Pandas · Streamlit
Highlights: Clean feature pipeline for survey‑style inputs.
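A minimal sketch of multinomial logistic regression in plain Python, trained by batch gradient descent on cross-entropy. scikit-learn's `LogisticRegression` uses its own solvers and regularization, so treat this as an illustration of the model rather than the project's training code; the two trait scores and three classes are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(W, b, row):
    return softmax([sum(w * x for w, x in zip(W[c], row)) + b[c] for c in range(len(W))])


def train_softmax_regression(X, y, n_classes, lr=1.0, epochs=3000):
    """Batch gradient descent on mean cross-entropy loss."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for row, label in zip(X, y):
            probs = class_probs(W, b, row)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)  # dLoss/dScore_c
                grad_b[c] += err
                for j in range(n_features):
                    grad_W[c][j] += err * row[j]
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / len(X)
            for j in range(n_features):
                W[c][j] -= lr * grad_W[c][j] / len(X)
    return W, b


def predict(W, b, row):
    probs = class_probs(W, b, row)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical survey scores scaled to [0, 1]: [social energy, preference for planning]
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.1, 0.1], [0.2, 0.2]]
y = [0, 0, 1, 1, 2, 2]  # three made-up personality classes
W, b = train_softmax_regression(X, y, n_classes=3)
print([predict(W, b, row) for row in X])
```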

Supervised · Regression · Mobility · Linear Regression

Ride Price Predictor

Linear Regression estimator for ride fares using distance, duration, time of day, and ride type.

Stack: Python · Scikit‑learn · Pandas · Streamlit
Highlights: Baseline pricing model for ride‑hailing style apps.
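With a single feature, ordinary least squares has a closed form, sketched below. The project uses several features (distance, duration, time of day, ride type), which needs the multivariate version such as scikit-learn's `LinearRegression`; the trips here are hypothetical and generated from fare = 3.0 + 1.5 × distance, so the fit should recover those coefficients.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y ≈ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical trips generated from fare = 3.0 + 1.5 * distance_km
distance_km = [2.0, 5.0, 8.0, 12.0]
fare = [6.0, 10.5, 15.0, 21.0]
intercept, slope = fit_simple_ols(distance_km, fare)
print(intercept, slope)
```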

Supervised · Regression · HR / Salary · KNN Regression

Experience to Earnings

KNN regression model predicting salary from years of experience as an intuitive career analytics tool.

Stack: Python · Scikit‑learn · Streamlit
Highlights: Non‑linear salary curve captured via neighbor‑based regression.
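KNN regression predicts the average target of the k nearest training points, which is what lets it follow a non-linear salary curve. A one-feature plain-Python sketch with made-up salaries:

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict the mean target of the k training points closest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical data: years of experience -> salary in thousands
years = [1, 2, 3, 5, 8, 10, 15]
salary = [40, 45, 50, 60, 75, 85, 100]
print(knn_regress(years, salary, 4))   # averages the salaries at 3, 5 and 2 years
print(knn_regress(years, salary, 30))  # cannot extrapolate past the largest neighbours
```

The second call shows the flip side: far outside the training range the prediction is stuck at the average of the most extreme neighbours.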

Supervised · Regression · Education · KNN

Student Performance Analyzer

End-to-end ML regression model predicting student exam scores using KNN with hyperparameter tuning and feature engineering.

Stack: Python · Scikit-learn · Streamlit · Pandas
Highlights: MAE: 6.89 | RMSE%: ~17.44% | Feature importance analysis with model interpretability.
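The error metrics above can be computed as below. The entry does not define "RMSE%", so the sketch assumes it means RMSE divided by the mean actual score; the scores are hypothetical.

```python
import math


def regression_report(y_true, y_pred):
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: "RMSE%" read as RMSE relative to the mean actual score
    rmse_pct = 100 * rmse / (sum(y_true) / n)
    return {"MAE": mae, "RMSE": rmse, "RMSE%": rmse_pct}


# Hypothetical exam scores and model predictions
print(regression_report([70, 80, 90, 60], [72, 78, 95, 60]))
```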

Supervised · Classification · Behavioral · Logistic Regression

PERSONA PULSE – MBTI Personality Classification

End-to-end ML pipeline for MBTI personality type classification using questionnaire data and a leakage-safe scikit-learn Pipeline.

Stack: Python · Scikit-learn · Pandas · Streamlit
Highlights: Strong multiclass Logistic Regression (~0.92 F1) | Full model interpretability | Real-time predictions MVP app
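scikit-learn's `Pipeline` avoids leakage because, inside `cross_val_score` or `GridSearchCV`, every step is fitted only on the training part of each split. The plain-Python sketch below shows the underlying rule for a standardization step; the data and helper names are hypothetical.

```python
def fit_standardizer(rows):
    """Per-column mean and standard deviation, learned from the rows passed in."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(rows, params):
    means, stds = params
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical questionnaire scores, already split into train and test
train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
test = [[10.0, 30.0]]

params = fit_standardizer(train)        # statistics come from the training split only
train_scaled = transform(train, params)
test_scaled = transform(test, params)   # test rows reuse the training statistics
print(params[0], test_scaled)
```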

Supervised · Regression & Classification · E-commerce · Ridge & Logistic

🚚 Delivery Oracle – E-commerce Delivery Intelligence

End-to-end delivery intelligence system built on the Olist dataset, using Ridge regression for ETA prediction and Logistic Regression for late-delivery risk classification.

Stack: Python · Scikit-learn · Pandas · Streamlit · Joblib
Highlights: Multi-table feature engineering (distance, time, parcel metrics) | Streamlit checkout-style UI | Real-time delivery risk alerts and ETA predictions
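Ridge regression adds a penalty α·w² to the squared error. With one centered feature and an unpenalized intercept (as in scikit-learn's `Ridge`), the solution is w = S_xy / (S_xx + α), sketched below. The distance and delivery-day values are hypothetical, and the project's real model uses many engineered features.

```python
def fit_ridge_1d(x, y, alpha=1.0):
    """One-feature ridge regression; the intercept is left unpenalized."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / (s_xx + alpha)  # alpha > 0 shrinks the slope toward 0
    return mean_y - slope * mean_x, slope


# Hypothetical orders: seller-to-customer distance (hundreds of km) vs. delivery days
distance = [1.0, 2.0, 3.0, 4.0]
days = [3.0, 5.0, 7.0, 9.0]
for alpha in (0.0, 5.0):
    print(alpha, fit_ridge_1d(distance, days, alpha))
```

With α = 0 this reduces to ordinary least squares; larger α trades a little bias for a slope that is less sensitive to noisy or correlated features.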

Supervised · Regression · E-commerce · Random Forest

Used Car Price Predictor – CarDekho

Regression-based ML app using a Random Forest Regressor to predict fair selling prices of used cars from 45K+ CarDekho listings with strong performance (R² ≈ 0.92).

Stack: Python · Scikit-learn · Streamlit · Pandas · Plotly

Features: Real-time price prediction from car specs | Feature importance and model insights dashboard | Handles categorical & numerical features with full preprocessing pipeline | MAE, RMSE, R² evaluation
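R² compares the model's squared error with that of always predicting the mean price. A plain-Python sketch with hypothetical prices (scikit-learn's `r2_score` computes the same quantity):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical prices and model predictions
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [3.5, 4.5, 7.5, 8.5]
print(r2_score(actual, predicted))
print(r2_score(actual, [6.0] * 4))  # always predicting the mean scores 0
```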

Supervised · Classification · Healthcare · Decision Tree

Cardiovascular Health Risk Assessment

Interpretable ML system using a Decision Tree Classifier on 70K+ patient records to predict cardiovascular disease risk from vitals, lab values, and lifestyle factors.

Stack: Python · Scikit-learn · Streamlit · Pandas · Plotly · Joblib

Features: BMI and age-group based feature engineering | Real-time risk prediction with probability scores | Feature importance and clinical visualizations | Streamlit dashboard optimized for healthcare analytics
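A sketch of the BMI and age-group features. BMI is weight in kilograms over height in metres squared, and the categories use the WHO adult cut-offs; the age bands and the example patient are hypothetical, since the project's exact bins are not stated.

```python
def bmi(weight_kg, height_cm):
    """Body-mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / (height_m ** 2)


def bmi_category(value):
    """WHO adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


def age_group(age_years):
    """Hypothetical age bands; the project's actual cut points are not stated."""
    if age_years < 40:
        return "<40"
    if age_years < 50:
        return "40-49"
    if age_years < 60:
        return "50-59"
    return "60+"


patient = {"weight_kg": 95.0, "height_cm": 170.0, "age_years": 56}
value = bmi(patient["weight_kg"], patient["height_cm"])
print(round(value, 1), bmi_category(value), age_group(patient["age_years"]))
```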

Supervised · Classification · Healthcare · Decision Tree

Medicine Recommendation System

AI-powered diagnostic tool using a Decision Tree Classifier to predict diseases from symptoms and provide personalized recommendations.

Stack: Python · Scikit-learn · Streamlit · Pandas · Joblib

Features: Symptom-based disease prediction | Personalized medication, diet, precautions, and workout recommendations | Interactive Streamlit interface
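Decision trees such as scikit-learn's `DecisionTreeClassifier` (default `criterion="gini"`) choose each split by how much it reduces impurity. The sketch below scores binary symptom columns by weighted Gini impurity; the symptoms, diseases, and rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_symptom_split(rows, labels, symptoms):
    """Pick the binary symptom whose yes/no split gives the lowest weighted Gini impurity."""
    n = len(labels)
    best_name, best_score = None, float("inf")
    for j, name in enumerate(symptoms):
        has = [lab for row, lab in zip(rows, labels) if row[j] == 1]
        lacks = [lab for row, lab in zip(rows, labels) if row[j] == 0]
        if not has or not lacks:
            continue  # symptom does not split this node
        score = len(has) / n * gini(has) + len(lacks) / n * gini(lacks)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score


symptoms = ["fever", "rash", "cough"]  # hypothetical symptom columns
rows = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1]]
labels = ["flu", "flu", "allergy", "allergy", "cold", "cold"]
print(best_symptom_split(rows, labels, symptoms))
```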

Supervised · Classification · Sports Analytics · Random Forest

ATP Tennis Match Outcome Classifier

ML-powered sports analytics tool using Random Forest to predict ATP tennis match winners from pre-match rankings across 59K+ matches with 99.5% accuracy.

Stack: Python · Scikit-learn · Streamlit · Pandas · Plotly

Features: Ranking-based, leakage-free pre-match predictions | Win probability visualization | Analysis of 20 years of ATP history (2000–2019) | Interactive Streamlit interface
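One common way to build leakage-free pre-match rows is sketched below: if the winner always sat in the first slot, the label would be constant, so the winner is randomly placed in slot 1 or 2 and only the rankings become features. The `winner_rank`/`loser_rank` field names and the matches are assumptions, not the project's schema.

```python
import random


def to_symmetric_rows(matches, seed=0):
    """Randomly place the winner in slot 1 or slot 2 so the label is not constant.

    Only pre-match information (the two rankings) becomes a feature.
    """
    rng = random.Random(seed)
    rows = []
    for m in matches:
        if rng.random() < 0.5:
            p1_rank, p2_rank, p1_wins = m["winner_rank"], m["loser_rank"], 1
        else:
            p1_rank, p2_rank, p1_wins = m["loser_rank"], m["winner_rank"], 0
        rows.append({"rank_diff": p1_rank - p2_rank, "p1_wins": p1_wins})
    return rows


# Hypothetical matches; field names follow an assumed winner/loser layout
matches = [
    {"winner_rank": 1, "loser_rank": 45},
    {"winner_rank": 12, "loser_rank": 3},
    {"winner_rank": 7, "loser_rank": 80},
    {"winner_rank": 150, "loser_rank": 22},
]
for row in to_symmetric_rows(matches):
    print(row)
```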

Supervised · Regression · Finance · Decision Tree

OHLCV Next-Day Close Predictor

Time-series ML mini product that predicts next-day stock closing prices using OHLCV features, Decision Tree Regression, and a naive “tomorrow ≈ today” baseline for comparison.

Stack: Python · Scikit-learn · Streamlit · Pandas · Plotly · yfinance · Joblib

Features: Real-time OHLCV fetch from Yahoo Finance | Lag, moving average, and volatility feature engineering | GridSearchCV with TimeSeriesSplit | MAE/RMSE vs naive baseline | Wall Street-themed Streamlit UI
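The supervised framing and the naive baseline can be sketched as below: each row holds only information available at today's close, the target is tomorrow's close, and the baseline simply predicts today's close. A model is worth keeping only if it beats this baseline's MAE on a later, held-out period, which `TimeSeriesSplit` also respects during tuning. The prices, lag count, and window size are hypothetical.

```python
def make_supervised(closes, n_lags=2, window=3):
    """Rows of past-only features with tomorrow's close as the target."""
    rows = []
    first = max(n_lags, window) - 1  # earliest day with a full history
    for t in range(first, len(closes) - 1):  # stop one day early: need closes[t + 1]
        rows.append({
            "lags": [closes[t - i] for i in range(n_lags)],  # lags[0] is today's close
            "ma": sum(closes[t - window + 1 : t + 1]) / window,  # trailing mean incl. today
            "target": closes[t + 1],
        })
    return rows


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


closes = [100.0, 101.0, 103.0, 102.0, 104.0, 107.0]  # hypothetical daily closes
rows = make_supervised(closes)
targets = [r["target"] for r in rows]
naive = [r["lags"][0] for r in rows]  # "tomorrow ≈ today" baseline
print(mae(targets, naive))
```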

Supervised · Regression · Energy · Random Forest

PJM Energy Demand Forecaster

End-to-end ML system predicting hourly electricity demand using 10+ years of PJM load data with Random Forest Regression and advanced time-series feature engineering.

Stack: Python · Scikit-learn · Streamlit · Pandas · Plotly · NumPy · Joblib

Highlights: 10+ years of multi-region PJM data (AEP, COMED, DAYTON, DEOK, DOM) | Time-based features (hour, day, month, season, lags) | MAE ≈ 500 MW, RMSE ≈ 700 MW, R² ≈ 0.95 | Interactive Streamlit dashboard for real-time forecasting and historical analysis
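The calendar and lag features listed above could be derived roughly as follows. The sketch assumes a gap-free hourly series in time order, Northern Hemisphere meteorological seasons, and 1-hour and 24-hour lags; the timestamps and load values are made up.

```python
from datetime import datetime, timedelta


def season(month):
    """Assumption: Northern Hemisphere meteorological seasons."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


def hourly_features(timestamps, loads_mw, lag_hours=(1, 24)):
    """Calendar and lag features; assumes a gap-free hourly series in time order."""
    rows = []
    for i, ts in enumerate(timestamps):
        if i < max(lag_hours):
            continue  # not enough history yet for the longest lag
        row = {
            "hour": ts.hour,
            "dayofweek": ts.weekday(),
            "month": ts.month,
            "season": season(ts.month),
            "target_mw": loads_mw[i],
        }
        for lag in lag_hours:
            row[f"lag_{lag}h"] = loads_mw[i - lag]
        rows.append(row)
    return rows


# Hypothetical two days of hourly load values
start = datetime(2018, 1, 1)
stamps = [start + timedelta(hours=h) for h in range(48)]
loads = [float(h) for h in range(48)]
rows = hourly_features(stamps, loads)
print(len(rows), rows[0])
```

Because the lag features look backwards in time, train/test splits for this kind of data should be chronological rather than random.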

Supervised · Classification · Healthcare · SVM

VociPark – Parkinson’s Disease Detection

End-to-end Parkinson’s detection from voice data using an SVM classifier inside a StandardScaler pipeline, tuned with GridSearchCV.

Stack: Python · Scikit-learn · Streamlit · Pandas · Joblib

Highlights: ~82% Test Accuracy | Balanced Accuracy ~0.80 | Stratified Train-Test Split | Tuned C, Gamma & Kernel
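Balanced accuracy averages the recall of each class, so it exposes a model that leans on the majority class. A plain-Python sketch with hypothetical labels (scikit-learn's `balanced_accuracy_score` uses the same definition):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority-class guesser is not rewarded."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 1 = Parkinson's, 0 = healthy
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, balanced_accuracy(y_true, y_pred))
```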

Unsupervised Learning

Clustering and dimensionality reduction projects uncovering natural structure in data without labels.

Unsupervised · Clustering · Agriculture · K-Means

Vineyard Voyager – Wine Classification

K-Means clustering identifying wine quality tiers from chemical properties without labels.

Stack: Python · Scikit‑learn · Pandas
Highlights: 3 distinct clusters with silhouette optimization.
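Silhouette optimization usually means fitting K-Means for several k and keeping the k with the highest mean silhouette. A plain-Python version of the score, with hypothetical 2-D points, is sketched below; scikit-learn's `silhouette_score` is the production equivalent.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette s = (b - a) / max(a, b); needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # well-separated grouping
print(silhouette_score(points, [0, 1, 0, 1]))  # poor grouping
```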

Unsupervised · Clustering · Retail · Hierarchical Clustering

Retail Radar – Customer Segmentation

Hierarchical clustering grouping customers into natural purchase behaviour segments.

Stack: Python · Scikit‑learn · Dendrogram
Highlights: Dendrogram-based optimal cluster detection.
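A dendrogram records the distance at which each merge happens; a large jump between successive merge heights suggests where to cut. The sketch below implements average-linkage agglomerative clustering in plain Python; the linkage choice and customer features are assumptions, since the entry does not specify them.

```python
import math
from itertools import combinations


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merge_heights = []  # the distances a dendrogram would draw, in merge order

    def linkage(a, b):
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merge_heights.append(linkage(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations yields i < j, so index i is still valid
    return clusters, merge_heights


# Hypothetical customers: [annual spend in thousands, visits per month]
customers = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.5, 8.0), (8.0, 9.0)]
clusters, heights = average_linkage(customers, n_clusters=2)
print(clusters, heights)
```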

Unsupervised · Dimensionality Reduction · Visualization · PCA

PCA Sommelier – Wine Intelligence Studio

Portfolio-ready PCA wine analysis lab that reduces high-dimensional wine chemistry into 2–3 principal components for interactive exploration and insight.

Stack: Python · Scikit‑learn · Matplotlib · Streamlit · Pandas
Highlights: Interactive PCA pipeline with explained variance, component loadings, rich visualizations, and CSV export for wine datasets.
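Explained variance and component loadings come from the eigen-decomposition of the covariance matrix. Real wine data has many columns (scikit-learn's `PCA` handles that via SVD), but the two-feature case has a closed form, sketched below with hypothetical points lying on a line, so the first component should explain essentially all the variance.

```python
import math


def pca_2d(points):
    """Eigen-decomposition of a 2x2 sample covariance matrix in closed form."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    half_trace = (var_x + var_y) / 2
    root = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    eig1, eig2 = half_trace + root, half_trace - root  # variance along PC1 and PC2
    if cov_xy != 0:
        direction = (eig1 - var_y, cov_xy)  # solves the covariance eigen-equation for eig1
    else:
        direction = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    norm = math.hypot(*direction)
    loading = (direction[0] / norm, direction[1] / norm)
    total = eig1 + eig2
    return (eig1 / total, eig2 / total), loading


points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical, on y = 2x
ratios, loading = pca_2d(points)
print(ratios, loading)
```

Standardizing each feature first is usual when, as with wine chemistry, the columns have different units.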

Unsupervised · Geospatial ML · DBSCAN · Production

Geo-Pulse – Smart City Traffic Intelligence

Production-ready geospatial ML system identifying traffic accident hotspots using DBSCAN clustering, deployed on Streamlit with 3D Pydeck visualizations for city planning and insurance risk assessment.

Stack: Python · Scikit-learn · Pydeck · Streamlit · Pandas · Joblib
Impact: Analyzed 3M+ US accident records | 87 hotspot clusters in LA | 15% noise detection | 3 km optimal radius
Features: Interactive 3D maps · State/city filtering · Haversine geodesic distance · Pre-trained model
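Haversine distance and the DBSCAN core-point rule can be sketched as below. scikit-learn's `DBSCAN` with `metric="haversine"` expects [latitude, longitude] in radians, so a 3 km radius becomes eps = 3 / 6371. The `min_samples=5` value and the coordinates are hypothetical, not the project's settings.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_core_point(idx, coords, radius_km=3.0, min_samples=5):
    """DBSCAN core-point test: at least min_samples points (itself included) within radius_km."""
    lat, lon = coords[idx]
    neighbours = sum(haversine_km(lat, lon, la, lo) <= radius_km for la, lo in coords)
    return neighbours >= min_samples


# A 3 km radius expressed in radians, the unit haversine-based eps values use
eps_radians = 3.0 / EARTH_RADIUS_KM

# Hypothetical accident coordinates: five close together in LA, one far away
coords = [(34.050, -118.240), (34.051, -118.240), (34.052, -118.240),
          (34.050, -118.241), (34.050, -118.242), (35.000, -118.240)]
print([is_core_point(i, coords) for i in range(len(coords))])
```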

Unsupervised · Clustering · Developer Analytics · MiniBatch K-Means

Developer Persona Segmentation

ML-powered developer persona segmentation using MiniBatch K-Means clustering on the 2025 Stack Overflow Developer Survey (~42K developers) to identify 3 distinct personas for hiring, marketing, and product strategy.

Stack: Python · Scikit-learn · Streamlit · Pandas

Personas: Modern Web Builders (~45%) · Generalist Developers (~35%) · Veteran Builders (~20%)

Features: Feature engineering for multi-select tech stacks | Scikit-learn pipeline with imputation, encoding & scaling | Interactive Streamlit dashboard | Persona report CSV export
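Stack Overflow survey multi-select answers arrive as semicolon-separated strings, and they are usually expanded into indicator columns before scaling and clustering. A plain-Python sketch with hypothetical answers (scikit-learn's `MultiLabelBinarizer` is a library alternative):

```python
def multi_hot(responses, sep=";"):
    """Expand multi-select answers such as "Python;SQL" into 0/1 indicator columns."""
    vocab = sorted({opt.strip() for r in responses if r for opt in r.split(sep) if opt.strip()})
    matrix = []
    for r in responses:
        chosen = {opt.strip() for opt in r.split(sep)} if r else set()
        matrix.append([1 if option in chosen else 0 for option in vocab])
    return vocab, matrix


# Hypothetical answers; None stands for a skipped question (encoded as all zeros here)
answers = ["Python;SQL", "JavaScript;TypeScript;SQL", None]
vocab, X = multi_hot(answers)
print(vocab)
print(X)
```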

Unsupervised · Dimensionality Reduction · ICA · t-SNE

Feature Extraction Engine – ICA & t-SNE

Independent Component Analysis (ICA) for linear feature extraction and t-SNE for non-linear embedding and visualization.

Stack: Python · Scikit‑learn · Plotly
Highlights: Interactive 3D t-SNE visualizations with perplexity tuning.