HDB Resale Price Estimator
Telegram bot for HDB resale price prediction using ensemble ML & ARIMA
System Architecture Visual
Project Overview
A Telegram bot that predicts Singapore HDB resale flat prices using an ensemble of gradient-boosted tree (GBT) models (CatBoost, LightGBM, XGBoost) combined with ARIMA for time-series trend forecasting. Users interact through a natural-language chat interface powered by an LLM, and predictions are served by a FastAPI backend.
Methodology & Architecture
1. Data Collection & Feature Engineering
Ingesting historical HDB resale transaction data from data.gov.sg and enriching it with proximity features such as distance to MRT stations, schools, hawker centres, and shopping malls.
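A proximity feature of this kind can be sketched as a great-circle distance from a flat to its nearest amenity. The snippet below is a minimal illustration, not the project's code: the function names, the `(name, lat, lon)` tuple layout, and the coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_amenity_km(flat, amenities):
    """Distance from a flat (lat, lon) to the closest amenity in a list of (name, lat, lon)."""
    return min(haversine_km(flat[0], flat[1], lat, lon) for _, lat, lon in amenities)


# Illustrative coordinates only; they are not taken from the project data.
mrt_stations = [("Station A", 1.3500, 103.8500), ("Station B", 1.3700, 103.8900)]
flat_location = (1.3520, 103.8520)
dist_to_mrt_km = nearest_amenity_km(flat_location, mrt_stations)
```

The same helper could be reused for schools, hawker centres, and malls by passing a different amenity list, giving one distance column per amenity type.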
2. Ensemble ML Modelling
Training CatBoost, LightGBM, and XGBoost regressors individually, then combining their predictions in a stacking or blending ensemble to minimize RMSE and capture non-linear price signals.
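The combination step can be illustrated with a simple blending scheme. The sketch below assumes each model's predictions on a held-out validation set are already available and weights the models by inverse validation RMSE; this weighting rule is an assumption chosen for illustration, since the source does not say how the blend is fitted. A stacking variant would instead train a meta-learner on the same predictions. All prices are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def inverse_rmse_weights(y_val, val_preds):
    """Weight each model by 1 / RMSE on the validation set, normalised to sum to 1."""
    inv = {name: 1.0 / max(rmse(y_val, preds), 1e-9) for name, preds in val_preds.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def blend(preds_by_model, weights):
    """Weighted average of per-model predictions, row by row."""
    names = list(weights)
    n_rows = len(preds_by_model[names[0]])
    return [sum(weights[m] * preds_by_model[m][i] for m in names) for i in range(n_rows)]


# Hypothetical validation targets and per-model predictions (SGD).
y_val = [500000.0, 620000.0, 450000.0]
val_preds = {
    "catboost": [505000.0, 615000.0, 455000.0],
    "lightgbm": [510000.0, 610000.0, 460000.0],
    "xgboost": [480000.0, 640000.0, 430000.0],
}
weights = inverse_rmse_weights(y_val, val_preds)
blended = blend(val_preds, weights)
```

In this toy data the blended RMSE is lower than that of any single model, which is the effect the ensemble aims for. In practice the weights would be checked on data not used to fit them.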
3. Time-Series Adjustment (ARIMA)
Applying an ARIMA model to residual price trends to account for market-wide seasonality and month-on-month price movements, then blending its forecast with the GBT ensemble output.
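To make the adjustment concrete, here is a dependency-free sketch of the simplest useful case, ARIMA(1,1,0) without a constant, fitted by least squares on first differences. Several things here are assumptions: the model order, the idea that the series is a monthly average residual (actual price minus ensemble prediction), and the additive way the forecast is applied to the ensemble output. A real pipeline would more likely use a library such as statsmodels and choose the order from the data.

```python
def fit_arima_110(series):
    """Estimate phi for ARIMA(1,1,0) with no constant: diff[t] = phi * diff[t-1] + noise."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(d_now * d_prev for d_prev, d_now in zip(diffs, diffs[1:]))
    den = sum(d_prev * d_prev for d_prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, phi, steps=1):
    """Roll the fitted model forward; needs at least two observations."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff          # expected next change
        level = level + diff       # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# Hypothetical monthly mean residuals in SGD (actual minus ensemble prediction).
monthly_residuals = [0.0, 1000.0, 1500.0, 1750.0, 1875.0]
phi = fit_arima_110(monthly_residuals)
next_residuals = forecast_arima_110(monthly_residuals, phi, steps=2)

# Assumed additive blend: shift the ensemble estimate by the forecast residual.
ensemble_estimate = 520000.0
adjusted_estimate = ensemble_estimate + next_residuals[0]
```

Fitting on residuals rather than raw prices keeps the two models' roles separate: the trees explain differences between flats, and the time-series term captures drift that affects all flats over time.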
4. FastAPI Backend
Serving the combined prediction pipeline through a RESTful FastAPI service, handling input validation, feature transformation, and model inference at low latency.
5. LLM-Powered Chat Module
Integrating an LLM to parse natural-language queries from Telegram users, extract structured flat parameters (town, flat type, storey, floor area), and return human-readable price estimates with explanations.
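The extraction step can be sketched without committing to any particular LLM provider: the model is asked to reply with JSON, and the bot validates that reply before calling the prediction service. The prompt wording, the `FlatParams` fields, and the function names below are assumptions for illustration, and the LLM call itself is deliberately left out.

```python
import json
from dataclasses import dataclass
from typing import Optional

REQUIRED_KEYS = ("town", "flat_type", "storey", "floor_area_sqm")

# Hypothetical system prompt sent to the LLM alongside the user's message.
EXTRACTION_PROMPT = (
    "Extract the flat details from the user's message and reply with JSON only, "
    'using the keys "town", "flat_type", "storey" and "floor_area_sqm". '
    "Use null for anything the user did not mention."
)


@dataclass
class FlatParams:
    town: str
    flat_type: str
    storey: int
    floor_area_sqm: float


def parse_llm_reply(reply: str) -> Optional[FlatParams]:
    """Convert the LLM's JSON reply into FlatParams, or None if it is unusable."""
    try:
        data = json.loads(reply)
        if not isinstance(data, dict) or any(data.get(k) is None for k in REQUIRED_KEYS):
            return None  # missing details: the bot should ask a follow-up question
        return FlatParams(
            town=str(data["town"]).strip().upper(),
            flat_type=str(data["flat_type"]).strip().upper(),
            storey=int(data["storey"]),
            floor_area_sqm=float(data["floor_area_sqm"]),
        )
    except (ValueError, TypeError):
        return None


def format_estimate(params: FlatParams, price_sgd: float) -> str:
    """Human-readable reply for the Telegram chat."""
    return (
        f"Estimated resale price for a {params.flat_type} flat in {params.town.title()} "
        f"(storey {params.storey}, {params.floor_area_sqm:g} sqm): about S${price_sgd:,.0f}"
    )
```

Treating the LLM output as untrusted input and returning `None` on any malformed reply lets the bot fall back to a clarifying question instead of sending bad parameters to the backend.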
Technologies Used
Key Learnings
- Combining gradient boosted trees with ARIMA for hybrid price forecasting.
- Engineering geo-spatial proximity features from multiple public Singapore datasets.
- Building a conversational Telegram bot interface backed by an LLM for structured entity extraction.
- Deploying a low-latency FastAPI inference server for real-time predictions.