Completed

HDB Resale Price Estimator

Telegram bot for HDB resale price prediction using ensemble ML & ARIMA

System Architecture Visual

Project Overview

A Telegram bot that predicts Singapore HDB resale flat prices using an ensemble of gradient-boosted tree models (CatBoost, LightGBM, XGBoost) combined with ARIMA for time-series trend forecasting. Users interact through a natural-language chat interface powered by an LLM, and predictions are served by a FastAPI backend.

Methodology & Architecture

1. Data Collection & Feature Engineering

Ingesting historical HDB resale transaction data from data.gov.sg, enriched with proximity features such as distance to MRT stations, schools, hawker centres, and shopping malls.
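A proximity feature of this kind can be sketched as a great-circle (haversine) distance from a flat to its nearest amenity. This is a minimal illustration, not the project's actual code: the function names and the coordinates below are made up for the example, and the real pipeline may compute distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_amenity_km(flat, amenities):
    """Return (name, distance_km) of the amenity closest to `flat`.

    `flat` is a (lat, lon) tuple; `amenities` maps a name to (lat, lon).
    """
    name, coords = min(
        amenities.items(),
        key=lambda kv: haversine_km(*flat, *kv[1]),
    )
    return name, haversine_km(*flat, *coords)


# Illustrative coordinates only, not taken from the project's datasets.
flat = (1.3500, 103.9400)
mrt_stations = {
    "station_a": (1.3530, 103.9450),
    "station_b": (1.3300, 103.9000),
}
nearest_name, dist_to_mrt_km = nearest_amenity_km(flat, mrt_stations)
```

The same helper could be reused for schools, hawker centres, and shopping malls, giving one distance column per amenity type.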

2. Ensemble ML Modelling

Training CatBoost, LightGBM, and XGBoost regressors individually, then combining their predictions via a stacking/blending ensemble to minimize root-mean-square error (RMSE) and capture non-linear property price signals.
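One simple way to blend the three models is a weighted average whose weights are inversely proportional to each model's mean squared error on a hold-out set. The sketch below assumes that scheme purely for illustration; the project may instead use a stacked meta-learner, and the prices and predictions here are made-up toy numbers.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def inverse_mse_weights(y_true, preds_by_model):
    """Weight each model by 1 / MSE on the hold-out set, normalised to sum to 1."""
    inv = {
        name: 1.0 / max(rmse(y_true, preds) ** 2, 1e-12)
        for name, preds in preds_by_model.items()
    }
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def blend(preds_by_model, weights):
    """Weighted average of the per-model predictions, row by row."""
    n_rows = len(next(iter(preds_by_model.values())))
    return [
        sum(weights[name] * preds_by_model[name][i] for name in preds_by_model)
        for i in range(n_rows)
    ]


# Toy hold-out prices and model predictions (made-up numbers).
y_val = [500_000.0, 620_000.0, 450_000.0]
val_preds = {
    "catboost": [510_000.0, 610_000.0, 455_000.0],
    "lightgbm": [490_000.0, 640_000.0, 440_000.0],
    "xgboost": [505_000.0, 615_000.0, 460_000.0],
}
weights = inverse_mse_weights(y_val, val_preds)
blended = blend(val_preds, weights)
```

Because the blend is a convex combination, its RMSE on the hold-out set cannot exceed that of the worst individual model. The weights should be fitted on data the base models did not train on, so the blend does not reward overfitting.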

3. Time-Series Adjustment (ARIMA)

Fitting an ARIMA model to residual price trends to account for market-wide trends and month-on-month price movements, then blending its forecast with the gradient-boosted ensemble output.
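To make the adjustment step concrete, the sketch below uses the simplest ARIMA case, ARIMA(0,1,0) with a constant (a random walk with drift), where the drift is estimated as the mean month-on-month change. Both the model order and the additive blend are assumptions made for illustration; the project's actual order and blending rule are not stated here, and a library such as statsmodels would normally be used for general (p, d, q) orders. The residual values are made up.

```python
def fit_drift(series):
    """Drift of an ARIMA(0,1,0)-with-constant model: the mean first difference."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


def forecast(series, steps=1):
    """Forecast `steps` periods ahead: last value plus drift per step."""
    drift = fit_drift(series)
    return [series[-1] + drift * h for h in range(1, steps + 1)]


def adjust_prediction(ensemble_pred, residual_series, steps=1):
    """Add the forecast residual trend to the tree-ensemble prediction.

    Additive blending is an assumption made for this sketch.
    """
    return ensemble_pred + forecast(residual_series, steps)[-1]


# Made-up monthly residuals (actual price minus ensemble prediction), in SGD.
monthly_residuals = [1000.0, 1500.0, 2000.0, 2500.0]
adjusted = adjust_prediction(500_000.0, monthly_residuals)
```

Here the residuals rise by 500 each month, so the one-month-ahead residual forecast is 3000 and the adjusted estimate is 503,000.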

4. FastAPI Backend

Serving the combined prediction pipeline through a RESTful FastAPI service, handling input validation, feature transformation, and model inference at low latency.

5. LLM-Powered Chat Module

Integrating an LLM to parse natural-language queries from Telegram users, extract structured flat parameters (town, flat type, storey, floor area), and return human-readable price estimates with explanations.
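The extraction step can be sketched as validating the LLM's reply before it reaches the prediction pipeline. The example assumes the LLM has been prompted to answer with a JSON object containing the four fields above; the LLM call itself is omitted, and the field names, `parse_llm_reply`, and `format_reply` are hypothetical.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("town", "flat_type", "storey", "floor_area_sqm")


@dataclass
class FlatParams:
    town: str
    flat_type: str
    storey: int
    floor_area_sqm: float


def parse_llm_reply(raw: str) -> FlatParams:
    """Turn the LLM's JSON reply into typed parameters, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("LLM reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM reply must be a JSON object")
    missing = [key for key in REQUIRED_FIELDS if data.get(key) in (None, "")]
    if missing:
        # The bot can use this message to ask the user a follow-up question.
        raise ValueError("missing fields: " + ", ".join(missing))
    return FlatParams(
        town=str(data["town"]).strip().upper(),
        flat_type=str(data["flat_type"]).strip().upper(),
        storey=int(data["storey"]),
        floor_area_sqm=float(data["floor_area_sqm"]),
    )


def format_reply(params: FlatParams, price_sgd: float) -> str:
    """Human-readable message to send back to the Telegram user."""
    return (
        f"Estimated resale price for a {params.flat_type} flat in {params.town} "
        f"(storey {params.storey}, {params.floor_area_sqm:g} sqm): "
        f"S${price_sgd:,.0f}"
    )
```

Validating the reply in code, rather than trusting the LLM's output directly, keeps malformed or incomplete extractions from reaching the model.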

Technologies Used

CatBoost
LightGBM
XGBoost
ARIMA
FastAPI
LLM
Telegram API

Key Learnings

  • Combining gradient boosted trees with ARIMA for hybrid price forecasting.
  • Engineering geo-spatial proximity features from multiple public Singapore datasets.
  • Building a conversational Telegram bot interface backed by an LLM for structured entity extraction.
  • Deploying a low-latency FastAPI inference server for real-time predictions.