Photometric Redshift Estimation

Python · XGBoost · Scikit-learn · K-Means · Astropy

This self-directed research project aimed to accurately estimate galaxy redshifts from photometric data in the Sloan Digital Sky Survey (SDSS). Redshift is a crucial quantity in astrophysics because it serves as a proxy for a galaxy's distance.

The Challenge

Photometric redshift estimation is a classic problem in astrophysics. Spectroscopic redshifts are precise but expensive to obtain, so most galaxies have only broadband photometry, which is noisy and calls for robust models. The goal was to build a pipeline that could outperform traditional methods by using a hybrid machine learning approach.

My Solution

  • Architected a hybrid pipeline combining clustering (K-Means and Gaussian Mixture Models) with a stacked ensemble of boosting models (XGBoost, NGBoost).
  • Implemented a multi-method outlier detection strategy, combining Isolation Forest, Local Outlier Factor (LOF) and Z-score thresholding, to clean the SDSS data.
  • Engineered several astronomical features, which proved crucial for model performance.
  • Achieved an R² score of 0.84 and an RMSE of 0.21 (in redshift units), meaning the model explains about 84% of the variance in redshift.
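One way to combine the three outlier detectors is a majority vote, sketched below. The vote rule (drop a row when at least two detectors flag it), the 2% contamination rate, the 3σ Z-score cut and `n_neighbors=20` are assumptions for illustration; `outlier_mask` is a hypothetical name, and the synthetic matrix stands in for SDSS photometry.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def outlier_mask(X, z_thresh=3.0, contamination=0.02, min_votes=2, seed=0):
    """Hypothetical helper: return a boolean mask that is True for rows to keep.

    A row is dropped when at least `min_votes` of the three detectors flag it.
    """
    # Isolation Forest: fit_predict returns -1 for outliers, 1 for inliers.
    iso = IsolationForest(contamination=contamination, random_state=seed)
    iso_flag = iso.fit_predict(X) == -1

    # Local Outlier Factor (default non-novelty mode): same -1 / 1 convention.
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    lof_flag = lof.fit_predict(X) == -1

    # Z-score: flag a row if any feature is more than z_thresh std devs from its mean.
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    z_flag = (z > z_thresh).any(axis=1)

    votes = iso_flag.astype(int) + lof_flag.astype(int) + z_flag.astype(int)
    return votes < min_votes


# Usage on synthetic data with five injected gross outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[:5] += 15.0
keep = outlier_mask(X)
X_clean = X[keep]
```

Requiring agreement between detectors is a conservative choice: a false positive from any single method does not remove a row on its own.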
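The specific engineered features are not listed here. A common choice for photometric redshifts, shown purely as an assumption, is colors: differences between magnitudes in adjacent SDSS bands (u, g, r, i, z), which change as spectral features move through the filters with redshift. `color_features` and the example magnitudes are hypothetical.

```python
import numpy as np

BANDS = ("u", "g", "r", "i", "z")  # SDSS broadband filters, blue to red


def color_features(mags):
    """Hypothetical helper: build color features from a dict of magnitude arrays.

    Returns adjacent-band colors (u-g, g-r, r-i, i-z) plus the wide-baseline u-z.
    """
    feats = {}
    for blue, red in zip(BANDS[:-1], BANDS[1:]):
        feats[f"{blue}-{red}"] = mags[blue] - mags[red]
    # Wide-baseline color spanning the whole filter set.
    feats["u-z"] = mags["u"] - mags["z"]
    return feats


# Usage with made-up magnitudes for a single object.
mags = {b: np.array([m]) for b, m in zip(BANDS, [19.5, 18.2, 17.6, 17.3, 17.1])}
feats = color_features(mags)
```

Colors are largely independent of a galaxy's overall brightness, which is why they tend to carry more redshift information than raw magnitudes.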
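For reference, the two reported metrics can be computed as follows. This sketch assumes the targets are SDSS spectroscopic redshifts; `regression_scores` is a hypothetical helper, equivalent to scikit-learn's `r2_score` and the square root of `mean_squared_error`.

```python
import numpy as np


def regression_scores(z_true, z_pred):
    """Hypothetical helper: return (R², RMSE) for predicted vs. true redshifts."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    residuals = z_true - z_pred
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((z_true - z_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))         # same units as redshift
    return r2, rmse


# Toy example: one prediction is off by 0.1.
r2, rmse = regression_scores([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.5])
```

On this toy input the single 0.1 error gives R² ≈ 0.8 and RMSE ≈ 0.05. Because RMSE is in redshift units, a value like 0.21 is easiest to judge against the redshift range of the sample.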