Photometric Redshift Estimation
Python · XGBoost · Scikit-learn · K-Means · Astropy

This self-directed research project aimed to accurately estimate galaxy redshifts, a key quantity in astrophysics, from Sloan Digital Sky Survey (SDSS) photometric data.
The Challenge
Photometric redshift estimation is a classic problem in astrophysics. The photometric data is noisy, so a useful model has to be robust to it. The goal was to build a pipeline that could outperform traditional methods by using a hybrid machine learning approach.
My Solution
- Architected a hybrid pipeline combining clustering (K-Means, GMM) with a stacked ensemble of boosting models (XGBoost, NGBoost).
- Implemented a multi-method outlier detection strategy using Isolation Forest, LOF, and Z-score to clean the SDSS data.
- Engineered several astronomical features, which proved crucial for model performance.
- Achieved an R² score of 0.84 and an RMSE of 0.21 on the redshift predictions.
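
Three of the steps above (the Z-score filter, feature engineering, and the R²/RMSE evaluation) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's actual code. The write-up does not list the engineered features, so adjacent-band SDSS color indices (u-g, g-r, r-i, i-z) are assumed here as a typical choice. The function names and the threshold of 3 standard deviations are also hypothetical.

```python
import math
from statistics import mean, pstdev

BANDS = ("u", "g", "r", "i", "z")  # the five SDSS photometric bands


def color_features(mags):
    """Return adjacent-band color indices for one object.

    `mags` maps band name -> magnitude. Using colors as features is an
    assumption; the write-up does not say which features were engineered.
    """
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(BANDS, BANDS[1:])}


def zscore_mask(values, threshold=3.0):
    """Return True for each value within `threshold` std devs of the mean.

    The threshold of 3.0 is a common default, not a value from the project.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [True] * len(values)  # no spread, nothing to flag
    return [abs(v - mu) / sigma <= threshold for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted redshifts."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a full pipeline, the mask from `zscore_mask` would be combined with the Isolation Forest and LOF flags before training, and `rmse` and `r2_score` would be computed on held-out redshifts. How the project combined the three detectors is not stated here.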