Est. [2027] · [Toronto-ON]

MEIHUAN SHEN

If you are new, welcome!
If you are a returning visitor, welcome back!
This is the portfolio of Michael Shen. Here you will find what has been happening with me recently and the progress on my projects.
This also serves as a personal website for any hiring manager. Please hire me, I am ready to work -v-!

Background

Who I Am

I am currently a third-year student at the University of Toronto, pursuing a degree with a specialist in Applied Statistics and a major in Mathematics.

I really like playing with data and getting familiar with it, much like getting to know someone who goes from a complete stranger to a person you can call your friend.

I am also planning to pursue further study with a master's in statistics. Where? I'm still not sure yet :\
Currently open to work or collaboration.

Craft

What I Do


I like working with data, and I have tried out machine learning with neural networks and language models.

I find that more complex models take away many of the exciting aspects of raw data, which I prioritize. Currently, the most exciting part of my day is non-parametric curve fitting with kernel density estimation of PDFs, empirical estimation of CDFs, or statistical decision theory with Bayes risk or minimax risk.

I am currently reading about non-convex optimization, which will hopefully lead to some work under the supervision of Professor Mou.
This portfolio is also part of my interests, as I began to love UX/UI design while working on my MelonBX project.

Beyond the Work

Outside Hours


I don't really label myself as an outdoors or indoors person. Some days I go run a half marathon behind my house, and other days, well.

But I understand the benefits of being outdoors, so I push myself to get outside and try to be active at least twice a week. My go-to is a 5 km run or some volleyball. Tbh, I can't remember the last time I played with someone.

Here is my Spotify page. I am also into survival games such as Project Zomboid and Vintage Story.

4 Repositories · github.com/Ancientrains

PROJECTS

A collection of ML systems, statistical analyses, and data-driven tools — spanning computer vision, regression modelling, and real-world competition submissions.

Python · PyTorch · AWS · Flask

MelonBX

A deployed production ML API that predicts watermelon sweetness (Brix) from images. Built a multi-branch Siamese CNN achieving an ~80% reduction in both RMSE and MAE. Collected 2,000+ ground-truth samples from Ontario farms and deployed on AWS EC2 with a custom domain, automated SSL/TLS, a CI/CD pipeline, and S3 persistence via IAM Instance Roles.

Jupyter Notebook · scikit-learn · 1st Place

SDSS Datathon

1st Place winner at the SDSS UofT Datathon 2026. Built a predictive pipeline for Toronto shelter capacity pressure using HistGradientBoostingRegressor on a 2-year dataset. Achieved a 144× improvement over baseline MAE. Produced geospatial heatmaps of the GTA and delivered stakeholder-facing recommendations for evidence-based resource allocation.

R · Statistical Modelling

Cancer Regression Model

A regression analysis applying multivariate statistical methods to cancer-related data. Explores model selection, diagnostic checks, and interpretation of results — documenting the full analytical process from data cleaning through to validated predictions.

Python · Linear Regression

Country Predictor

A single-file annotated notebook comparing three approaches to fitting linear regression — documenting the mathematical derivations, implementation differences, and performance trade-offs between OLS, gradient descent, and a third method side by side.

University of Toronto · Toronto, ON

EXPERIENCE

Research and competition experience applying statistical modelling, machine learning, and data engineering to real-world problems in healthcare, social services, and agriculture.

March 2025 · 48-Hour Competition · 1st Place

SDSS Datathon 2026

Team 44 — Model Builder & Data Visualization. Built a predictive pipeline for Toronto shelter capacity pressure using HistGradientBoostingRegressor; engineered occupancy-free features to prevent target leakage, one-hot encoded 6 categorical dimensions, and validated via 80/20 temporal split. Achieved 144× improvement over baseline MAE. Identified location as the primary pressure driver and presented findings via a GTA heatmap to stakeholders.

June 2025 – September 2025 · UofT Statistical Sciences

Undergraduate Researcher

Under Prof. Lijia Wang. Applied Bayesian hierarchical modelling and multivariate regression to neural time-series data to quantify consciousness recovery indicators in comatose patients. Validated models via cross-validation, residual analysis, and Box-Cox transformations. Processed 50 GB+ EEG datasets using parallel computing pipelines in Python, R, and MATLAB.

Expected June 2027 · Major CGPA 3.6/4.0

B.Sc. UofT

Specialist in Applied Statistical Science, Major in Mathematics. Arts and Science Internship Program (Co-op). Coursework in Methods of Data Analysis, Theory of Statistical Practice, Data Structures, and Machine Learning. Competencies in Bayesian methods, GLMs, CNNs, gradient boosting, AWS data engineering, and REST API development.

Toronto, ON · Open to opportunities

CONTACT / NEWSLETTER

Want to collaborate, ask a question, or just say hello? Reach out — always happy to connect. Below is a running log of updates, sorted newest first.

Email

Get In Touch

The best way to reach me. I typically respond within a day or two.

Code

GitHub

All my public projects, experiments, and competition submissions live here.