If you are new, welcome!
If you are a return visitor, welcome back!
This is the portfolio of Michael Shen. The content here covers what has happened to me recently and my progress on my projects.
This also acts as a personal website for any hiring manager. Please hire me, I am ready to work -v-!
I am currently a third-year student at the University of Toronto, pursuing an Applied Statistics specialist with a math major.
I really like playing with data and getting familiar with it, much like getting to know someone, from a complete stranger to a person you can call
your friend.
I am also planning to pursue further study in a statistics master's program. Where? I am still not sure yet :\
Currently open to work or collaboration.
I like working with data, and I have tried out machine learning with neural networks and language models.
I find that the more complex models take away many of the
exciting aspects of raw data, which is what I prioritize. Currently, non-parametric curve fitting with kernel density estimation of PDFs, empirical estimation of CDFs,
and statistical decision theory with Bayes risk or minimax risk are the most exciting parts of my day.
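For readers who have not met these estimators, here is a minimal sketch of an empirical CDF and a Gaussian kernel density estimate in plain Python. It is an illustration only, not code from any of my projects: the function names, the sample values, and the bandwidth of 0.5 are all assumptions chosen for the example.

```python
import math


def ecdf(sample):
    """Return the empirical CDF F_n(x) = (number of points <= x) / n."""
    data = list(sample)
    n = len(data)

    def F(x):
        # Fraction of observations that are less than or equal to x.
        return sum(1 for v in data if v <= x) / n

    return F


def gaussian_kde(sample, bandwidth):
    """Return a kernel density estimate that uses a Gaussian kernel."""
    data = list(sample)
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)

    return f


# Made-up sample, purely for illustration.
sample = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]
F = ecdf(sample)
f = gaussian_kde(sample, bandwidth=0.5)  # bandwidth picked by hand (assumption)

print(F(2.4))  # share of the sample at or below 2.4
print(f(2.4))  # estimated density at 2.4
```

The bandwidth controls how smooth the density looks: a smaller value follows the raw data more closely, while a larger one blurs it.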
I am currently reading up on non-convex optimization, which hopefully leads
to some work under the supervision of Professor Mou.
This portfolio is also part of my interests, as I began to love UX/UI design while working on my melonbx project.
I don't really label myself as an outdoors or indoors person. Some days I will go run a half marathon behind my house, and other days, well.
But I understand the benefits of being outdoors, so I push myself to get outside and try to be active at least twice a week. My go-to is a 5 km run or some volleyball.
Tbh, I can't remember the last time I played with someone.
Here is my Spotify page. I am also into survival games: Project Zomboid, Vintage Story, etc.
A collection of ML systems, statistical analyses, and data-driven tools — spanning computer vision, regression modelling, and real-world competition submissions.
A deployed production ML API that predicts watermelon sweetness (Brix) from images. Built a multi-branch Siamese CNN achieving ~80% reduction in both RMSE and MAE. Collected 2,000+ ground-truth samples from Ontario farms, deployed on AWS EC2 with custom domain, automated SSL/TLS, CI/CD pipeline, and S3 persistence via IAM Instance Roles.
1st Place winner at the SDSS UofT Datathon 2026. Built a predictive pipeline for Toronto shelter capacity pressure using HistGradientBoostingRegressor on a 2-year dataset. Achieved a 144× improvement over baseline MAE. Produced geospatial heatmaps of the GTA and delivered stakeholder-facing recommendations for evidence-based resource allocation.
A regression analysis applying multivariate statistical methods to cancer-related data. Explores model selection, diagnostic checks, and interpretation of results — documenting the full analytical process from data cleaning through to validated predictions.
A single-file annotated notebook comparing three approaches to fitting linear regression — documenting the mathematical derivations, implementation differences, and performance trade-offs between OLS, gradient descent, and a third method side by side.
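As a rough illustration of the comparison (not the notebook's actual code), the sketch below fits a one-feature linear regression two ways: the closed-form OLS solution and batch gradient descent on the mean squared error. The data, learning rate, and step count are made up for the example, and the notebook's third method is not shown here.

```python
def fit_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Batch gradient descent on the mean squared error."""
    n = len(xs)
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
        # Gradients of MSE with respect to the intercept and the slope.
        grad_intercept = 2.0 / n * sum(residuals)
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        intercept -= lr * grad_intercept
        slope -= lr * grad_slope
    return intercept, slope


# Made-up data lying roughly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

ols_fit = fit_ols(xs, ys)
gd_fit = fit_gradient_descent(xs, ys)
print("OLS:", ols_fit)
print("GD: ", gd_fit)
```

On a small, well-conditioned problem like this one the two fits agree closely. The closed form is exact in a single pass, while gradient descent trades that for an iterative update that needs a suitable learning rate.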
Research and competition experience applying statistical modelling, machine learning, and data engineering to real-world problems in healthcare, social services, and agriculture.
Team 44 — Model Builder & Data Visualization. Built a predictive pipeline for Toronto shelter capacity pressure using HistGradientBoostingRegressor; engineered occupancy-free features to prevent target leakage, one-hot encoded 6 categorical dimensions, and validated via 80/20 temporal split. Achieved 144× improvement over baseline MAE. Identified location as the primary pressure driver and presented findings via a GTA heatmap to stakeholders.
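For context, a temporal 80/20 split means training on the earliest 80% of records and testing on the most recent 20%, so the model is never evaluated on data older than what it was trained on. The sketch below shows the idea in plain Python. It is not the competition pipeline: the field names `day` and `demand` and all values are hypothetical.

```python
from datetime import date


def temporal_split(rows, date_key, train_fraction=0.8):
    """Sort rows by date, then cut so the test set is strictly the latest rows."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up daily records, deliberately out of order (hypothetical fields).
days = [7, 2, 9, 1, 5, 10, 3, 8, 4, 6]
rows = [{"day": date(2024, 1, d), "demand": 100 + 3 * d} for d in days]

train, test = temporal_split(rows, "day")

# Naive baseline: predict the training mean for every test row.
train_mean = sum(r["demand"] for r in train) / len(train)
baseline_mae = mean_absolute_error(
    [r["demand"] for r in test], [train_mean] * len(test)
)
print(len(train), len(test), baseline_mae)
```

A random split would let future rows leak into training, which tends to make error estimates look better than they would be on genuinely new data.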
Under Prof. Lijia Wang. Applied Bayesian hierarchical modelling and multivariate regression to neural time-series data to quantify consciousness recovery indicators in comatose patients. Validated models via cross-validation, residual analysis, and Box-Cox transformations. Processed 50 GB+ EEG datasets using parallel computing pipelines in Python, R, and MATLAB.
Specialist in Applied Statistical Science, Major in Mathematics. Arts and Science Internship Program (Co-op). Coursework in Methods of Data Analysis, Theory of Statistical Practice, Data Structures, and Machine Learning. Competencies in Bayesian methods, GLMs, CNNs, gradient boosting, AWS data engineering, and REST API development.
Want to collaborate, ask a question, or just say hello? Reach out — always happy to connect. Below is a running log of updates, sorted newest first.
The best way to reach me. I typically respond within a day or two.
All my public projects, experiments, and competition submissions live here.