Projects

Photography Agent (Fall 2025)

The Photography Agent is a collaborative semester project focused on building a photo organization and editing system powered by an agentic large language model (LLM). The LLM interprets natural language user requests and calls specialized computer vision tools for photo filtering and editing.
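At a high level, the agent runs a loop in which the LLM either returns a final answer or requests a tool call that the system executes and feeds back into the conversation. The sketch below illustrates that control flow only; the tool names (filter_photos, apply_edit) and the scripted call_llm stand-in are hypothetical placeholders, not the project's actual interface.

```python
# Minimal sketch of an agentic tool-calling loop (tool names and the LLM
# stand-in are hypothetical; not the project's actual implementation).
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class ToolCall:
    name: str        # which tool the LLM wants to run
    arguments: dict  # structured arguments parsed from the LLM response

# Registry of computer-vision tools the agent can dispatch to (placeholders).
TOOLS: dict[str, Callable[..., str]] = {
    "filter_photos": lambda tag: f"photos tagged '{tag}'",
    "apply_edit": lambda photo, op: f"{op} applied to {photo}",
}

def call_llm(prompt: str) -> Union[ToolCall, str]:
    """Scripted stand-in for the real LLM: request one tool, then answer."""
    if "[tool" not in prompt:
        return ToolCall("filter_photos", {"tag": "sunset"})
    return "Here are your sunset photos."

def run_agent(user_request: str, max_steps: int = 5) -> str:
    """Interpret a natural-language request by looping LLM -> tool -> LLM."""
    context = user_request
    for _ in range(max_steps):
        step = call_llm(context)
        if isinstance(step, str):                    # LLM produced a final answer
            return step
        result = TOOLS[step.name](**step.arguments)  # run the requested CV tool
        context += f"\n[tool {step.name} returned: {result}]"
    return "Stopped after reaching the step limit."

print(run_agent("Show me my sunset photos"))
```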

LLM Agent (Primary Contribution)

System Tools (Team Contributions)

Links

Demo of Photography Agent


Diabetic Retinopathy Classification Preprocessing Evaluation (Spring 2024)

In this project, a partner and I investigated the impact of image preprocessing techniques on diabetic retinopathy (DR) severity classification using deep learning models, with the goal of determining whether a single preprocessing method works best across models.

My Contributions

Results

  • Evaluated the accuracy, precision, recall, and F1 score of each model after fine-tuning on data produced by each preprocessing method.
  • The results plot shows that preprocessing effectiveness is model-dependent: CLAHE + Gaussian filtering (see the sketch after this list) improved ResNet-50 and EfficientNet performance, while the hybrid DenseSwin model performed best with standard preprocessing.
  • This highlights the importance of architecture-aware preprocessing strategies for DR screening systems.
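For illustration, a minimal version of the CLAHE + Gaussian filtering step might look like the OpenCV sketch below; the clip limit, tile size, and kernel size are assumed values, not the settings tuned in the project.

```python
# Illustrative CLAHE + Gaussian filtering preprocessing for fundus images
# (parameter values are assumptions for this sketch, not the project's settings).
import cv2
import numpy as np

def preprocess_fundus(bgr_image: np.ndarray) -> np.ndarray:
    """Apply CLAHE to the luminance channel, then light Gaussian smoothing.

    Expects an 8-bit BGR image as returned by cv2.imread.
    """
    # Work in LAB space so contrast enhancement does not distort color.
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)
    enhanced = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    # Gaussian blur suppresses sensor noise amplified by the contrast step.
    return cv2.GaussianBlur(enhanced, ksize=(5, 5), sigmaX=0)
```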

Links

Cell Type & Cancer Classification (Fall 2024)

This project applies deep learning to automated cell-type and cancer multi-classification using the CellNet medical imaging dataset. The models classify images into 19 cell types and distinguish benign cells from multiple cancer subtypes, with the goal of improving speed and reliability in medical image analysis.

Swin Transformer (Primary Contribution)

Figures: Swin Transformer test/training accuracy, test/training loss, and confusion matrix.
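For context, fine-tuning a pretrained Swin Transformer for this 19-class task can be sketched roughly as below; the torchvision Swin-T backbone and the hyperparameters shown are illustrative assumptions, not the project's exact configuration.

```python
# Minimal sketch: fine-tuning a pretrained Swin Transformer for 19-class
# cell-type/cancer classification (torchvision Swin-T used for illustration;
# hyperparameters are assumptions, not the project's tuned settings).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 19

model = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, NUM_CLASSES)  # replace classifier head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

def train_one_epoch(loader, device="cuda"):
    """One pass over a DataLoader yielding (image, label) batches."""
    model.to(device).train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```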

Baseline Models (Team Contributions)


The following models were evaluated during development but underperformed relative to the Swin Transformer in terms of weighted F1 score and convergence.
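For reference, the weighted F1 scores and confusion matrices reported below can be computed with scikit-learn along these lines (a minimal sketch with placeholder labels and predictions):

```python
# Sketch of the evaluation metrics used in the comparisons below
# (y_true and y_pred are placeholder arrays, not project results).
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [0, 1, 2, 2, 1]   # ground-truth class indices (placeholder data)
y_pred = [0, 1, 2, 1, 1]   # model predictions (placeholder data)

accuracy = accuracy_score(y_true, y_pred)
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # weights classes by support
cm = confusion_matrix(y_true, y_pred)                       # rows: true, cols: predicted

print(f"accuracy={accuracy:.3f}, weighted F1={weighted_f1:.3f}")
print(cm)
```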

EfficientNet

EfficientNet confusion matrix
  • Achieved a test accuracy of 80.2%, but a very low weighted F1 score of 0.069, indicating strong class imbalance effects and poor per-class performance.
  • The confusion matrix reveals extensive misclassification across classes, including a strong bias toward the dominant class as well as a high number of false negatives within the majority class itself, indicating poor class separation overall.

Multilayer Perceptron

MLP confusion matrix
  • Achieved a test accuracy of 49.0% with a weighted F1 score of 0.576, indicating limited overall performance but improved class-level balance compared to EfficientNet.
  • The confusion matrix shows greater concentration along the diagonal than EfficientNet, suggesting better handling of class imbalance, though a substantial number of false positives and false negatives remain.

ResNet50

ResNet confusion matrix
  • Achieved a test accuracy of 83.9% and a weighted F1 score of 0.826, demonstrating improved performance over EfficientNet and MLP.
  • The confusion matrix exhibits strong diagonal alignment, with most errors concentrated between visually similar classes (e.g., skin_benign vs. skin_melanoma), suggesting remaining challenges in fine-grained visual discrimination.

Conclusion

Overall, this project demonstrates the advantages of transformer-based architectures over traditional convolutional and fully connected models for large-scale medical image classification.

Links

Capstone-level R&D project

(project details)