Projects

Photography Agent (Fall 2025)

The Photography Agent is a collaborative semester project focused on building a photo organization and editing system powered by an agentic large language model (LLM). The LLM interprets natural language user requests and calls specialized computer vision tools for photo filtering and editing.
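The agent's request-to-tool flow can be sketched as a simple dispatch loop; the tool names (`filter_photos`, `adjust_exposure`) and the call format below are illustrative placeholders, not the project's actual API — the real system routes LLM tool calls to its computer vision modules.

```python
from typing import Callable, Dict

# Hypothetical registry of computer-vision tools the LLM may call.
TOOLS: Dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def filter_photos(tag: str) -> str:
    return f"filtered photos tagged '{tag}'"

@tool
def adjust_exposure(path: str, stops: float) -> str:
    return f"adjusted {path} by {stops:+.1f} stops"

def dispatch(call: dict) -> str:
    """Execute one tool call emitted by the LLM (tool name + arguments)."""
    return TOOLS[call["name"]](**call["arguments"])

# The LLM would translate "brighten beach.jpg a little" into something like:
print(dispatch({"name": "adjust_exposure",
                "arguments": {"path": "beach.jpg", "stops": 0.5}}))
```

The registry pattern keeps the LLM side declarative: the model only emits a tool name and arguments, and the dispatcher handles execution.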

LLM Agent (Primary Contribution)

System Tools (Team Contributions)

Links

Demo of Photography Agent (link)


Diabetic Retinopathy Classification Preprocessing Evaluation (Spring 2024)

In this project, a partner and I investigated the impact of image preprocessing techniques on diabetic retinopathy (DR) severity classification using deep learning models, with the goal of determining whether a single preprocessing strategy generalizes across architectures.

My Contributions

Results

  • All three models were evaluated using accuracy, precision, recall, and F1-score across five preprocessing pipelines.
  • CLAHE + Gaussian filtering yielded the largest performance gains for ResNet-50 and EfficientNet relative to baseline preprocessing.
  • The DenseSwin model achieved its highest metrics using standard preprocessing.
Figure 1: Comparison of model performance across five image preprocessing methods. Bar plots show Accuracy, Precision, Recall, and F1-score for three models evaluated under each preprocessing technique.

Key Findings

  • Preprocessing effectiveness is architecture-dependent; contrast enhancement techniques improved CNN-based models (ResNet-50, EfficientNet), while the hybrid DenseSwin model achieved optimal performance with standard preprocessing.
  • No single preprocessing pipeline generalized best across all architectures, highlighting the need for joint optimization of model and input pipeline.

Links


Cell Type & Cancer Classification (Fall 2024)

This project applies deep learning to automated multi-class cell-type and cancer classification using the CellNet medical imaging dataset. The models classify images into 19 cell types and distinguish benign cells from multiple cancer subtypes, with the goal of improving the speed and reliability of medical image analysis.

Swin Transformer (Primary Contribution)

Results

Figure 1: Training and validation accuracy curves of the Swin Transformer model across epochs. The model demonstrates progressive convergence, with validation accuracy closely tracking training accuracy, indicating effective generalization and minimal overfitting.
Figure 2: Training and validation loss curves of the Swin Transformer model over epochs. The decreasing loss indicates stable convergence, with validation loss closely following training loss, suggesting effective generalization.

Figure 3: Confusion matrix of the Swin Transformer model on the test set, showing classification performance across 19 categories. Most predictions lie on the diagonal, indicating high accuracy, with few false positives or false negatives.

Baseline Models (Team Contributions)


The following models were evaluated during development but converged more slowly and achieved lower weighted F1-scores than the Swin Transformer.

EfficientNet

Figure 4: Confusion matrix of the EfficientNet model on the test set, showing classification performance across 19 categories.
  • Achieved a test accuracy of 80.2%, but a very low weighted F1-score of 0.069, indicating strong class imbalance effects and poor per-class performance.
  • The confusion matrix reveals extensive misclassification across classes, including a strong bias toward the dominant class as well as a high number of false negatives within the majority class itself, indicating poor class separation overall.
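The divergence between accuracy and weighted F1 under class imbalance can be made concrete with a small numpy sketch; the toy 3-class confusion matrix below is illustrative, not the project's data. A model that funnels predictions into the dominant class keeps accuracy high while its per-class scores, and hence the weighted F1, drop.

```python
import numpy as np

def weighted_f1(cm: np.ndarray) -> float:
    """Weighted F1 from a confusion matrix (rows = true, cols = predicted)."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)                 # true instances per class
    col = cm.sum(axis=0)                     # predicted instances per class
    precision = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    # Average per-class F1 weighted by each class's support.
    return float((f1 * support).sum() / support.sum())

# Toy imbalanced case: the model predicts the dominant class 0 for everything.
cm = np.array([[90, 0, 0],
               [ 5, 0, 0],
               [ 5, 0, 0]])
acc = np.diag(cm).sum() / cm.sum()   # high accuracy from the majority class
f1w = weighted_f1(cm)                # lower, since minority classes score 0
```

This is why the report compares models on weighted F1 rather than accuracy alone.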

Multilayer Perceptron

Figure 5: Confusion matrix of the Multilayer Perceptron model on the test set, showing classification performance across 19 categories.
  • Achieved a test accuracy of 49.0% with a weighted F1-score of 0.576, indicating limited overall performance but improved class-level balance compared to EfficientNet.
  • The confusion matrix shows greater concentration along the diagonal than EfficientNet, suggesting better handling of class imbalance, though a substantial number of false positives and false negatives remain.

ResNet50

Figure 6: Confusion matrix of the ResNet50 model on the test set, showing classification performance across 19 categories.
  • Achieved a test accuracy of 83.9% and a weighted F1-score of 0.826, demonstrating improved performance over EfficientNet and MLP.
  • The confusion matrix exhibits strong diagonal alignment, with most errors concentrated between visually similar classes (e.g., skin_benign vs. skin_melanoma), suggesting remaining challenges in fine-grained visual discrimination.

Key Findings

Links


Capstone-level R&D project

(project details)