deep_learning_assignment

Assignment 1: Image Classification

Dataset: Food-101. The dataset contains images of food organized by food type. It was introduced in the paper “Food-101 – Mining Discriminative Components with Random Forests” by Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool, and its size makes it a good benchmark for testing computer vision techniques. This document summarizes the workflow implemented in source/assignment_1/image/image.ipynb.

Exploratory Data Analysis (EDA)

The notebook first validates the dataset structure after downloading it from Kaggle, then performs a quick visual inspection and descriptive analysis:
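The descriptive part of the EDA can be sketched as a per-class image count over Food-101's standard `<root>/<class_name>/<image>.jpg` directory layout. This is a minimal sketch, not the notebook's exact code; the function name is illustrative.

```python
from collections import Counter
from pathlib import Path

def label_distribution(root):
    """Count images per class, assuming Food-101's
    <root>/<class_name>/<image>.jpg layout."""
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts
```

Plotting the resulting counts (e.g. as a bar chart) gives the label-distribution view, and sampling a few paths per class gives the random visual inspection.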

Random sample (visual inspection)
Label distribution (per-class image counts)

Data Preparation

For faster experimentation, the workflow trains on 10 selected labels.

This keeps class balance across splits and provides a consistent pipeline for both models.
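Keeping class balance across splits amounts to a stratified split: shuffle and divide each label's images independently, then pool the per-label pieces. A minimal sketch, with the fraction and seed as assumed values rather than the notebook's settings:

```python
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.8, seed=42):
    """Split (path, label) pairs so every label keeps the same
    train/val ratio. `train_frac` and `seed` are assumptions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    train, val = [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)               # shuffle within each class
        cut = int(len(paths) * train_frac)
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val
```

Because the split is done per label, restricting the run to 10 selected labels simply means filtering `samples` before calling the function; the balance property is unaffected.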

Dataset Preparation

Model Summary

Two transfer-learning models are trained and compared:

1. ResNet Branch

2. ViT Branch

Training Settings

Training is managed by a shared Trainer class.

The notebook trains both branches sequentially and saves best checkpoints.
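The shared driver can be sketched as a small class that runs the epoch loop, records history, and saves a checkpoint only when the validation metric improves. This is an illustrative sketch, not the notebook's actual Trainer API; the method and argument names are assumptions.

```python
class Trainer:
    """Minimal shared training driver that keeps the best checkpoint."""

    def __init__(self, name):
        self.name = name
        self.best_acc = float("-inf")
        self.history = []                 # (train_loss, val_acc) per epoch

    def fit(self, epochs, train_step, evaluate, save_fn):
        for epoch in range(epochs):
            loss = train_step(epoch)      # one pass over the training set
            acc = evaluate(epoch)         # validation accuracy
            self.history.append((loss, acc))
            if acc > self.best_acc:       # save only when validation improves
                self.best_acc = acc
                save_fn(f"{self.name}_best.pt")
        return self.best_acc
```

Running both branches through the same driver (e.g. `Trainer("resnet")` then `Trainer("vit")`) is what makes the sequential training and checkpointing uniform across models.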

Training curves

Results & Comparison

The notebook compares both models using:

This provides a direct baseline comparison between CNN-based and Transformer-based image classifiers on the same 10-label subset.

Comparison chart
Metric table
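The headline metric behind such a comparison is accuracy on the shared validation split, tabulated per model. A minimal sketch, with function names and table layout as illustrative assumptions:

```python
def accuracy(preds, labels):
    """Fraction of predictions matching the true labels."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

def comparison_table(results):
    """results: {model_name: (preds, labels)} on the same split."""
    rows = ["Model      Accuracy"]
    for name, (preds, labels) in results.items():
        rows.append(f"{name:<10} {accuracy(preds, labels):.3f}")
    return "\n".join(rows)
```

Because both branches are evaluated on the identical 10-label subset, any accuracy gap in the table reflects the models rather than the data.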

Analysis & Discussion

This experiment highlights an important practical observation: the ViT branch converges faster and achieves stronger performance than the ResNet branch within the same training budget, even though the configured ViT variant in this workflow uses fewer trainable parameters in the linear-probe setup.
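The linear-probe setup mentioned above means the pretrained backbone is frozen and only a new classification head receives gradients, so the ViT branch's trainable-parameter count is just the head's. A minimal PyTorch sketch, with illustrative names (not the notebook's code):

```python
import torch.nn as nn

def linear_probe(backbone, feat_dim, num_classes=10):
    """Freeze the backbone and attach a trainable linear head."""
    for p in backbone.parameters():
        p.requires_grad = False          # backbone weights stay fixed
    return nn.Sequential(backbone, nn.Linear(feat_dim, num_classes))

def trainable_params(model):
    """Count only parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Comparing `trainable_params` for the two branches makes the "fewer trainable parameters" claim concrete: the frozen backbone contributes zero, regardless of its total size.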

1. Why ViT converges faster here

2. Why ViT can outperform ResNet with fewer trainable parameters

3. Why sampling is necessary for this assignment

In short, the sampling strategy is not only a speed optimization; it is a practical experimental design choice that makes model comparison feasible and reproducible.