Image Classification App with EfficientNet
The goal of this project is to classify all the images I have accumulated from previous phones, cloud backups, and so on. The project will be a local web application that acts as a photo album, with EfficientNet/TensorFlow running locally to do image classification. There are around 15k images, and I can classify them locally on an RTX 3060 GPU.
This post in the series details the system design. As this was my first project using Cursor, I wanted to specify the system design clearly and see how much assistance it could give with the coding. Below is the prompt I gave Cursor.
We are building a local-only web application for image classification with the following components:
1. Docker Setup
The application will use Docker and Docker Compose for containerization, ensuring clear separation of services and ease of deployment. Key points:
- Containers:
- Frontend: React-based container for the user interface.
- Backend: Go-based API container for managing metadata and serving filtered/sorted images.
- Machine Learning: Python-based container for TensorFlow image classification with GPU support.
- Message Queue: RabbitMQ container for communication between the backend and ML containers.
- Volume Mounts:
- A specified directory external to the project containing the images is mounted as read-only for the Go and Python containers.
- A volume is mounted for the SQLite database file, allowing read/write access by the Go container.
- GPU Support:
- The Python ML container uses nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04 as its base image, leveraging GPU acceleration via the NVIDIA Container Toolkit.
2. Frontend (React)
A user-friendly interface with the following features:
- View All Photos: Display all images from the mounted directory.
- Metadata Display: Show metadata (e.g., categories) retrieved from the SQLite database via the backend.
- Sorting and Filtering:
- Sort images by date.
- Filter images by category.
- Album Management:
- Create, update, and delete albums (collections of photos).
- Albums and their associated photos are stored in the database.
3. Backend (Go)
The backend serves as the API for the application, with the following responsibilities:
- Database Management:
- Manage metadata in an SQLite database, including relationships between images, categories, and albums.
- API Endpoints:
- Serve metadata and filtered/sorted image lists to the frontend.
- Support CRUD operations for albums and categories.
- CLI Features:
- Database Migrations: Apply and manage migrations for the SQLite database.
- ML Query: Trigger the Python ML container via the message queue, send image classification tasks, and retrieve classification results to update the database.
4. Machine Learning (Python)
A Python-based service handles image classification using TensorFlow and EfficientNet. Key features:
- Image Classification:
- Listens to a message queue (MQ) for image classification tasks sent by the Go backend.
- Processes a batch of image file paths at a time and applies the specified model.
- Sends the classification results back to the MQ for the Go backend to retrieve and update the database.
- GPU Support:
- The container uses the base image nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04 and TensorFlow’s GPU version.
- Volume Mounts:
- Read-only access to the external image directory.
- No direct database access (handled by the Go backend).
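To make the task handling above more concrete, here is a minimal sketch of what the worker could do with a single message. The JSON field names (task_id, model, paths, results) and the stub classifier are my own assumptions for illustration, not something the design fixes; in the real service the classifier would wrap the EfficientNet model and the bytes would arrive from and return to the message queue.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical classifier signature: takes a file path, returns (label, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def handle_task(body: bytes, classifiers: Dict[str, Classifier]) -> bytes:
    """Decode one task message, classify each path, and encode the result message."""
    task = json.loads(body)
    classify = classifiers[task["model"]]  # raises KeyError if the model is unknown
    results: List[dict] = []
    for path in task["paths"]:
        label, confidence = classify(path)
        results.append({"path": path, "category": label, "confidence": confidence})
    reply = {"task_id": task["task_id"], "model": task["model"], "results": results}
    return json.dumps(reply).encode("utf-8")


def stub_classifier(path: str) -> Tuple[str, float]:
    """Stand-in for the real EfficientNet call; labels by file extension only."""
    return ("screenshot", 0.5) if path.endswith(".png") else ("photo", 0.5)


if __name__ == "__main__":
    task = {"task_id": "t1", "model": "stub", "paths": ["/images/a.jpg", "/images/b.png"]}
    reply = handle_task(json.dumps(task).encode("utf-8"), {"stub": stub_classifier})
    print(json.loads(reply))
```

Keeping the classifier behind a plain function makes the message handling testable without a GPU or a running queue.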
5. Database (SQLite)
The database manages image metadata and application data:
- Tables:
- Images: File paths, timestamps, and other image-related metadata.
- Categories: Category labels, linked to images through a many-to-many relationship.
- Albums: Collections of images created by users.
- Models: Track the classifiers used for categorising images.
- Data Integrity:
- The database is mounted as a file accessible only by the backend container.
- The database file is excluded from version control to ensure privacy.
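One possible shape for these tables is sketched below using Python's built-in sqlite3 module, purely so the SQL can be run as-is; in the actual project the schema would be applied by the Go migrations. All table and column names, the join tables image_categories and album_images, and the helper functions are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative, not the final design.
SCHEMA = """
CREATE TABLE models (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE images (
    id        INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL UNIQUE,
    taken_at  TEXT  -- ISO 8601 timestamp, so text ordering matches date ordering
);
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE image_categories (
    image_id    INTEGER NOT NULL REFERENCES images(id) ON DELETE CASCADE,
    category_id INTEGER NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
    model_id    INTEGER REFERENCES models(id),
    PRIMARY KEY (image_id, category_id)
);
CREATE TABLE albums (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE album_images (
    album_id INTEGER NOT NULL REFERENCES albums(id) ON DELETE CASCADE,
    image_id INTEGER NOT NULL REFERENCES images(id) ON DELETE CASCADE,
    PRIMARY KEY (album_id, image_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database, enable foreign keys, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def images_in_category(conn: sqlite3.Connection, category: str) -> list:
    """Return file paths in one category, newest first (filter plus sort by date)."""
    rows = conn.execute(
        """
        SELECT i.file_path
        FROM images AS i
        JOIN image_categories AS ic ON ic.image_id = i.id
        JOIN categories AS c ON c.id = ic.category_id
        WHERE c.name = ?
        ORDER BY i.taken_at DESC
        """,
        (category,),
    ).fetchall()
    return [row[0] for row in rows]
```

The image_categories join table is what carries the many-to-many relationship, and its optional model_id column is one way to record which classifier produced a given label.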
6. Message Queue (MQ)
The application will use a message queue to facilitate communication between the Go backend and the Python ML container. Key points:
- Choice of MQ: RabbitMQ is chosen for its reliability and robust support for task queuing.
- Purpose:
- The Go backend sends image classification tasks to the Python container through the MQ.
- The Python container processes the tasks, classifies the images, and sends the results back to the MQ.
- The Go backend retrieves the results from the MQ and persists them in the SQLite database.
- Scalability: This design supports asynchronous processing and can scale to handle large batches of images efficiently.
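As a rough illustration of how the sending side could split the library into queue messages, the sketch below chunks a list of paths into fixed-size task messages. The batch size of 64, the message fields, the model name, and the file naming are all assumptions; the code is written in Python only for brevity, since in this design the Go backend would do the sending.

```python
import json
import math
from typing import Iterator, List


def make_tasks(paths: List[str], model: str, batch_size: int) -> Iterator[bytes]:
    """Yield one JSON-encoded task message per batch of image paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for n, start in enumerate(range(0, len(paths), batch_size)):
        batch = paths[start:start + batch_size]
        message = {"task_id": f"task-{n}", "model": model, "paths": batch}
        yield json.dumps(message).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical file names standing in for the ~15,000 real images.
    paths = [f"/images/img_{i:05d}.jpg" for i in range(15_000)]
    tasks = list(make_tasks(paths, model="efficientnet", batch_size=64))
    # Both numbers should agree: one message per full or partial batch.
    print(len(tasks), math.ceil(len(paths) / 64))
```

With these assumed numbers, 15,000 images at 64 per batch come to 235 messages, the last one holding the remaining 24 paths.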
Workflow
- Image Viewing: The user views all images in the photo directory through the frontend.
- Metadata Management: The backend fetches and serves metadata to the frontend from the SQLite database.
- Image Classification:
- The Go CLI triggers the ML service, passing image paths via the message queue.
- The ML service classifies the images and returns the results via the message queue; the Go backend then updates the database with the categories.
- Album Management: Users create and manage albums through the frontend, with changes reflected in the database.
Goals
- Local-only deployment for privacy and security.
- Scalability to handle ~15,000 images.
- GPU-accelerated classification for efficient batch processing.