The goal of this project is to classify all the images I have from previous phones, cloud backups, etc. It will be a local web application that acts as a photo album, with EfficientNet/TensorFlow handling image classification. There are 15k images, and classification runs locally on an RTX 3060 GPU.
Rewriting the Backend API in Ruby
Recently, I decided to rewrite the backend API in Ruby. While this wasn’t strictly necessary, it was an intentional decision to revisit Ruby after some time away from the language. I may encounter it in other projects, and I wanted to keep my skills fresh.
The Stack
The backend API is now a Rails application, configured as an API-only app, with comprehensive specs to ensure reliability. It connects to a React frontend, where users can curate photo albums from their collection.
On the backend, RabbitMQ is used to handle asynchronous ML tasks, principally image classification. The classification itself is powered by a Python-based machine learning container, which uses GPU acceleration for efficient processing.
Key Features
1. Rails API
The Rails framework provided an excellent base for building a RESTful API. Highlights:
- API-only Mode: This keeps the app lightweight, focusing solely on delivering data to the frontend.
- RSpec Tests: Comprehensive specs to ensure the API is robust and maintainable.
RSpec has played a significant role in the Ruby ecosystem, not only as a unit testing tool but also as one of the early tools that helped popularise Behaviour-Driven Development. Its influence can be seen in modern testing practices across various programming languages.
2. Integration with React
The React frontend allows the user to create and manage photo albums seamlessly. The API handles data retrieval and updates for curated albums.
3. RabbitMQ for Asynchronous Tasks
Classifying images is resource-intensive and can take time. To avoid blocking the user experience, the API writes a message to a RabbitMQ queue whenever a batch of images needs categorisation.
This design ensures that the user gets instant feedback (e.g., “Image queued for categorisation”), while the heavy lifting happens in the background. Early tests suggest as little as 25 seconds per 1,000 images, but it remains to be seen whether classification quality depends on how much each image is reduced in size.
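As a rough illustration of the queueing idea, here is a minimal Python sketch of how a set of image paths could be split into batches and encoded as JSON message bodies. The real producer is the Rails API, so this only illustrates a possible message shape as the Python worker might see it. The field names (`job_id`, `task`, `image_paths`), the batch size, and the helper names are all hypothetical; the only figure taken from this post is the 25 seconds per 1,000 images from early tests.

```python
import json
import uuid
from typing import Iterator, List, Sequence


def chunk(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_messages(image_paths: Sequence[str], batch_size: int = 100) -> List[str]:
    """Encode each batch as one JSON message body (hypothetical schema)."""
    messages = []
    for batch in chunk(image_paths, batch_size):
        payload = {
            "job_id": str(uuid.uuid4()),  # hypothetical field, for tracking a batch
            "task": "classify",           # hypothetical field
            "image_paths": batch,
        }
        messages.append(json.dumps(payload))
    return messages


def estimated_seconds(n_images: int, seconds_per_thousand: float = 25.0) -> float:
    """Rough processing time, assuming throughput stays at the early-test rate."""
    return n_images / 1000 * seconds_per_thousand


if __name__ == "__main__":
    paths = [f"photos/img_{i:05d}.jpg" for i in range(15_000)]
    bodies = build_messages(paths, batch_size=1000)
    print(len(bodies))                    # 15 messages of 1,000 images each
    print(estimated_seconds(len(paths)))  # 375.0 seconds at the early-test rate
```

Each body would then be published to the queue by whichever RabbitMQ client the producer uses. Keeping the message to a list of paths rather than image bytes keeps the queue small, on the assumption that the API and the worker share access to the same storage.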
4. Machine Learning with Docker and GPU Acceleration
For image categorisation, the RabbitMQ worker triggers a Python-based ML container. Here’s how the setup works:
- Dockerised ML Environment: The Python code runs inside a Docker container for portability and isolation.
- GPU Passthrough: The container uses the NVIDIA Container Toolkit to access the desktop’s RTX 3060 GPU.
- Model: The ML model (e.g., EfficientNet) classifies images, tagging them with relevant categories.
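To make the tagging step a little more concrete, here is a small Python sketch of how per-class scores could be turned into tags. It assumes the model returns one probability per class (for example, a softmax output); the function name `top_tags`, the label list, and the `top_k` and `min_confidence` values are hypothetical and are not taken from the actual worker.

```python
from typing import List, Sequence, Tuple


def top_tags(
    scores: Sequence[float],
    labels: Sequence[str],
    top_k: int = 3,
    min_confidence: float = 0.2,
) -> List[Tuple[str, float]]:
    """Return up to `top_k` (label, score) pairs with score >= `min_confidence`,
    ordered from highest to lowest score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, score) for label, score in ranked[:top_k] if score >= min_confidence]


if __name__ == "__main__":
    labels = ["beach", "dog", "food", "mountain"]  # hypothetical categories
    scores = [0.05, 0.62, 0.08, 0.25]              # hypothetical model output
    print(top_tags(scores, labels))                # [('dog', 0.62), ('mountain', 0.25)]
```

The confidence cut-off is a design choice: a higher threshold gives fewer but more reliable tags, while a lower one surfaces more candidate categories for the user to curate.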
Why Ruby?
- Revisiting an Old Favourite: Ruby has always had an elegant syntax and a developer-friendly ecosystem. Coming back to it after so long was fun.
- Rails’ Productivity: Rails, even after all these years, remains one of the most productive frameworks for building quick APIs.
- Simplicity: This is a small, self-contained backend service with no complex requirements for scalability or concurrency, so any language that is easy to read quickly is fine.
Next Steps
- Expand ML Models: Experiment with different models to improve image classification accuracy.
- Enhance Front-End: Add more features to the React app, like advanced search and filtering for photo albums.
- Optimise Performance: Explore further optimisations for RabbitMQ and the ML pipeline.