Running Image Classification with EfficientNet
In this post, I’ll walk you through how I set up and ran image classification using a pre-trained EfficientNet model in TensorFlow.
I have recently collected all my photos from a cloud storage service, an old phone, and various other places. I would like to categorise or tag them, as a way of helping me organise and find things.
Not being keen to hand all of them over to a big tech company, I have decided to build a tool of my own, based on the EfficientNet artificial neural network, which I can run locally on an RTX 3060 GPU.
This was my first step in learning how to use AI for image recognition, with the ultimate goal of culling and organising my photo collection. This project serves as a foundation for more advanced applications in the future.
Background
EfficientNet is a family of convolutional neural networks that achieves state-of-the-art accuracy while being computationally efficient. The model I used, EfficientNetB0, is pre-trained on the ImageNet dataset, which allows it to recognise a wide range of objects out of the box.
The goal of this project was simple: use EfficientNet to classify an image of a cat and identify its breed or related categories. This involved setting up a development environment, configuring TensorFlow to use my NVIDIA RTX 3060 GPU, and writing Python code for inference.
Setting Up the Environment
1. Installing Python and TensorFlow
I started by ensuring Python was installed on my machine. Since I’m using Ubuntu 24.10, my Python version is 3.12. To avoid interfering with the system-wide Python environment, I created a virtual environment:
python3 -m venv ~/workspace/ai/venv
source ~/workspace/ai/venv/bin/activate
pip install --upgrade pip
Next, I installed TensorFlow with GPU support:
pip install tensorflow[and-cuda]
The [and-cuda] extra pulls in the NVIDIA CUDA libraries as pip packages, so TensorFlow can take advantage of my RTX 3060 without a separate system-wide CUDA installation.
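A quick sanity check at this point is to print the installed version and confirm the build was compiled with CUDA support:
python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())"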
2. Verifying GPU Usage
To confirm that TensorFlow could detect my GPU, I ran the following script:
import tensorflow as tf
print("GPUs detected:", tf.config.list_physical_devices('GPU'))
The output showed that TensorFlow recognised my RTX 3060, so I was ready to proceed.
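For reference, a successful detection prints something along these lines (the exact details depend on your setup):
GPUs detected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]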
Writing the Inference Code
The next step was writing a Python script to load an image and classify it using EfficientNetB0. Here’s the complete code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.applications.efficientnet import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
# Ensure TensorFlow uses the GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[0], 'GPU')
    print("Using GPU:", gpus[0])
else:
    print("No GPU detected. Using CPU.")
# Load the pre-trained EfficientNetB0 model
model = EfficientNetB0(weights='imagenet')
# Path to the sample image
img_path = 'sample_image.png'
# Load and preprocess the image
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
# Perform inference
predictions = model.predict(x)
decoded_predictions = decode_predictions(predictions, top=3)[0]
# Print the results
for i, (imagenet_id, label, score) in enumerate(decoded_predictions):
    print(f"{i + 1}: {label} ({score:.2f})")
This script does the following:
- Configures TensorFlow to use the GPU
- Loads the EfficientNetB0 model pre-trained on ImageNet
- Preprocesses the input image to meet the model’s requirements (resizing to 224x224 pixels and applying EfficientNet’s expected input preprocessing)
- Runs inference and decodes the top-3 predictions
Results
When I ran the script with an image of a cat, the output was:
1: tiger_cat (0.25)
2: tabby (0.21)
3: Egyptian_cat (0.09)
EfficientNet correctly identified the image as related to cats, with tiger_cat as the most likely category. This was a great validation of the setup and the model’s capabilities.
Lessons Learned
- Environment Setup: Using a virtual environment and ensuring TensorFlow had GPU support were the first steps. Debugging GPU issues taught me how to configure CUDA correctly.
- Inference Basics: Understanding how to preprocess images and decode model outputs gave me a solid foundation for working with pre-trained models.
- Efficiency: Running inference on the GPU significantly improved performance compared to the CPU, and the RTX 3060 I am using is more than sufficient for this task; a rough way to compare the two is sketched below.
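As an illustration, here is a crude, unscientific way to compare CPU and GPU inference times. It assumes the model and the preprocessed batch x from the script above are still in scope, and time_inference is just a helper name I made up for this sketch; treat the numbers as indicative only.
import time

def time_inference(device, runs=20):
    # Run the model eagerly under an explicit device scope; .numpy() forces
    # any pending GPU work to finish before the clock stops.
    with tf.device(device):
        _ = model(x, training=False).numpy()  # warm-up run
        start = time.perf_counter()
        for _ in range(runs):
            _ = model(x, training=False).numpy()
        return (time.perf_counter() - start) / runs

print("CPU average (s):", time_inference('/CPU:0'))
print("GPU average (s):", time_inference('/GPU:0'))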
Next Steps
Now that I’ve successfully run inference, I plan to make a start on training. Initially, I will train EfficientNet on a dataset known as “Cats vs. Dogs.” Eventually, I will use what I have learned to tag my personal photo collection automatically, which will help me prune the collection and keep it organised.
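To give a flavour of what that training step might look like, here is a minimal sketch of fine-tuning EfficientNetB0 on Cats vs. Dogs. It assumes the tensorflow_datasets package (tfds) is installed, and the splits, batch size, and epoch count are illustrative placeholders rather than settled choices.
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.applications import EfficientNetB0

IMG_SIZE = 224

def prepare(image, label):
    # EfficientNet expects raw 0-255 pixel values, so resizing is enough here
    return tf.image.resize(image, (IMG_SIZE, IMG_SIZE)), label

train_ds = (tfds.load('cats_vs_dogs', split='train[:80%]', as_supervised=True)
            .map(prepare).batch(32).prefetch(tf.data.AUTOTUNE))
val_ds = (tfds.load('cats_vs_dogs', split='train[80%:]', as_supervised=True)
          .map(prepare).batch(32).prefetch(tf.data.AUTOTUNE))

# Reuse the pre-trained convolutional base and train only a new binary head
base = EfficientNetB0(include_top=False, weights='imagenet',
                      input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling='avg')
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=3)
Freezing the pre-trained base and training only the new classification head is the usual first pass; unfreezing some of the top layers at a lower learning rate is the natural follow-up.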