Part 7: Computer Vision with OpenCV and Deep Learning
👁️ What Is Computer Vision?
Computer Vision (CV) enables machines to “see” and interpret visual data. It powers:
- Face detection & recognition
- Object recognition & tracking
- Medical imaging & diagnostics
- Self-driving cars & autonomous navigation
- Industrial automation & defect detection
Tools We Will Use
- OpenCV: Image processing and manipulation
- TensorFlow/Keras: Build CNN models for classification
- Pre-trained Models: Fast, accurate image recognition using MobileNet, VGG16, ResNet
Install libraries:
python -m pip install --upgrade pip
python -m pip install opencv-python tensorflow matplotlib scikit-learn
🖼️ Step-by-Step: Image Classification with CNN
Step 1: Load & Preprocess CIFAR-10 Dataset
CIFAR-10 is a dataset of 60,000 32x32 color images in 10 classes, split into 50,000 training images and 10,000 test images.
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
# Load dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0
# Display sample image
plt.imshow(X_train[0])
plt.title(f"Label: {y_train[0][0]}")
plt.axis('off')
plt.show()
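The plot title above shows only the numeric label. Here is a minimal sketch of turning labels into readable names, assuming the standard CIFAR-10 class order (0 = airplane through 9 = truck). `CLASS_NAMES` and `label_name` are illustrative names, and the small `y_sample` array stands in for `y_train`, which has shape (N, 1).

```python
import numpy as np

# Standard CIFAR-10 class order (list index = integer label)
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def label_name(label_row):
    """Convert one row of an (N, 1) label array to its class name."""
    return CLASS_NAMES[int(label_row[0])]

# Stand-in for y_train: CIFAR-10 labels are stored with shape (N, 1)
y_sample = np.array([[6], [9], [0]], dtype=np.uint8)
names = [label_name(row) for row in y_sample]
print(names)  # ['frog', 'truck', 'airplane']
```

With the real data you could then write `plt.title(label_name(y_train[0]))`.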
Step 2: Build a Convolutional Neural Network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),
    MaxPooling2D((2,2)),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D((2,2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(10, activation='softmax')
])
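To see where the network's size comes from, you can work out the layer shapes and parameter counts by hand. This is a rough sketch that assumes the Keras defaults used above ('valid' padding, stride 1 for convolutions, pooling stride equal to the pool size). The helper functions are illustrative, not part of Keras.

```python
def conv_out(size, kernel):
    """Width/height after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Width/height after non-overlapping max pooling."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

size = conv_out(32, 3)            # 30 -> feature map 30x30x32
p_conv1 = conv_params(3, 3, 32)   # 896
size = pool_out(size, 2)          # 15
size = conv_out(size, 3)          # 13 -> feature map 13x13x64
p_conv2 = conv_params(3, 32, 64)  # 18496
size = pool_out(size, 2)          # 6
flat = size * size * 64           # 2304 values after Flatten
p_dense1 = dense_params(flat, 128)  # 295040
p_dense2 = dense_params(128, 10)    # 1290

total = p_conv1 + p_conv2 + p_dense1 + p_dense2
print(flat, total)  # 2304 315722
```

Almost all of the parameters sit in the first Dense layer, and the total should agree with what `model.summary()` reports for this architecture.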
Step 3: Compile & Train the Model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
history = model.fit(
    X_train, y_train,
    epochs=15,
    batch_size=64,
    validation_split=0.1
)
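It helps to know what `validation_split=0.1` does to the data. Keras holds out the last 10% of the arrays (taken before shuffling) for validation and trains on the rest. The arithmetic below is a small sketch of that split for the 50,000 CIFAR-10 training images; the variable names are illustrative.

```python
import math

n_samples = 50_000          # CIFAR-10 training images
validation_split = 0.1
batch_size = 64

# Samples before this index are used for training, the rest for validation
split_at = int(n_samples * (1 - validation_split))
n_train = split_at
n_val = n_samples - split_at

# The last batch of an epoch may be smaller than batch_size, hence the ceiling
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 45000 5000 704
```

So each epoch's progress bar should show 704 steps, and the validation metrics are computed on 5,000 images the model never trains on.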
Step 4: Evaluate Performance
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc*100:.2f}%")
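The accuracy that `model.evaluate` reports is the fraction of images whose highest-probability class matches the true label. Here is a minimal NumPy sketch of that calculation, using made-up softmax outputs for 4 images and 3 classes rather than real model predictions.

```python
import numpy as np

# Hypothetical softmax outputs: one row per image, one column per class
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
# Labels shaped (N, 1), like CIFAR-10's y_test
y_true = np.array([[0], [1], [2], [1]])

predicted = probs.argmax(axis=1)   # most likely class per image
# ravel() flattens (N, 1) to (N,); without it the comparison broadcasts to (N, N)
accuracy = np.mean(predicted == y_true.ravel())
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 75.00%
```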
Step 5: Visualize Training History
import matplotlib.pyplot as plt
# Accuracy plot
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# Loss plot
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
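Besides plotting, you can read the curves programmatically. The sketch below finds the epoch with the lowest validation loss and flags a likely overfitting pattern (validation loss rising again while training loss keeps falling). The numbers in `hist` are invented stand-ins for `history.history`, and the overfitting rule is a simple heuristic, not a standard Keras feature.

```python
# Hypothetical values standing in for history.history
hist = {
    "loss":     [1.60, 1.20, 1.00, 0.85, 0.72, 0.60],
    "val_loss": [1.40, 1.15, 1.02, 0.98, 1.01, 1.08],
}

val_loss = hist["val_loss"]
# Index of the smallest validation loss (epochs are 1-based when reported)
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# Heuristic: the best epoch is not the last one, and validation loss
# ends up above training loss
gap_at_end = val_loss[-1] - hist["loss"][-1]
overfitting = best_epoch < len(val_loss) - 1 and gap_at_end > 0

print(f"Lowest validation loss at epoch {best_epoch + 1}")  # epoch 4
print("Possible overfitting:", overfitting)                 # True
```

If the pattern shows up on your real run, training for fewer epochs or raising the dropout rate are reasonable first things to try.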
Step 6: Predict Custom Images with OpenCV
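import cv2
import numpy as np
# cv2.imread returns None (it does not raise) if the file cannot be read
image = cv2.imread('your_image.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'your_image.jpg'")
# OpenCV loads pixels in BGR order; the model was trained on RGB images
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(rgb, (32, 32)) / 255.0
reshaped = resized.reshape(1, 32, 32, 3)
prediction = model.predict(reshaped)
print("Predicted class:", prediction.argmax())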
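Tip: Preprocess custom images the same way as the training images: the same size (32x32), the same channel order (RGB), and the same pixel scaling (0 to 1). The model can only answer with one of the 10 CIFAR-10 classes.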
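The channel-order and scaling steps can be checked without a real photo. Below is a minimal NumPy sketch that assumes the image has already been resized to 32x32. `prepare_for_model` is an illustrative helper, and the pure-blue array stands in for what `cv2.imread` would return.

```python
import numpy as np

def prepare_for_model(bgr_image):
    """Turn a 32x32 BGR uint8 image into a (1, 32, 32, 3) RGB float batch."""
    rgb = bgr_image[..., ::-1]               # reverse the channel axis: BGR -> RGB
    scaled = rgb.astype("float32") / 255.0   # same scaling as the training data
    return scaled[np.newaxis, ...]           # add the batch dimension

# Synthetic stand-in for an image from cv2.imread: pure blue, stored as BGR
bgr = np.zeros((32, 32, 3), dtype=np.uint8)
bgr[..., 0] = 255

batch = prepare_for_model(bgr)
print(batch.shape)     # (1, 32, 32, 3)
print(batch[0, 0, 0])  # [0. 0. 1.]
```

The blue value ends up in the last channel, which is where an RGB-trained model expects it.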
Step 7: Use Pretrained Model (MobileNetV2)
import numpy as np
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
# Load model
pretrained_model = MobileNetV2(weights='imagenet')
# Prepare image
img = image.load_img('your_image.jpg', target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
# Predict
preds = pretrained_model.predict(x)
print("Predicted:", decode_predictions(preds, top=1)[0])
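Two details of the pretrained pipeline are easy to miss. First, MobileNetV2's `preprocess_input` rescales pixels to the range -1 to 1, not 0 to 1 as in the CIFAR-10 model. Second, `decode_predictions` returns, for each image, a list of (class ID, label, score) tuples. The sketch below mimics both with plain NumPy. `scale_like_mobilenet_v2` is an illustrative function, not a Keras one, and the tuple in `top` is a placeholder rather than real model output.

```python
import numpy as np

def scale_like_mobilenet_v2(pixels):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return pixels.astype("float32") / 127.5 - 1.0

scaled = scale_like_mobilenet_v2(np.array([0, 127.5, 255]))
print(scaled)  # [-1.  0.  1.]

# Placeholder in the shape decode_predictions(preds, top=1)[0] returns
top = [("n00000000", "example_label", 0.87)]
lines = [f"{label}: {score:.1%}" for _class_id, label, score in top]
print(lines[0])  # example_label: 87.0%
```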
📌 Bonus: Real-World Datasets
- Fashion-MNIST: Clothing item classification
- CelebA: Celebrity face images for face attribute recognition
- COCO Dataset: Multi-object detection and segmentation
- MNIST Digits: For beginner-friendly digit recognition
💡 Practice Challenges
- Try other datasets like CelebA, COCO, or Fashion-MNIST
- Detect faces using cv2.CascadeClassifier
- Implement edge detection, blurring, and contour detection with OpenCV
- Build a real-time webcam object classifier
- Use data augmentation to increase the effective size and variety of the training set
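For the augmentation challenge, here is one possible starting point in plain NumPy: a random horizontal flip followed by a random shift of up to 4 pixels (pad with zeros, then crop back to the original size). The function name, the shift size, and the random test image are all assumptions for illustration. Keras also provides its own augmentation layers, which this sketch does not use.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Randomly flip and shift an (H, W, C) image; the output keeps the same shape."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    # Zero-pad the borders, then crop at a random offset to get a random shift
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad)
    top, left = rng.integers(0, 2 * max_shift + 1, size=2)
    h, w = image.shape[:2]
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(seed=0)
image = rng.random((32, 32, 3)).astype("float32")  # stand-in for one CIFAR-10 image
augmented = augment(image, rng)
print(augmented.shape)  # (32, 32, 3)
```

Each call produces a slightly different version of the same picture, so the model sees more variety without you collecting new data.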
⚙️ Hyperparameter Tuning Ideas
| Hyperparameter | Options |
|---|---|
| Filters in Conv2D | 32, 64, 128 |
| Dense layer units | 64, 128, 256 |
| Dropout rate | 0.1–0.5 |
| Batch size | 32, 64, 128 |
| Learning rate | 0.001, 0.0005, 0.0001 |
| Activation function | relu, tanh, gelu |
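The table above defines a search space. Here is a small sketch of enumerating every combination with the standard library, assuming three sample points (0.1, 0.3, 0.5) for the dropout range. `search_space` and `configs` are illustrative names.

```python
import random
from itertools import product

search_space = {
    "filters": [32, 64, 128],
    "dense_units": [64, 128, 256],
    "dropout": [0.1, 0.3, 0.5],  # sample points from the 0.1–0.5 range
    "batch_size": [32, 64, 128],
    "learning_rate": [0.001, 0.0005, 0.0001],
    "activation": ["relu", "tanh", "gelu"],
}

# One dict per combination of values, in the key order defined above
configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]
print(len(configs))  # 729

# Training 729 models is expensive; a random subset is a common compromise
trial_configs = random.Random(0).sample(configs, 10)
print(len(trial_configs))  # 10
```

Each dict can then drive one training run, for example by passing `cfg["dropout"]` to the `Dropout` layer.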
Advanced Topics (For Ambitious Learners)
- Convolutional Neural Networks (CNNs) - deeper architectures like ResNet, VGG16
- Transfer Learning - leverage pretrained weights for faster results
- Image Augmentation - rotate, flip, zoom, shear images to improve generalization
- Object Detection - YOLO, SSD for real-time detection
- Segmentation - Mask R-CNN for pixel-level classification
- Batch Normalization & Regularization - stabilize and speed up training
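To make the batch normalization item concrete, here is a rough NumPy sketch of the training-time calculation: each feature is shifted and scaled using the mean and variance of the current batch. The `eps` value and the tiny input batch are assumptions for illustration. A real layer also learns `gamma` and `beta` and uses running averages at inference time.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the batch (rows), then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Two features on very different scales
x = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
out = batch_norm(x)
print(out.round(3))
```

After normalization both columns have roughly zero mean and unit spread, which is why the technique tends to stabilize training.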
🛠️ Mini-Projects You Can Build Next
- Face mask detection using webcam
- Real-time traffic sign classifier
- Pet breed recognition app
- Fruit quality detection for agriculture
- Medical X-ray abnormality detection
🎓 What You’ve Learned:
- How to preprocess and classify images using CNNs
- Using OpenCV for custom image processing
- Applying pretrained models for instant predictions
- Working with real-world datasets
- Advanced CV techniques for better performance
📝 Computer Vision Cheat Sheet
# Layers
Conv2D(filters, kernel, activation) - Convolution
MaxPooling2D(pool_size) - Pooling
Flatten() - Flatten layer
Dense(units, activation) - Fully connected
Dropout(rate) - Prevent overfitting
# Activation Functions
relu, tanh, softmax
# Compilation
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Training
model.fit(X_train, y_train, epochs=10, batch_size=64, validation_split=0.1)
# Evaluation
test_loss, test_acc = model.evaluate(X_test, y_test)
# Prediction
predictions = model.predict(X_test)
np.argmax(predictions[i])
# OpenCV Tips
cv2.imread('image.jpg')
cv2.resize(img, (width, height))
cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
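If you are curious what the BGR-to-grayscale conversion does, it is a weighted sum of the three channels. The sketch below uses the usual luma weights (0.299 R, 0.587 G, 0.114 B). `bgr_to_gray` is an illustrative function, and its output may differ from OpenCV's by one gray level because of rounding.

```python
import numpy as np

def bgr_to_gray(bgr):
    """Approximate grayscale conversion for an (H, W, 3) BGR uint8 image."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r  # weighted sum, computed in float
    return np.round(gray).astype(np.uint8)

# 1x2 test image: one white pixel and one pure-blue pixel (BGR order)
img = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray = bgr_to_gray(img)
print(gray)  # [[255  29]]
```

Blue contributes little to perceived brightness, so the pure-blue pixel comes out dark.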
❓ FAQs
1. Can I classify my own images?
Yes. Resize your images to match the training data, convert them to RGB if you load them with OpenCV, normalize the pixel values, and call model.predict(). Alternatively, use a pretrained model.
2. Which dataset is best for learning CV?
Start with CIFAR-10 or Fashion-MNIST as a beginner, then move on to CelebA or COCO for advanced projects.
3. How can I improve accuracy?
Use deeper CNNs, data augmentation, transfer learning, dropout, and batch normalization.
4. Do I need a GPU?
No. A GPU speeds up training considerably, but small models like the one in this tutorial can be trained on a CPU.
5. Can OpenCV detect faces in real-time?
Yes, using Haar cascades or DNN-based detectors, you can process webcam streams live.
📢 Call to Action
If you enjoyed this tutorial, share it with friends, try the practice challenges, and comment your results. Subscribe to continue learning AI and explore Reinforcement Learning in Part 8!
🧭 What’s Next?
In Part 8, we’ll explore Reinforcement Learning (RL), where agents learn by trial and error using rewards and penalties.