Camera-Based Perception
Visual sensing and basic image processing techniques
Tutorial Overview
🎯 Learning Objectives
By the end of this tutorial, you will:
✅ Access and view camera streams
✅ Understand image topics and formats
✅ Perform basic image processing (filters, thresholding, edge detection)
✅ Detect colors and shapes
✅ Record and process video data
✅ Understand camera calibration
✅ Explore potential perception applications
✅ Integrate vision with robot control
⏱️ Time Required
Reading & Setup: 20 minutes
Basic Image Access: 25 minutes
Image Processing: 40 minutes
Color/Shape Detection: 35 minutes
Applications: 30 minutes
Total: ~150 minutes
📋 Prerequisites
✅ Completed Sensor Data Visualization
✅ Completed ROS2 Communication & Tools
✅ Basic Python knowledge
✅ Understanding of image concepts (pixels, RGB)
✅ OpenCV basics helpful but not required
✅ Can record and play rosbags
🛠️ What You'll Need
✅ Beetlebot (powered, camera working)
✅ Laptop with ROS2 Jazzy
✅ Wireless controller
✅ Test objects with distinct colors
✅ Good lighting (avoid direct sunlight on camera)
✅ Printed patterns for detection tests (optional)
Part 1: Camera System Overview
Raspberry Pi Camera V1.3 Specifications
Your robot's camera:
Hardware:
Sensor: OmniVision OV5647
Resolution: 5MP (2592×1944 max)
Typical streaming: 1080p (1920×1080) or 720p (1280×720)
Frame rate: 30 fps (1080p), 60 fps (720p)
FOV: ~54° horizontal, ~41° vertical
Focus: Fixed (best 30cm - 3m)
Interface: CSI (Camera Serial Interface) to Pi
Mounting:
Height: 8.8cm from ground
Position: 15cm forward from center
Orientation: Horizontal, forward-facing
Topic:
/pi_camera/image_raw
Camera Topics
Available topics:
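You can list them from the laptop; with the camera driver running you should see something like:

```bash
ros2 topic list | grep pi_camera
# /pi_camera/camera_info
# /pi_camera/image_raw
# /pi_camera/image_raw/compressed
```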
Topic details:
/pi_camera/image_raw (sensor_msgs/Image)
Uncompressed RGB8 or BGR8 format
~190 MB/s bandwidth (1080p @ 30fps: 1920×1080 pixels × 3 bytes × 30 frames)
Best quality
Use for local processing
/pi_camera/image_raw/compressed (sensor_msgs/CompressedImage)
JPEG compression
~3-5 MB/s bandwidth
Slight quality loss
Better for remote viewing
/pi_camera/camera_info (sensor_msgs/CameraInfo)
Intrinsic parameters (focal length, principal point)
Distortion coefficients
Rectification matrix
Projection matrix
Understanding Image Messages
Image message structure:
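The sensor_msgs/msg/Image definition:

```
std_msgs/Header header   # timestamp and coordinate frame_id
uint32 height            # image height (rows)
uint32 width             # image width (columns)
string encoding          # pixel format, e.g. rgb8, bgr8, mono8
uint8  is_bigendian      # byte order of multi-byte pixel values
uint32 step              # full row length in bytes
uint8[] data             # pixel data, size = step * height
```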
Common encodings:
rgb8: 8-bit RGB (Red-Green-Blue)
bgr8: 8-bit BGR (OpenCV default)
mono8: 8-bit grayscale
rgba8: 8-bit RGBA (with alpha channel)
Part 2: Accessing Camera Stream
Quick View with rqt_image_view
Simplest method:
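Launch the viewer with the topic preselected:

```bash
ros2 run rqt_image_view rqt_image_view /pi_camera/image_raw
```

(Or launch it without the topic argument and pick /pi_camera/image_raw from the dropdown.)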
Controls:
Zoom: Mouse wheel
Pan: Right-click + drag
Refresh: Click dropdown again
Rotate: Image → Transform menu
[PLACEHOLDER: Screenshot of rqt_image_view showing camera feed]
Exercise 9.1: Camera Field of View Test
Task: Measure camera's actual field of view
Materials needed:
Measuring tape
Flat wall
Markers or tape
Procedure:
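One straightforward way to run the test, using pinhole-camera geometry (distances below are example values):

1. Place the robot a measured distance d from the wall (e.g., 1 m), with the camera square to it.
2. Watch the stream in rqt_image_view while a helper slides a marker along the wall; tape the points where it just leaves the left and right image edges.
3. Measure the width w between the two tape marks.
4. Compute the horizontal FOV and compare against the nominal ~54°:

```python
import math

d = 1.00  # distance to wall in metres (example value)
w = 1.02  # visible width at that distance in metres (example value)

# FOV = 2 * arctan(half-width / distance)
hfov = 2 * math.degrees(math.atan(w / (2 * d)))
print(f"Horizontal FOV ~ {hfov:.1f} degrees")  # ~54 for these numbers
```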
Viewing in RViz
Integrated visualization:
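In RViz2, add an Image display pointed at the camera topic:

```bash
rviz2
```

Then: Add → By topic → /pi_camera/image_raw → Image. The feed docks as a panel alongside your LiDAR and map displays.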
Benefits of RViz camera view:
See camera + LiDAR + map simultaneously
Overlay detection results
3D context for camera view
Can add measurement tools
Part 3: Basic Image Processing with OpenCV
Installing Dependencies
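On the laptop, the essentials are cv_bridge and OpenCV (the exact package set below is an assumption for a typical Jazzy install):

```bash
sudo apt update
sudo apt install ros-jazzy-cv-bridge ros-jazzy-image-transport \
                 python3-opencv python3-numpy
```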
Creating Image Processing Node
Basic template for image processing:
Script:
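A minimal sketch of the node (saved here as image_processor.py, an assumed file name; the topic name comes from Part 1):

```python
#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
import cv2


class ImageProcessor(Node):
    def __init__(self):
        super().__init__('image_processor')
        self.bridge = CvBridge()
        self.sub = self.create_subscription(
            Image, '/pi_camera/image_raw', self.image_callback, 10)

    def image_callback(self, msg):
        # Convert the ROS Image message into an OpenCV BGR array
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cv2.imshow('Original', frame)
        cv2.imshow('Grayscale', gray)
        cv2.waitKey(1)  # required for the display windows to refresh


def main():
    rclpy.init()
    node = ImageProcessor()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        cv2.destroyAllWindows()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```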
Run:
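Assuming the file name above:

```bash
python3 image_processor.py
```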
Two windows appear:
Original: Color camera feed
Grayscale: Converted to grayscale
Press Ctrl+C to stop
Image Filtering
Add filters to processing node:
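A sketch of common filters, added inside image_callback where frame and gray already exist (kernel sizes and thresholds are typical starting values):

```python
# Gaussian blur: smooths sensor noise; kernel size must be odd
blurred = cv2.GaussianBlur(frame, (5, 5), 0)

# Median blur: effective against salt-and-pepper noise
median = cv2.medianBlur(frame, 5)

# Canny edge detection on the grayscale image
edges = cv2.Canny(gray, 50, 150)

cv2.imshow('Blurred', blurred)
cv2.imshow('Median', median)
cv2.imshow('Edges', edges)
```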
[PLACEHOLDER: Screenshot showing original vs filtered images]
Exercise 9.2: Edge Detection for Obstacle Boundaries
Task: Detect obstacles using edge detection
Modify processing node:
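One way the modified callback could look, using the 50/150 Canny thresholds referenced in the test steps below:

```python
def image_callback(self, msg):
    frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Blur first so Canny responds to object boundaries, not pixel noise
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)
    cv2.imshow('Original', frame)
    cv2.imshow('Edges', edges)
    cv2.waitKey(1)
```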
Test:
Place various objects in front of robot
Observe edge detection outlining objects
Try different Canny thresholds (50, 150) to tune sensitivity
Part 4: Color Detection
HSV Color Space
Why HSV instead of RGB?
RGB (Red-Green-Blue):
How cameras capture
Lighting affects all channels
Hard to isolate specific colors
HSV (Hue-Saturation-Value):
Hue = Color (0-179 in OpenCV, i.e. half of the usual 0-360° hue circle)
Saturation = Color intensity (0-255)
Value = Brightness (0-255)
Lighting mainly affects Value
Easier to detect specific colors
Color ranges in HSV (OpenCV):
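The full table is in the Quick Reference at the end of this tutorial. As an example, red (which wraps around hue 0) needs two ranges, expressed as NumPy bounds:

```python
import numpy as np

# Red straddles the hue wrap-around, so it needs two intervals
lower_red1, upper_red1 = np.array([0, 100, 100]),   np.array([10, 255, 255])
lower_red2, upper_red2 = np.array([170, 100, 100]), np.array([180, 255, 255])
```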
Color Detection Node
Create color detector:
Script:
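A minimal red detector along these lines (the file name color_detector.py matches the exercise below):

```python
#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
import cv2
import numpy as np


class ColorDetector(Node):
    def __init__(self):
        super().__init__('color_detector')
        self.bridge = CvBridge()
        self.sub = self.create_subscription(
            Image, '/pi_camera/image_raw', self.image_callback, 10)

    def image_callback(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Red spans the hue wrap-around, so combine two masks
        mask1 = cv2.inRange(hsv, np.array([0, 100, 100]),
                                 np.array([10, 255, 255]))
        mask2 = cv2.inRange(hsv, np.array([170, 100, 100]),
                                 np.array([180, 255, 255]))
        mask = cv2.bitwise_or(mask1, mask2)

        # Keep only the pixels that matched the mask
        result = cv2.bitwise_and(frame, frame, mask=mask)

        cv2.imshow('Camera', frame)
        cv2.imshow('Red mask', mask)
        cv2.imshow('Detected', result)
        cv2.waitKey(1)


def main():
    rclpy.init()
    node = ColorDetector()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        cv2.destroyAllWindows()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```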
Run:
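```bash
python3 color_detector.py
```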
Exercise 9.3: Color-Based Object Tracking
Task: Track colored object and display centroid
Modification to color_detector.py:
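A sketch of the change, placed inside image_callback after the mask is computed (image moments give the centroid of the masked pixels):

```python
M = cv2.moments(mask)
if M['m00'] > 0:  # m00 is the mask area; zero means nothing detected
    cx = int(M['m10'] / M['m00'])
    cy = int(M['m01'] / M['m00'])
    # Mark the centroid on the frame and log its position
    cv2.circle(frame, (cx, cy), 8, (0, 255, 0), -1)
    self.get_logger().info(f'Red object at ({cx}, {cy})')
```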
Test:
Hold red object in front of camera
Move it around
Observe centroid tracking
Check console for position logs
Part 5: Shape Detection
Detecting Simple Shapes
Add shape detection:
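A sketch using contour approximation (the 0.04 epsilon factor and the 500-pixel area cutoff are typical starting values, not fixed constants):

```python
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    if cv2.contourArea(cnt) < 500:
        continue  # skip small noise blobs
    # Approximate the contour with a polygon;
    # the vertex count hints at the shape
    approx = cv2.approxPolyDP(cnt, 0.04 * cv2.arcLength(cnt, True), True)
    if len(approx) == 3:
        shape = 'triangle'
    elif len(approx) == 4:
        shape = 'rectangle'
    elif len(approx) > 6:
        shape = 'circle'
    else:
        shape = 'polygon'
    x, y, w, h = cv2.boundingRect(approx)
    cv2.putText(frame, shape, (x, y - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
```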
Exercise 9.4: Shape and Color Recognition
Task: Create multi-property detector
Test scenario:
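One possible setup: cut out a red circle, a red rectangle, and a green circle, and verify that the detector labels only the object matching both the target color and the target shape.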
Part 6: Camera Calibration
Why Calibrate?
Camera distortion:
Lens distortion (barrel, pincushion)
Affects measurements and 3D perception
Need calibration to correct
What calibration provides:
Intrinsic matrix (focal length, principal point)
Distortion coefficients
Allows undistortion of images
Enables accurate 3D measurements
Viewing Current Calibration
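The current parameters are published on the camera_info topic:

```bash
ros2 topic echo /pi_camera/camera_info --once
```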
Using Calibration Data
Undistort image:
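A minimal sketch, building the intrinsic matrix K and distortion vector D from a received CameraInfo message:

```python
import numpy as np
import cv2

def undistort(frame, camera_info):
    # CameraInfo.k holds the 3x3 intrinsic matrix row-major;
    # CameraInfo.d holds the distortion coefficients
    K = np.array(camera_info.k).reshape(3, 3)
    D = np.array(camera_info.d)
    return cv2.undistort(frame, K, D)
```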
Part 7: Practical Applications
Application 1: Line Following
Detect line on floor, steer toward it:
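A sketch of the detection step (lower_line/upper_line are placeholder HSV bounds for your line color):

```python
# Look only at the bottom slice of the image, where the floor is
h, w = frame.shape[:2]
roi = frame[int(0.8 * h):, :]

hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower_line, upper_line)

M = cv2.moments(mask)
if M['m00'] > 0:
    cx = int(M['m10'] / M['m00'])
    # Positive error -> line is right of image center
    error = cx - w // 2
```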
Use error for steering:
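For example, with a proportional controller (the gain 0.005 and the assumed cmd_vel publisher self.cmd_vel_pub are illustrative):

```python
from geometry_msgs.msg import Twist

twist = Twist()
twist.linear.x = 0.1               # creep forward
twist.angular.z = -0.005 * error   # steer toward the line; tune the gain
# self.cmd_vel_pub assumed created as a Twist publisher on /cmd_vel
self.cmd_vel_pub.publish(twist)
```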
Application 2: Obstacle Color Classification
Classify obstacles by color, react differently:
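A sketch of the decision logic, reusing color masks from Part 4 (self.stop() and self.slow_down() are hypothetical helpers; the 5% threshold is a starting value):

```python
# Fraction of image pixels matching each color mask
red_ratio = cv2.countNonZero(red_mask) / red_mask.size
green_ratio = cv2.countNonZero(green_mask) / green_mask.size

if red_ratio > 0.05:
    self.stop()        # red obstacle: halt (hypothetical helper)
elif green_ratio > 0.05:
    self.slow_down()   # green obstacle: proceed slowly (hypothetical helper)
```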
Application 3: AprilTag Detection
AprilTags = Fiducial markers (like QR codes for robots)
Install apriltag library:
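One option, assuming the apriltag_ros binary package is released for Jazzy:

```bash
sudo apt install ros-jazzy-apriltag-ros
```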
Launch apriltag detector:
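A possible invocation, assuming apriltag_ros; its node nominally expects a rectified image (image_rect), but remapping the raw stream works for a quick test on a low-distortion lens:

```bash
ros2 run apriltag_ros apriltag_node --ros-args \
  -r image_rect:=/pi_camera/image_raw \
  -r camera_info:=/pi_camera/camera_info
```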
Use cases:
Precise localization (know exact position from tag)
Object identification (each tag has unique ID)
Docking stations (navigate to specific tag)
AR applications (overlay virtual objects)
Part 8: Recording and Processing Video
Recording Camera Data
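Record the compressed stream to keep bag sizes manageable (the bag name is an example):

```bash
ros2 bag record -o camera_session \
  /pi_camera/image_raw/compressed /pi_camera/camera_info
```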
Offline Processing
Process recorded video by replaying the bag; the same subscriber nodes work unchanged, because playback republishes the original topics.
Run:
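```bash
# Terminal 1: replay the recording (loop it for repeated testing)
ros2 bag play camera_session --loop

# Terminal 2: the live processing node from Part 3 works as-is
python3 image_processor.py
```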
Part 9: Limitations and Considerations
What Camera Perception Can't Do (Without ML)
⚠️ Important: Basic image processing has limitations
Without machine learning:
❌ Cannot do:
Object recognition (identify specific objects like "person", "chair", "dog")
Face recognition
Text reading (OCR; possible with dedicated libraries, but not with basic OpenCV alone)
Complex scene understanding
Semantic segmentation
✅ Can do:
Color detection
Shape detection (simple geometries)
Edge/contour detection
Motion detection
Line following
Marker detection (AprilTags, QR codes)
Brightness/contrast analysis
When to Use Machine Learning
For advanced perception, need ML models:
Options:
Pre-trained models (YOLO, MobileNet, etc.)
Object detection
Instance segmentation
Pose estimation
Custom training (TensorFlow, PyTorch)
Train on specific objects
Requires dataset collection
Computationally intensive
Note: ML perception is beyond this tutorial but worth exploring!
Lighting Considerations
Camera performance varies with lighting:
Good lighting:
Indirect, even illumination
No harsh shadows
Consistent throughout environment
Poor lighting:
Direct sunlight (overexposes)
Backlit scenes (subject too dark)
Rapid changes (robot driving in and out of shadows)
Very low light (noisy images)
Tip: Test perception algorithms in actual target lighting conditions!
Part 10: Knowledge Check
Concept Quiz
Why use HSV instead of RGB for color detection?
What does camera calibration provide?
What's the difference between /image_raw and /image_raw/compressed?
Can basic OpenCV detect "what" an object is (e.g., identify as "cup")?
Why is the camera mounted forward-facing at 8.8cm height?
Hands-On Challenge
Task: Color-based navigation system
Requirements:
Robot follows green markers on floor
Stops at red markers
Ignores other colors
Publishes cmd_vel based on vision
Displays annotated video showing detections
Records demo video
Deliverable:
Python script with vision processing
Video recording of robot navigating course
Documentation of color thresholds used
Discussion of challenges encountered
Bonus:
Add shape detection (only respond to green circles, not rectangles)
Implement smooth steering (PID control based on centroid error)
Handle cases where no marker visible
Part 11: What You've Learned
🎉 Congratulations!
You now understand:
Camera Fundamentals:
✅ Camera specifications and mounting
✅ Image topics and message formats
✅ Viewing camera streams (rqt, RViz)
✅ Recording and playing back video
Image Processing:
✅ OpenCV basics (CvBridge, cv2 functions)
✅ Image filtering (blur, edge detection)
✅ Color detection (HSV color space)
✅ Shape detection (contours, polygons)
Practical Applications:
✅ Line following
✅ Obstacle classification by color
✅ Object tracking (centroids)
✅ Marker detection (AprilTags)
Advanced Topics:
✅ Camera calibration concepts
✅ Undistorting images
✅ Offline video processing
✅ Limitations of basic vision
Next Steps
🎯 You're Now Ready For:
Immediate Next: → Localization Techniques - Combine vision with other sensors
Autonomous Navigation: → Autonomous Navigation - Vision-aided obstacle avoidance
Advanced Vision:
Machine learning object detection (YOLO, MobileNet)
Visual odometry (estimate motion from camera)
SLAM with visual features (ORB-SLAM)
Depth estimation (if adding depth camera)
Quick Reference
Essential Camera Commands
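The commands used in this tutorial, collected in one place:

```bash
ros2 topic list | grep pi_camera                 # discover camera topics
ros2 topic hz /pi_camera/image_raw               # check frame rate
ros2 topic echo /pi_camera/camera_info --once    # view calibration
ros2 run rqt_image_view rqt_image_view           # live view
ros2 bag record -o camera_session \
  /pi_camera/image_raw/compressed                # record video data
ros2 bag play camera_session --loop              # replay for offline work
```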
OpenCV Quick Reference
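The core calls from this tutorial (lower/upper are HSV bounds from the table below):

```python
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # to grayscale
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)     # to HSV
blur = cv2.GaussianBlur(frame, (5, 5), 0)        # smooth noise
edges = cv2.Canny(gray, 50, 150)                 # edge map
mask = cv2.inRange(hsv, lower, upper)            # color threshold
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
M = cv2.moments(mask)                            # centroid: m10/m00, m01/m00
```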
HSV Color Ranges (OpenCV)
| Color  | Hue           | Saturation | Value   |
|--------|---------------|------------|---------|
| Red    | 0-10, 170-180 | 100-255    | 100-255 |
| Orange | 10-25         | 100-255    | 100-255 |
| Yellow | 25-35         | 100-255    | 100-255 |
| Green  | 35-85         | 50-255     | 50-255  |
| Blue   | 85-125        | 50-255     | 50-255  |
| Purple | 125-155       | 50-255     | 50-255  |
Completed Camera-Based Perception! 🎉
→ Continue to Localization Techniques, or return to the Tutorial Index
Last Updated: January 2026
Tutorial 9 of 11 - Advanced Level
Estimated completion time: 150 minutes