Camera-Based Perception

Visual sensing and basic image processing techniques


Tutorial Overview

๐ŸŽฏ Learning Objectives

By the end of this tutorial, you will:

  • โœ… Access and view camera streams

  • โœ… Understand image topics and formats

  • โœ… Perform basic image processing (filters, thresholding, edge detection)

  • โœ… Detect colors and shapes

  • โœ… Record and process video data

  • โœ… Understand camera calibration

  • โœ… Explore potential perception applications

  • โœ… Integrate vision with robot control

โฑ๏ธ Time Required

  • Reading & Setup: 20 minutes

  • Basic Image Access: 25 minutes

  • Image Processing: 40 minutes

  • Color/Shape Detection: 35 minutes

  • Applications: 30 minutes

  • Total: ~150 minutes

๐Ÿ“š Prerequisites

  • โœ… Completed Sensor Data Visualization

  • โœ… Completed ROS2 Communication & Tools

  • โœ… Basic Python knowledge

  • โœ… Understanding of image concepts (pixels, RGB)

  • โœ… OpenCV basics helpful but not required

  • โœ… Can record and play rosbags

๐Ÿ› ๏ธ What You'll Need

  • โœ… Beetlebot (powered, camera working)

  • โœ… Laptop with ROS2 Jazzy

  • โœ… Wireless controller

  • โœ… Test objects with distinct colors

  • โœ… Good lighting (avoid direct sunlight on camera)

  • โœ… Printed patterns for detection tests (optional)


Part 1: Camera System Overview

Raspberry Pi Camera V1.3 Specifications

Your robot's camera:

Hardware:

  • Sensor: OmniVision OV5647

  • Resolution: 5MP (2592ร—1944 max)

  • Typical streaming: 1080p (1920ร—1080) or 720p (1280ร—720)

  • Frame rate: 30 fps (1080p), 60 fps (720p)

  • FOV: ~54ยฐ horizontal, ~41ยฐ vertical

  • Focus: Fixed (best 30cm - 3m)

  • Interface: CSI (Camera Serial Interface) to Pi

Mounting:

  • Height: 8.8cm from ground

  • Position: 15cm forward from center

  • Orientation: Horizontal, forward-facing

  • Topic: /pi_camera/image_raw


Camera Topics

Available topics:

Topic details:

/pi_camera/image_raw (sensor_msgs/Image)

  • Uncompressed RGB8 or BGR8 format

  • ~30 MB/s bandwidth (1080p @ 30fps)

  • Best quality

  • Use for local processing

/pi_camera/image_raw/compressed (sensor_msgs/CompressedImage)

  • JPEG compression

  • ~3-5 MB/s bandwidth

  • Slight quality loss

  • Better for remote viewing

/pi_camera/camera_info (sensor_msgs/CameraInfo)

  • Intrinsic parameters (focal length, principal point)

  • Distortion coefficients

  • Rectification matrix

  • Projection matrix


Understanding Image Messages

Image message structure:

Common encodings:

  • rgb8: 8-bit RGB (Red-Green-Blue)

  • bgr8: 8-bit BGR (OpenCV default)

  • mono8: 8-bit grayscale

  • rgba8: 8-bit RGBA (with alpha channel)


Part 2: Accessing Camera Stream

Quick View with rqt_image_view

Simplest method:

Controls:

  • Zoom: Mouse wheel

  • Pan: Right-click + drag

  • Refresh: Click dropdown again

  • Rotate: Image โ†’ Transform menu

[PLACEHOLDER: Screenshot of rqt_image_view showing camera feed]


Exercise 9.1: Camera Field of View Test

Task: Measure camera's actual field of view

Materials needed:

  • Measuring tape

  • Flat wall

  • Markers or tape

Procedure:


Viewing in RViz

Integrated visualization:

Benefits of RViz camera view:

  • See camera + LiDAR + map simultaneously

  • Overlay detection results

  • 3D context for camera view

  • Can add measurement tools


Part 3: Basic Image Processing with OpenCV

Installing Dependencies


Creating Image Processing Node

Basic template for image processing:

Script:

Run:

Two windows appear:

  • Original: Color camera feed

  • Grayscale: Converted to grayscale

Press Ctrl+C to stop


Image Filtering

Add filters to processing node:

[PLACEHOLDER: Screenshot showing original vs filtered images]


Exercise 9.2: Edge Detection for Obstacle Boundaries

Task: Detect obstacles using edge detection

Modify processing node:

Test:

  • Place various objects in front of robot

  • Observe edge detection outlining objects

  • Try different Canny thresholds (50, 150) to tune sensitivity


Part 4: Color Detection

HSV Color Space

Why HSV instead of RGB?

RGB (Red-Green-Blue):

  • How cameras capture

  • Lighting affects all channels

  • Hard to isolate specific colors

HSV (Hue-Saturation-Value):

  • Hue = Color (0-180ยฐ in OpenCV)

  • Saturation = Color intensity (0-255)

  • Value = Brightness (0-255)

  • Lighting mainly affects Value

  • Easier to detect specific colors

Color ranges in HSV (OpenCV):


Color Detection Node

Create color detector:

Script:

Run:


Exercise 9.3: Color-Based Object Tracking

Task: Track colored object and display centroid

Modification to color_detector.py:

Test:

  • Hold red object in front of camera

  • Move it around

  • Observe centroid tracking

  • Check console for position logs


Part 5: Shape Detection

Detecting Simple Shapes

Add shape detection:


Exercise 9.4: Shape and Color Recognition

Task: Create multi-property detector

Test scenario:


Part 6: Camera Calibration

Why Calibrate?

Camera distortion:

  • Lens distortion (barrel, pincushion)

  • Affects measurements and 3D perception

  • Need calibration to correct

What calibration provides:

  • Intrinsic matrix (focal length, principal point)

  • Distortion coefficients

  • Allows undistortion of images

  • Enables accurate 3D measurements


Viewing Current Calibration


Using Calibration Data

Undistort image:


Part 7: Practical Applications

Application 1: Line Following

Detect line on floor, steer toward it:

Use error for steering:


Application 2: Obstacle Color Classification

Classify obstacles by color, react differently:


Application 3: AprilTag Detection

AprilTags = Fiducial markers (like QR codes for robots)

Install apriltag library:

Launch apriltag detector:

Use cases:

  • Precise localization (know exact position from tag)

  • Object identification (each tag has unique ID)

  • Docking stations (navigate to specific tag)

  • AR applications (overlay virtual objects)


Part 8: Recording and Processing Video

Recording Camera Data


Offline Processing

Process recorded video:

Run:


Part 9: Limitations and Considerations

What Camera Perception Can't Do (Without ML)

โš ๏ธ Important: Basic image processing has limitations

Without machine learning:

โŒ Cannot do:

  • Object recognition (identify specific objects like "person", "chair", "dog")

  • Face recognition

  • Text reading (OCR - though basic, possible with libraries)

  • Complex scene understanding

  • Semantic segmentation

โœ… Can do:

  • Color detection

  • Shape detection (simple geometries)

  • Edge/contour detection

  • Motion detection

  • Line following

  • Marker detection (AprilTags, QR codes)

  • Brightness/contrast analysis


When to Use Machine Learning

For advanced perception, need ML models:

Options:

  • Pre-trained models (YOLO, MobileNet, etc.)

    • Object detection

    • Instance segmentation

    • Pose estimation

  • Custom training (TensorFlow, PyTorch)

    • Train on specific objects

    • Requires dataset collection

    • Computationally intensive

Note: ML perception is beyond this tutorial but worth exploring!


Lighting Considerations

Camera performance varies with lighting:

Good lighting:

  • Indirect, even illumination

  • No harsh shadows

  • Consistent throughout environment

Poor lighting:

  • Direct sunlight (overexposes)

  • Backlit scenes (subject too dark)

  • Rapid changes (drives in/out of shadows)

  • Very low light (noisy images)

Tip: Test perception algorithms in actual target lighting conditions!


Part 10: Knowledge Check

Concept Quiz

  1. Why use HSV instead of RGB for color detection?

  2. What does camera calibration provide?

  3. What's the difference between /image_raw and /image_raw/compressed?

  4. Can basic OpenCV detect "what" an object is (e.g., identify as "cup")?

  5. Why is the camera mounted forward-facing at 8.8cm height?


Hands-On Challenge

Task: Color-based navigation system

Requirements:

  1. Robot follows green markers on floor

  2. Stops at red markers

  3. Ignores other colors

  4. Publishes cmd_vel based on vision

  5. Displays annotated video showing detections

  6. Records demo video

Deliverable:

  • Python script with vision processing

  • Video recording of robot navigating course

  • Documentation of color thresholds used

  • Discussion of challenges encountered

Bonus:

  • Add shape detection (only respond to green circles, not rectangles)

  • Implement smooth steering (PID control based on centroid error)

  • Handle cases where no marker visible


Part 11: What You've Learned

โœ… Congratulations!

You now understand:

Camera Fundamentals:

  • โœ… Camera specifications and mounting

  • โœ… Image topics and message formats

  • โœ… Viewing camera streams (rqt, RViz)

  • โœ… Recording and playing back video

Image Processing:

  • โœ… OpenCV basics (CvBridge, cv2 functions)

  • โœ… Image filtering (blur, edge detection)

  • โœ… Color detection (HSV color space)

  • โœ… Shape detection (contours, polygons)

Practical Applications:

  • โœ… Line following

  • โœ… Obstacle classification by color

  • โœ… Object tracking (centroids)

  • โœ… Marker detection (AprilTags)

Advanced Topics:

  • โœ… Camera calibration concepts

  • โœ… Undistorting images

  • โœ… Offline video processing

  • โœ… Limitations of basic vision


Next Steps

๐ŸŽฏ You're Now Ready For:

Immediate Next: โ†’ Localization Techniques - Combine vision with other sensors

Autonomous Navigation: โ†’ Autonomous Navigation - Vision-aided obstacle avoidance

Advanced Vision:

  • Machine learning object detection (YOLO, MobileNet)

  • Visual odometry (estimate motion from camera)

  • SLAM with visual features (ORB-SLAM)

  • Depth estimation (if adding depth camera)


Quick Reference

Essential Camera Commands


OpenCV Quick Reference


HSV Color Ranges (OpenCV)

Color
Hue Range
Typical Sat
Typical Val

Red

0-10, 170-180

100-255

100-255

Orange

10-25

100-255

100-255

Yellow

25-35

100-255

100-255

Green

35-85

50-255

50-255

Blue

85-125

50-255

50-255

Purple

125-155

50-255

50-255


Completed Camera-Based Perception! ๐ŸŽ‰

โ†’ Continue to Localization Techniques โ†’ Or return to Tutorial Index


Last Updated: January 2026 Tutorial 9 of 11 - Advanced Level Estimated completion time: 150 minutes

Last updated