Architecture Overview

This document describes the architecture and design of the Cat Viewer system.

System Components

The system consists of five main modules:

Camera Module (camera.py)

The Camera class wraps the picamera2 library to provide a high-level interface for Raspberry Pi camera modules. Key responsibilities:

  • Camera initialization and configuration

  • Automatic full sensor resolution detection

  • Frame capture at configured FPS

  • Frame format conversion (RGB/RGBA to BGR for OpenCV)

  • Camera validation and error handling

  • Support for IR camera controls (exposure, gain)

The camera automatically detects and uses the full sensor resolution (e.g., 3280x2464 for Camera Module 2) instead of cropping to a smaller resolution, ensuring the complete field of view is captured.
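One way to select the full sensor resolution is to pick the sensor mode with the largest pixel area. The sketch below assumes mode descriptions shaped like the dictionaries in picamera2's `sensor_modes` list, each carrying a `size` tuple. The helper name `pick_full_resolution` and the sample modes are illustrative and are not taken from camera.py.

```python
from typing import Iterable, Mapping, Tuple

Size = Tuple[int, int]


def pick_full_resolution(modes: Iterable[Mapping[str, object]]) -> Size:
    """Return the (width, height) of the mode with the largest pixel area."""
    sizes = [tuple(mode["size"]) for mode in modes]
    if not sizes:
        raise ValueError("camera reported no sensor modes")
    return max(sizes, key=lambda wh: wh[0] * wh[1])


# Illustrative modes resembling those of a Camera Module 2 sensor (assumed values).
example_modes = [
    {"size": (640, 480)},
    {"size": (1640, 1232)},
    {"size": (1920, 1080)},
    {"size": (3280, 2464)},
]
full_size = pick_full_resolution(example_modes)
```

Selecting by area rather than by list position avoids relying on the order in which the driver reports its modes.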

Detection Module (detector.py)

The CatDetector class wraps the Ultralytics YOLOv8 model for cat detection:

  • Loads pre-trained YOLOv8n (nano) model

  • Runs inference on video frames

  • Filters detections to only cats (COCO class ID 15)

  • Returns Detection objects with bounding boxes and confidence scores

The detector is optimized for CPU inference on Raspberry Pi using the lightweight nano model variant.
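The filtering step can be illustrated without loading a model. In this minimal sketch, the `Detection` fields, the raw prediction tuple layout, and the default threshold of 0.5 are assumptions made for illustration; only the cat class ID (15) comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

CAT_CLASS_ID = 15  # COCO class ID for "cat", as stated above

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    """Hypothetical result type: bounding box plus confidence score."""
    box: Box
    confidence: float


# Each raw prediction is (class_id, confidence, box). This layout is an
# assumption for the sketch, not the Ultralytics result format.
RawPrediction = Tuple[int, float, Box]


def filter_cats(raw: Sequence[RawPrediction], threshold: float = 0.5) -> List[Detection]:
    """Keep cat-class predictions whose confidence meets the threshold."""
    return [
        Detection(box=box, confidence=conf)
        for class_id, conf, box in raw
        if class_id == CAT_CLASS_ID and conf >= threshold
    ]


raw_predictions = [
    (15, 0.91, (120, 80, 420, 360)),   # cat, confident
    (16, 0.88, (10, 10, 100, 100)),    # different class
    (15, 0.30, (500, 200, 560, 260)),  # cat, below threshold
]
cats = filter_cats(raw_predictions, threshold=0.5)
```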

Recording Module (recorder.py)

The VideoRecorder class manages video recording sessions:

  • Creates timestamped output files

  • Manages recording state (start/stop)

  • Controls recording duration

  • Handles frame writing with automatic resizing

  • Uses OpenCV VideoWriter with mp4v codec
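The file naming and duration control listed above could look roughly like the following. This is a sketch only: the `cat_` prefix, the timestamp pattern, and the `RecordingSession` class are hypothetical, and the actual frame writing through OpenCV is left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_path(output_dir: Path, now: datetime, prefix: str = "cat") -> Path:
    """Build a name such as cat_20240131_142500.mp4 (naming scheme is assumed)."""
    return output_dir / f"{prefix}_{now:%Y%m%d_%H%M%S}.mp4"


class RecordingSession:
    """Tracks start/stop state and elapsed time; frame writing is omitted."""

    def __init__(self, duration_s: float) -> None:
        self.duration_s = duration_s
        self.started_at: Optional[float] = None

    @property
    def is_recording(self) -> bool:
        return self.started_at is not None

    def start(self, now_s: float) -> None:
        self.started_at = now_s

    def should_stop(self, now_s: float) -> bool:
        """True once the configured duration has elapsed since start()."""
        return self.is_recording and now_s - self.started_at >= self.duration_s

    def stop(self) -> None:
        self.started_at = None


path = timestamped_path(Path("recordings"), datetime(2024, 1, 31, 14, 25, 0))
session = RecordingSession(duration_s=30.0)
session.start(now_s=100.0)
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the duration logic easy to test.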

Configuration Module (config.py)

The config module handles YAML configuration loading:

  • Loads settings from config.yaml

  • Provides getter functions for all settings

  • Falls back to hardcoded defaults if YAML is missing

  • Supports path expansion (home directory, absolute paths)
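The fallback and path expansion behavior can be sketched as follows. The default values other than the FPS of 10, the key names, and both helper functions are assumptions for illustration. The YAML parsing step itself is omitted, so the functions operate on an already-loaded mapping.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical defaults; the real keys and values in config.py may differ.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "camera": {"fps": 10},
    "recording": {"duration": 30, "output_dir": "~/cat_videos"},
}


def merge_with_defaults(loaded: Mapping[str, Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Overlay loaded YAML sections on the defaults, section by section."""
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in loaded.items():
        merged.setdefault(section, {}).update(values)
    return merged


def expand_path(path: str) -> str:
    """Expand a leading "~" and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


# An empty mapping stands in for a missing config.yaml.
settings = merge_with_defaults({})
overridden = merge_with_defaults({"camera": {"fps": 15}})
output_dir = expand_path(settings["recording"]["output_dir"])
```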

Main Application (main.py)

The main module orchestrates the system:

  • Parses command-line arguments

  • Initializes camera, detector, and recorder

  • Runs the main monitoring loop

  • Handles cat detection and recording triggers

  • Manages preview window (optional)

  • Provides test recording mode

Data Flow

  1. Initialization:

      • Load configuration from YAML (with CLI overrides)

      • Initialize camera, detector, and recorder

      • Validate camera can read frames

  2. Monitoring Loop:

      • Read frame from camera

      • Run detection on frame

      • If cat detected and not recording: start recording

      • If recording: write frame to video file

      • If recording duration elapsed: stop recording

      • If preview enabled: draw detections and display

  3. Frame Processing:

      • Camera captures frame in RGB/RGBA format

      • Frame converted to BGR for OpenCV compatibility

      • BGR frame passed to detector (the model expects RGB; Ultralytics treats NumPy input as BGR and converts it internally)

      • Detections drawn on frame for preview

      • Frame written to video file if recording
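The monitoring loop in step 2 can be viewed as a small state machine. The sketch below is not the code in main.py: it uses strings as stand-in frames, a hypothetical `run_monitor` function, and counts the recording length in frames instead of seconds so that the behavior is deterministic.

```python
from typing import Callable, Iterable, List


def run_monitor(
    frames: Iterable[object],
    detect: Callable[[object], bool],
    frames_per_recording: int,
) -> List[str]:
    """Walk through frames and return a log of recording events."""
    events: List[str] = []
    recording = False
    written = 0
    for frame in frames:
        cat_seen = detect(frame)
        if cat_seen and not recording:
            # A detection only starts a recording when none is active.
            recording, written = True, 0
            events.append("start")
        if recording:
            written += 1
            events.append("write")
            if written >= frames_per_recording:
                recording = False
                events.append("stop")
    return events


# Stand-in frames: strings instead of images, "cat" marks a detection.
fake_frames = ["empty", "cat", "empty", "empty", "empty"]
log = run_monitor(fake_frames, detect=lambda f: f == "cat", frames_per_recording=3)
```

Note that frames without a cat are still written while a recording is active, which matches the loop described above: detection triggers a recording, and the duration ends it.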

Configuration

Configuration is loaded from config.yaml with the following structure:

  • detection: Threshold and model settings

  • camera: Device, resolution, FPS, optional IR controls

  • recording: Duration and output directory

  • model: Model file name and cat class ID

Command-line arguments override YAML values, allowing runtime customization without editing config files.
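A common way to implement this precedence with `argparse` is to default every flag to `None` and copy only the flags that were actually given. The flag names and the flat config dictionary below are assumptions for illustration, not the actual options of main.py.

```python
import argparse
from typing import Any, Dict, List, Optional


def apply_cli_overrides(config: Dict[str, Any], argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return a copy of config with any flags given on the command line applied."""
    parser = argparse.ArgumentParser(description="Cat Viewer (sketch)")
    # default=None distinguishes "flag not given" from a real value.
    parser.add_argument("--fps", type=int, default=None)
    parser.add_argument("--threshold", type=float, default=None)
    parser.add_argument("--duration", type=int, default=None)
    args = parser.parse_args(argv)

    merged = dict(config)
    for key, value in vars(args).items():
        if value is not None:
            merged[key] = value
    return merged


yaml_values = {"fps": 10, "threshold": 0.5, "duration": 30}  # illustrative values
effective = apply_cli_overrides(yaml_values, ["--fps", "5"])
```

Here only `fps` changes; `threshold` and `duration` keep the values that came from the YAML file.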

Error Handling

The system handles errors at each stage of the pipeline:

  • Camera initialization failures with detailed troubleshooting messages

  • Camera validation to ensure frames can be read

  • Frame read failures with throttled logging and recovery

  • Preview window failures (graceful degradation, continues recording)

  • Detection errors (logged, system continues)

  • Recording errors (logged, system continues)

The system is designed to be robust and continue operating even when some components encounter errors (e.g., preview window unavailable over SSH).
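Throttled logging for repeated frame read failures could be implemented along these lines. The `ThrottledLogger` class, the logger name, and the 5-second interval are hypothetical; the sketch only shows the general technique of emitting at most one message per interval.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("cat_viewer.sketch")


class ThrottledLogger:
    """Emit a warning at most once per interval and count what was suppressed."""

    def __init__(self, interval_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self.last_emit: Optional[float] = None
        self.suppressed = 0

    def warning(self, message: str) -> bool:
        """Log the message if the interval has passed; return True if logged."""
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.interval_s:
            self.suppressed += 1
            return False
        logger.warning("%s (%d similar messages suppressed)", message, self.suppressed)
        self.last_emit = now
        self.suppressed = 0
        return True


# A fake clock makes the behavior easy to follow without sleeping.
fake_time = [0.0]
throttle = ThrottledLogger(interval_s=5.0, clock=lambda: fake_time[0])
results = []
for t in (0.0, 1.0, 4.9, 5.0, 6.0):
    fake_time[0] = t
    results.append(throttle.warning("frame read failed"))
```

The main loop can keep retrying the camera on every iteration while the log receives one line per interval instead of one per failed frame.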

Performance Considerations

  • Model Size: Uses YOLOv8n (nano) for CPU-friendly inference

  • Input Resolution: Configurable model input size (default 640x640)

  • Frame Rate: Configurable FPS (default 10) balances detection quality and CPU load

  • Full Sensor Resolution: Automatically detected and used for maximum field of view

  • Recording Codec: Uses mp4v codec for Raspberry Pi compatibility

The system is optimized for Raspberry Pi 4 with CPU-only inference. GPU acceleration is not currently supported but could be added for improved performance.