How I built a real-time AI detector with 99.9% precision in challenging outdoor conditions

Tags: Computer Vision, Deep Learning, CNN, Transfer Learning, PyTorch, OpenCV, Real-Time Detection

Problem & Motivation

Every evening, our backyard camera captured dozens of motion events - mostly wind, insects, or shadows. But hidden in the noise were possums. What if we could teach a machine to see, filter out the noise, and tell us only when it matters? This write-up traces the path from raw IR footage to a robust, real-time possum detector.

Those late-night visits were not just interesting observations - they posed a real risk. Possums regularly enter the backyard, inevitably attracting the attention - and hunting instincts - of our dog, Beau. To prevent potentially dangerous encounters, the long-term vision was a smart dog-door mechanism that automatically closes whenever a possum is detected, keeping wildlife safe and Beau indoors. Before integrating anything into the door, however, we started with a simpler, controlled prototype: a small box with a carrot inside. This setup allowed us to reliably attract possums to a fixed location, making detection easier to test, measure, and iterate.

At the same time, the project evolved beyond detection. The data began to reveal patterns - recurring visit times, frequency trends, behavioral consistency. Understanding these patterns became just as compelling as solving the detection problem itself.

Technically, the challenge is significant. Night-time infrared footage is inherently noisy: insects, wind-driven vegetation, small animals, and sensor artifacts can all resemble meaningful motion. The task was clear - build a system that reliably distinguishes possums from environmental noise in real time, while minimizing false alarms.

Key Challenges

Building a real-time wildlife detection system sounds straightforward — until you observe how wildlife actually behaves.

A Noisy Night World

Infrared footage at night is chaotic. Insects pass close to the lens, wind moves vegetation, rain reflects light, and small animals trigger motion events. Once cropped to a region of interest (ROI), many of these look convincingly similar to a possum.

Unpredictable Behavior

Possums don't behave like clean training data. They freeze for minutes, hide in tall grass, disappear behind fences, and suddenly reappear. A single visit can easily fragment into multiple detections — or be missed entirely if motion is too subtle.

Temporal Logic Over Frames

Consecutive frames are nearly identical. Without careful aggregation logic, the system either floods alerts or splits one event into many. Defining what constitutes a "visit" required more than just classification — it required temporal reasoning.

Imbalanced & Imperfect Data

True possum appearances are rare. Most motion events are noise. Crops can be blurred, partially occluded, or poorly illuminated. Constructing a representative dataset — especially high-quality negative samples — is both critical and labor-intensive.

Real-Time & Edge Constraints

The system runs continuously. Thousands of ROIs must be filtered efficiently without overwhelming the CNN. At the same time, video must be recorded smoothly, events consolidated correctly, and the pipeline optimized for potential edge deployment.

In short, the hardest part wasn't detecting a possum — it was defining what detection should mean in the real world.

Data Collection & Preparation

Total images: ~30K
Train: 20K
Validation: 5K
Test: 5K

Collection Period: Aug 2025 - Feb 2026
Source: Backyard night camera
Preprocessing: Motion detection ROI extraction, manually reviewed & labeled

Session-Based Splitting

ROIs from the same night session were kept together in a single split, never divided between train and test sets, to prevent temporal data leakage.
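
A session-based split can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `(roi_path, session_id)` sample format, the function name, and the split fractions are all assumptions.

```python
import random
from collections import defaultdict


def split_by_session(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Assign whole night sessions to train/val/test.

    `samples` is a list of (roi_path, session_id) pairs (hypothetical format).
    Fractions apply to the number of sessions, not to individual ROIs.
    """
    # Group ROI paths by the night session they came from.
    by_session = defaultdict(list)
    for path, session_id in samples:
        by_session[session_id].append(path)

    # Shuffle sessions reproducibly, then carve off test and validation.
    sessions = sorted(by_session)
    random.Random(seed).shuffle(sessions)
    n_test = max(1, round(len(sessions) * test_frac))
    n_val = max(1, round(len(sessions) * val_frac))
    groups = {
        "test": sessions[:n_test],
        "val": sessions[n_test:n_test + n_val],
        "train": sessions[n_test + n_val:],
    }

    # Every ROI follows its session, so no session spans two splits.
    return {
        name: [p for s in ids for p in by_session[s]]
        for name, ids in groups.items()
    }
```

Because near-identical consecutive frames always land in the same split, test accuracy reflects performance on unseen nights rather than on near-duplicates of training crops.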

Blurry Image Inclusion

Motion-blurred possum images were kept in training to reflect realistic night conditions.

Understanding the Data: What the Model Actually Sees

Before training the classifier, it was important to understand what the system was truly learning from.

Each motion trigger generates a cropped Region of Interest (ROI). Some are clean and obvious. Others are ambiguous, blurred, or deceptively similar to a possum. Below are representative examples.

Good Possum Images

Blurry / Motion-Affected Possum Images

Non-Possum Motion (Mice, Insects)

Training data is not just about quantity — it is about representativeness.

The model must learn to:

Recognize a possum in ideal conditions
Tolerate blur and occlusion
Reject visually similar environmental noise

In practice, the quality and diversity of ROIs had a greater impact on system reliability than minor architectural changes to the CNN.

Model Architecture

The goal was not just high classification accuracy, but reliable real-time detection in a noisy outdoor environment.

The system combines lightweight motion detection with a fine-tuned CNN. Motion detection extracts candidate ROIs, dramatically reducing computational load, while the CNN verifies each crop for final classification.

At its core is a pretrained ResNet18, fine-tuned to adapt from standard ImageNet features to infrared night imagery — a domain with distinct texture patterns, contrast behavior, and noise characteristics.

To keep real-time output stable, a detection is confirmed only when at least 3 of the last 5 processed frames are classified as possum, which suppresses flicker and isolated false positives.

Model Training & Performance

The model was trained for 8 epochs using transfer learning with ResNet18, achieving a test accuracy of 99.30% with ROC AUC of 99.92%, demonstrating strong generalization with minimal overfitting.

Training Progress

[Charts: Training & Validation Loss; Training & Validation Accuracy]

Confusion Matrix

                     Predicted Not Possum    Predicted Possum
Actual Not Possum    2,907 (54.9%)           1 (0.0%)
Actual Possum        36 (0.7%)               2,351 (44.4%)

Metrics Summary

Accuracy: 99.30%
Precision: 99.96%
Recall: 98.49%
F1 Score: 99.22%
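
The summary figures follow directly from the four confusion-matrix counts. The short sketch below recomputes them; the function name is illustrative only.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive standard binary-classification metrics from raw counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)   # share of possum alerts that were correct
    recall = tp / (tp + fn)      # share of real possums that were caught
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the confusion matrix on the test set.
m = metrics_from_confusion(tn=2907, fp=1, fn=36, tp=2351)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

The asymmetry is visible in the counts: a single false positive against 36 missed possum crops, so the classifier errs on the side of staying silent.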

Key Insights

ROC AUC: 99.92%

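In a real-time wildlife monitoring system, reliability is measured not only by accuracy but by user trust: with a single false positive among 2,908 non-possum test crops, an alert almost always means a real possum.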

Detection Logic

Temporal Consistency Rule

A possum is considered detected only if it appears in at least 3 out of the last 5 processed frames. This sliding window approach:

Reduces single-frame misclassifications

Stabilizes predictions in noisy night conditions

Keeps detection stable when a possum moves slowly or pauses briefly
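
The 3-of-5 rule can be expressed as a small sliding-window voter. This is a sketch of the idea, not the project's actual code; the class and method names are hypothetical.

```python
from collections import deque


class TemporalVoter:
    """Confirm a detection when at least `min_hits` of the last `window`
    processed frames were classified as possum."""

    def __init__(self, window=5, min_hits=3):
        # deque(maxlen=...) drops the oldest frame result automatically.
        self.history = deque(maxlen=window)
        self.min_hits = min_hits

    def update(self, is_possum):
        """Record one per-frame CNN decision and return the confirmed state."""
        self.history.append(bool(is_possum))
        return sum(self.history) >= self.min_hits


voter = TemporalVoter()
per_frame = [True, False, True, False, True, False, False, False]
confirmed = [voter.update(p) for p in per_frame]
# The detection is confirmed only at the fifth frame, when the window
# holds three positives, and drops again as they slide out.
```

A single stray positive never reaches the threshold, which is what filters out one-frame misclassifications.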

Real-Time Detection Snapshots

Real-time examples captured during live camera inference. The system reports a possum detection only after satisfying the temporal consistency rule.

[Two snapshot sets, each showing: Terminal Log, Full Frame, ROI to CNN]

Video Recording Demo

A sample recording from the backyard night camera, showing the type of footage the detection system processes in real time.

Limitations & Trade-Offs

No real-world system is perfect — especially in uncontrolled outdoor environments. The main limitations of the current approach reflect environmental and deployment constraints rather than architectural flaws.

Environmental Sensitivity

Performance may degrade under extreme weather conditions such as heavy rain, strong wind, or excessive infrared noise. Severe visual distortion affects both motion detection and ROI quality.

Motion Dependency

The pipeline relies on motion as a first-stage trigger. Completely static possums may be temporarily missed until movement resumes.

Limited Positive Diversity

Possum appearances are relatively rare, limiting the variability of training examples. Unusual poses, lighting conditions, or rare behavioral patterns may be underrepresented.

ROI Quality Constraints

Low-resolution cameras, distant subjects, or suboptimal angles reduce crop clarity and can impact classification confidence.

Environment-Specific Generalization

The system is trained on data collected in this specific backyard environment. Changes in background structure, vegetation, camera placement, or lighting conditions may require additional data collection and retraining to maintain performance in a new setting.

Open-World Uncertainty

As with most supervised systems, previously unseen objects or rare edge cases may occasionally be misclassified.

In short, the model generalizes well within its domain, but responsible deployment in new environments requires domain adaptation and additional labeled data.

Cloud Architecture & Backend

The detection system is deployed as a cloud-based, event-driven pipeline on Google Cloud Platform.

Media Storage

All videos and ROI images are stored in Google Cloud Storage. Secure signed URLs are generated dynamically for temporary browser access.

Metadata Storage

Structured visit metadata (timestamps, duration, representative ROI, approval status) is stored in Google Cloud SQL (MySQL). Media and metadata are intentionally separated for scalability and clean architecture.

REST API (FastAPI)

Endpoints

/visits — visit statistics by date range
/videos_rois — per-night visits with signed media URLs
/recent_activity — latest detections
/statistics/dashboard — aggregated behavioral analytics

Event-Driven Video Processing

When a new video is uploaded, Cloud Storage triggers a Cloud Run service that converts it to a browser-compatible format (H.264 with faststart). A metadata flag marks converted videos so they are not processed again. The result is smooth playback directly in the web interface.
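
The conversion step might look like the sketch below, which assumes `ffmpeg` is available in the service's container. The function names and the `yuv420p` pixel format are assumptions for illustration; only H.264 and faststart come from the description above.

```python
import subprocess


def build_ffmpeg_cmd(src, dst):
    """Build an ffmpeg command that re-encodes `src` for browser playback."""
    return [
        "ffmpeg", "-y",             # overwrite the output if it exists
        "-i", src,
        "-c:v", "libx264",          # H.264 video
        "-pix_fmt", "yuv420p",      # widely supported pixel format (assumed)
        "-movflags", "+faststart",  # move the index to the front of the file
        dst,
    ]


def transcode(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

With faststart, the browser can begin playback before the whole file has downloaded, because the container index sits at the start of the file instead of the end.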

Night Visit Consolidation

A scheduled Cloud Run job merges fragmented detections into single visits using database locks to prevent concurrency conflicts.
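
The merging logic itself can be illustrated as interval merging with a gap tolerance. The sketch below leaves out the database and locking entirely, and the 5-minute gap is an assumed value, not one taken from the project.

```python
from datetime import datetime, timedelta


def merge_detections(intervals, max_gap=timedelta(minutes=5)):
    """Merge (start, end) detections separated by at most `max_gap`.

    Returns a list of (start, end) visits in chronological order.
    """
    visits = []
    for start, end in sorted(intervals):
        if visits and start - visits[-1][1] <= max_gap:
            # Close enough to the previous detection: extend that visit.
            visits[-1][1] = max(visits[-1][1], end)
        else:
            # Gap too large (or first detection): start a new visit.
            visits.append([start, end])
    return [tuple(v) for v in visits]


night = datetime(2025, 9, 1)
detections = [
    (night.replace(hour=22, minute=0), night.replace(hour=22, minute=1)),
    (night.replace(hour=22, minute=3), night.replace(hour=22, minute=4)),
    (night.replace(hour=23, minute=30), night.replace(hour=23, minute=31)),
]
visits = merge_detections(detections)  # two visits: 22:00-22:04 and 23:30-23:31
```

The gap threshold is the main tuning knob: too small and a possum that freezes for a few minutes is counted twice, too large and separate visits blur together.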

Geospatial Wildlife Observations

While the backyard detector reveals when possums visit our garden, it raises a broader ecological question — how common are possum sightings across Australia?

To explore this, I integrated large-scale biodiversity observation data from the Atlas of Living Australia. Approximately 200,000 occurrence records (2020–present) for four possum species were imported and stored in a MySQL database.

Observations are aggregated using hexagonal spatial grids to visualize sighting density across Australia. The data pipeline automatically retrieves new records and stores them in a structured spatial database for fast querying.
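
To give a feel for hexagonal binning, here is a simplified sketch that treats longitude and latitude as flat coordinates and assigns each point to a pointy-top hexagon using axial coordinates. The grid system actually used is not specified here (a library such as H3 would be a common choice), so the functions, the cell size, and the planar approximation are all assumptions.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so that
    # the cube-coordinate constraint x + y + z == 0 still holds.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_cell(lon, lat, size_deg=1.0):
    """Axial (q, r) of the pointy-top hexagon containing the point.

    `size_deg` is the centre-to-corner distance in degrees (assumed unit).
    """
    q = (math.sqrt(3) / 3 * lon - lat / 3) / size_deg
    r = (2 / 3 * lat) / size_deg
    return _cube_round(q, r)


def hex_density(points, size_deg=1.0):
    """Count observations per hexagon for a list of (lon, lat) points."""
    return Counter(hex_cell(lon, lat, size_deg) for lon, lat in points)
```

Hexagons are a popular choice for density maps because every neighbour of a cell is at the same distance from its centre, which avoids the diagonal bias of square grids. A production pipeline would also need an equal-area projection or a geodesic grid, since degrees shrink in width towards the poles.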

Two complementary perspectives

Micro Scale

Backyard Behavioural Monitoring

Precise monitoring in a single backyard: visit timing, duration, and frequency, with possums detected by the CNN classifier.

Macro Scale

National Observation Patterns

National patterns of possum observations across Australia: species distribution, regional density, and temporal trends, based on the roughly 200,000 possum records imported from the ALA's 120M+ record collection.

Future Work

This project began as a backyard experiment — but it has clear potential to evolve into a fully autonomous wildlife monitoring system.

Smart Home Integration

Extend detection beyond analytics by connecting it to physical devices — such as automated feeding mechanisms or intelligent dog-door locks — creating a closed-loop safety system.

Model Exploration

Experiment with training a CNN from scratch to compare against the fine-tuned ResNet18 baseline and better understand domain-specific feature learning in infrared imagery.

Enhanced Analytics Dashboard

Develop a richer web dashboard to visualize long-term behavior patterns, seasonal trends, visit frequency, movement heatmaps, and activity timing.

Edge Deployment (Raspberry Pi)

Optimize the pipeline for lightweight edge devices, enabling fully local, low-latency detection without reliance on cloud infrastructure.