A handler walks into a grocery store with their service dog. The dog drifts two feet wide on the left turn, breaks a sit-stay at the deli counter, and re-engages with a verbal cue before any staff notices. Did that sequence meet public access standards? A trainer watching from across town cannot answer that question. A cloud model with a 400ms round-trip cannot answer it fast enough to be useful. An edge model running at 28 frames per second on the handler's phone absolutely can.
Edge inference for canine assessment is the practice of running computer vision models directly on mobile and embedded hardware, with no server round-trip, to generate handler feedback in real time. At ServiceDog.AI, our clinical and engineering teams have spent two years benchmarking this pipeline. What follows is a technically grounded breakdown of model choices, runtime performance, and what the results mean for real-world handler coaching programs built on frameworks like CoreML, TensorFlow Lite and ONNX Runtime.
Why Edge Inference Changes Handler Coaching
Cloud-based inference has served computer vision research well. For handler coaching, it fails on several critical dimensions.
Latency is the first problem. Coaching feedback must arrive within one stride cycle of a behavior event to be neurologically associated with that event by the handler. At typical mobile network latencies, a cloud round-trip introduces 150 to 600 milliseconds of delay depending on signal quality. Inside steel-framed retail buildings, the delay stretches further. Edge inference on a modern neural engine delivers responses in 20 to 50 milliseconds from frame capture to feedback signal.
Privacy is the second problem. Public access evaluation captures video of the handler, the dog and every bystander in frame. Uploading that footage to a server creates HIPAA-adjacent risk for handlers whose disability status may be visible, and creates general privacy exposure for uninvolved members of the public. On-device processing means video never leaves the phone.
The third problem is connectivity. Real public access environments include elevators, parking structures, hospital basements and transit corridors. A coaching system that drops out when the handler needs it most is worse than no system at all. Edge inference runs offline by design.
Model Selection for Canine Pose and Behavior Tasks
Choosing a model architecture for edge deployment requires balancing three competing constraints: accuracy on canine body keypoints, inference speed on mobile neural processing units and model size relative to app storage budgets.
Pose Estimation Architectures
Human pose estimation models like MoveNet and BlazePose are well-optimized for mobile hardware but are trained on human skeletons. Canine anatomy differs substantially. The dog spine is horizontal, the number of trackable keypoints differs, and the size variance between a 6-pound Chihuahua service dog and a 110-pound German Shepherd creates scale challenges that human-pose models were never designed to handle.
The research community has addressed this with canine-specific architectures. Work published at CVPR and ICCV in recent years has explored animal pose estimation networks built on HigherHRNet and HRNet backbones. The Animal Pose Dataset and AP-10K benchmark have become standard evaluation sets. For mobile deployment, the practical path is to take a lightweight HRNet-W18 or a MobileNetV3-based feature extractor, fine-tune on canine keypoints and export to the target runtime.
Key canine keypoints for public access assessment include the nose, ears, withers, base of tail, four paw contact points and hip position. These are among the points in our 13-point skeleton. At inference time, those keypoints feed a second-stage classifier that labels posture and behavior states: heel position, sit, down, stand, break, forward pull and gaze direction toward handler.
Behavior Classification on Top of Pose
Pose estimation gives you where the dog is. Behavior classification tells you what the dog is doing. The two-stage pipeline keeps each model small. A temporal convolutional network or a lightweight LSTM operating on the keypoint sequence over a 1.5-second sliding window classifies behavior with significantly lower parameter counts than a single end-to-end video model. This matters for edge deployment where RAM is shared with the camera buffer and the UI layer.
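The sliding window described above can be sketched as a fixed-length buffer of per-frame keypoints. This is a minimal illustration under stated assumptions, not the ServiceDog.AI implementation: the class name `KeypointWindow`, the (x, y, confidence) layout and the 30fps camera rate are hypothetical choices, while the 1.5-second window and the 13-point skeleton come from the text.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence), assumed layout
Frame = List[Keypoint]

NUM_KEYPOINTS = 13     # skeleton size stated in the text
FPS = 30               # assumed camera rate
WINDOW_SECONDS = 1.5   # sliding window length stated in the text
WINDOW_FRAMES = int(FPS * WINDOW_SECONDS)  # 45 frames


class KeypointWindow:
    """Fixed-length buffer holding the most recent keypoint frames."""

    def __init__(self, size: int = WINDOW_FRAMES) -> None:
        self._frames: Deque[Frame] = deque(maxlen=size)

    def push(self, frame: Frame) -> None:
        if len(frame) != NUM_KEYPOINTS:
            raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(frame)}")
        self._frames.append(frame)  # the oldest frame is dropped automatically

    def ready(self) -> bool:
        """True once the buffer holds a full window."""
        return len(self._frames) == self._frames.maxlen

    def as_sequence(self) -> Optional[List[List[float]]]:
        """Flatten each frame to [x0, y0, c0, x1, ...] for the classifier."""
        if not self.ready():
            return None
        return [[v for kp in frame for v in kp] for frame in self._frames]
```

Because the second-stage model only sees 45 rows of 39 numbers, its input is a few kilobytes per window, which is why it can stay small next to the camera buffer.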
CoreML, TensorFlow Lite, and ONNX Runtime: A Practical Comparison
All three major mobile inference runtimes can run quantized pose and classification models. They are not equivalent in how they access hardware acceleration or how they behave under thermal throttling on prolonged video workloads.
CoreML on Apple Neural Engine
CoreML is the native Apple runtime and the only one with direct access to the Apple Neural Engine, which is available to third-party models on A12 Bionic and later chips. When a model compiles cleanly to ANE-compatible operations, the performance gap over competing runtimes on iOS is substantial. Models that include unsupported layer types fall back to GPU or CPU, which erases that advantage.
For canine pose estimation, standard depthwise convolutions, batch normalization and bilinear upsampling all compile to ANE without fallback. Heatmap decoding with argmax operations requires careful implementation to avoid CPU fallback. Our team exports ONNX models from PyTorch training runs and converts with coremltools, using the ML Program format target for best ANE utilization on devices running iOS 16 and later.
TensorFlow Lite with NNAPI and Metal Delegates
TFLite remains the dominant runtime for Android deployment. The NNAPI delegate routes computation to the device vendor's neural processing hardware when available. On Qualcomm Snapdragon 8-series chips the Hexagon DSP provides the acceleration path. On Google Tensor chips the Edge TPU handles quantized int8 models efficiently.
TFLite's quantization-aware training pipeline and post-training int8 quantization are mature. A canine pose model quantized to int8, calibrated on a representative dataset of typical outdoor and indoor handler environments, retains keypoint accuracy within 2 to 3 percentage points of the float32 baseline on PCK@0.2 metrics. It also reduces model size by approximately 75 percent, since 32-bit weights are stored as 8-bit values, and lowers latency on hardware with native int8 support.
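For readers unfamiliar with the metric, PCK (percentage of correct keypoints) counts a predicted keypoint as correct when it lands within a fraction of a reference length from the ground truth. The sketch below assumes the reference length is supplied by the caller (for example, the longer side of the dog bounding box); benchmarks differ on that normalization, and the function names here are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], truth: Sequence[Point],
        ref_length: float, alpha: float = 0.2) -> float:
    """Fraction of keypoints within alpha * ref_length of ground truth."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equal length")
    threshold = alpha * ref_length
    correct = sum(
        1 for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return correct / len(pred)


def size_reduction(bits_before: int = 32, bits_after: int = 8) -> float:
    """Fractional weight-storage saving from narrowing the weight type."""
    return 1.0 - bits_after / bits_before
```

With the defaults, `size_reduction()` returns 0.75, which is where the approximate 75 percent figure comes from; real files differ slightly because of quantization parameters and graph metadata.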
ONNX Runtime Mobile
ONNX Runtime Mobile is the most portable choice for teams maintaining a single model across iOS and Android. The runtime supports CoreML Execution Provider on iOS and NNAPI Execution Provider on Android, giving access to hardware acceleration through a unified API surface. The tradeoff is that it adds an abstraction layer. On iOS, ONNX Runtime typically performs 10 to 25 percent slower than a natively compiled CoreML model for the same graph, because the CoreML EP does not expose all tuning parameters that coremltools direct conversion can access.
For teams prioritizing engineering velocity over absolute latency, ONNX Runtime Mobile is the correct choice. For teams building a production coaching system where 10ms matters, native compilation per platform is worth the additional effort.
Benchmark Results on iPhone and Android Hardware
The following results reflect internal benchmarking conducted by the ServiceDog.AI engineering team on our canine pose estimation pipeline. The pose model is a quantized MobileNetV3-backbone HRNet variant with 13 keypoint outputs. The behavior classifier is a 3-layer temporal convolutional network operating on a 45-frame keypoint buffer at 30fps input. All benchmarks run on 720p input with the camera feed active and UI rendering live.
iOS Device Results
On iPhone 15 Pro with the A17 Pro chip, CoreML with ANE routing achieves 22ms mean inference latency for the pose model and 4ms for the TCN classifier, giving a combined 26ms pipeline latency. Thermal throttling under sustained 10-minute recording sessions raised mean latency to 31ms, still within the one-stride-cycle feedback window.
On iPhone 13 with the A15 Bionic chip, CoreML latency is 29ms for pose and 5ms for classifier, 34ms combined. Sustained session throttling brings that to 41ms, which begins to approach the boundary of effective real-time feedback. Older A14 devices consistently exceeded 50ms under thermal throttling in sustained sessions. Our team treats 50ms as the practical upper limit on pipeline latency for useful coaching feedback, so A14-class devices fall outside it.
Android Device Results
On a Pixel 8 Pro with the Tensor G3 chip, TFLite with NNAPI and int8 quantization achieves 31ms pose and 6ms classifier latency, 37ms combined. The Tensor chip's image processing pipeline also allows early termination of frames where the dog keypoints fall below a confidence threshold, reducing wasted computation on occluded frames by approximately 18 percent in indoor environments.
On a Samsung Galaxy S24 with Snapdragon 8 Gen 3, TFLite with NNAPI via Hexagon DSP achieves 24ms pose and 5ms classifier, 29ms combined. This is the fastest Android result in our current test set, reflecting Qualcomm's mature DSP inference stack.
Mid-range Android devices using MediaTek Dimensity 900-series chips showed higher variance, with mean latency of 48ms and occasional 80ms spikes during garbage collection pauses in the Android Runtime (ART). These devices require frame-drop compensation logic in the keypoint buffer to prevent stale frames from contaminating the TCN input sequence.
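One possible shape for that compensation logic is sketched below. The strategy is an assumption, since the text does not specify it: short gaps are padded by repeating the previous frame so the sequence keeps even spacing, and long stalls clear the buffer so the classifier never sees a window that spans the pause. The class name `GapAwareBuffer` and the `MAX_FILL_FRAMES` threshold are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional

FRAME_INTERVAL_MS = 1000.0 / 30.0   # 30fps input, as in the benchmarks
MAX_FILL_FRAMES = 2                 # assumed: pad at most two missing frames
WINDOW_FRAMES = 45                  # 45-frame keypoint buffer from the text


class GapAwareBuffer:
    """Keypoint buffer that pads short gaps and resets after long stalls."""

    def __init__(self) -> None:
        self.frames: Deque[List[float]] = deque(maxlen=WINDOW_FRAMES)
        self._last_ts: Optional[float] = None

    def push(self, ts_ms: float, keypoints: List[float]) -> None:
        if self._last_ts is not None:
            gap = ts_ms - self._last_ts
            missing = round(gap / FRAME_INTERVAL_MS) - 1
            if missing > MAX_FILL_FRAMES:
                # Long stall (for example a GC pause): discard stale history
                self.frames.clear()
            elif missing > 0 and self.frames:
                # Short gap: repeat the previous frame to keep even spacing
                for _ in range(missing):
                    self.frames.append(list(self.frames[-1]))
        self.frames.append(keypoints)
        self._last_ts = ts_ms

    def ready(self) -> bool:
        """True once a full 45-frame window is available."""
        return len(self.frames) == WINDOW_FRAMES
```

Interpolating between the frames on either side of a gap would be a reasonable alternative to repetition; the point is only that the buffer reasons about timestamps rather than assuming every frame arrived.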
Building the Real-Time Feedback Pipeline
Raw inference latency is only part of the engineering problem. The feedback pipeline includes camera capture, preprocessing, inference, postprocessing, state machine evaluation and UI output. Each stage adds latency and each handoff between stages can introduce jitter that degrades the coaching experience.
Camera Capture and Preprocessing
On iOS, AVFoundation's AVCaptureVideoDataOutput delivers frames as CMSampleBuffers. Extracting the CVPixelBuffer from each sample buffer and passing it to CoreML without intermediate copies is critical. Any UIImage or CGImage conversion in the hot path adds 8 to 15ms per frame. On Android, CameraX with YUV_420_888 output and a native preprocessing shader that crops and normalizes in GPU memory before TFLite inference avoids similar CPU-side conversion costs.
Frame resolution selection matters. Our pipeline runs inference on 256x192 crops centered on the dog bounding box, not on the full 720p or 1080p camera frame. A lightweight person-and-dog detector runs at 5fps to update the crop region. Pose inference runs at 30fps on the crop. This two-tier architecture reduces the pixel count processed per inference by over 90 percent compared to full-frame pose estimation.
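The pixel savings and the two-tier schedule can be checked with a few lines of arithmetic. This sketch assumes the standard 1280x720 and 1920x1080 frame sizes for 720p and 1080p; the helper names are illustrative.

```python
CROP_W, CROP_H = 256, 192        # inference crop from the text
POSE_FPS, DETECTOR_FPS = 30, 5   # rates from the text


def pixel_reduction(frame_w: int, frame_h: int) -> float:
    """Fraction of pixels saved by running pose on the crop, not the frame."""
    return 1.0 - (CROP_W * CROP_H) / (frame_w * frame_h)


def run_detector(frame_index: int) -> bool:
    """True on frames where the crop region is refreshed (every 6th frame)."""
    return frame_index % (POSE_FPS // DETECTOR_FPS) == 0


print(f"720p:  {pixel_reduction(1280, 720):.1%}")   # 720p:  94.7%
print(f"1080p: {pixel_reduction(1920, 1080):.1%}")  # 1080p: 97.6%
```

Both figures sit above the 90 percent mark, and the detector fires on 5 of every 30 frames while pose runs on all of them.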
Feedback State Machine
Raw model output is noisy. A single frame where the dog's paw keypoint drops confidence below threshold should not trigger a break-of-heel alert. The feedback state machine applies hysteresis: a behavior state must persist for 8 consecutive frames at 30fps before a coaching event fires. That 267ms window eliminates flicker-alerts from occluded frames, head turns and brief postural adjustments that are not true behavior breaks.
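A minimal sketch of that persistence rule follows. It assumes per-frame labels arrive as strings and that "heel" is the starting state; the class name `BehaviorDebouncer` is hypothetical, and a production state machine would likely track more context than this.

```python
from typing import Optional

FPS = 30
PERSIST_FRAMES = 8  # consecutive frames required, from the text


class BehaviorDebouncer:
    """Fires an event only after a new state persists for PERSIST_FRAMES frames."""

    def __init__(self, initial_state: str = "heel") -> None:
        self.confirmed = initial_state
        self._candidate: Optional[str] = None
        self._count = 0

    def update(self, observed: str) -> Optional[str]:
        """Feed one per-frame label; return the new state once confirmed."""
        if observed == self.confirmed:
            # Back to the confirmed state: any pending change was a flicker
            self._candidate, self._count = None, 0
            return None
        if observed == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = observed, 1
        if self._count >= PERSIST_FRAMES:
            self.confirmed = observed
            self._candidate, self._count = None, 0
            return observed
        return None


print(f"{PERSIST_FRAMES / FPS * 1000:.0f} ms")  # 267 ms
```

Seven frames of "break" followed by one frame of "heel" produce no event, because the single confirming frame resets the counter; only an unbroken run of eight does.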
Coaching events are tiered by severity. A gentle haptic pulse fires for position drift beyond 6 inches from heel. An audio cue fires for a full break of position. A summary annotation is written to the session log for trainer review via the TheraPetic® Training Plus program available through officialservicedog.com, where licensed trainers can review annotated session data alongside Public Access Test evaluation criteria.
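The tiering can be expressed as a small selection function. This is a sketch under assumptions: the enum values and function name are hypothetical, a full break is assumed to outrank drift when both apply, and session logging is left out.

```python
from enum import Enum
from typing import Optional

DRIFT_THRESHOLD_IN = 6.0  # heel drift threshold from the text


class Feedback(Enum):
    HAPTIC = "haptic_pulse"
    AUDIO = "audio_cue"


def select_feedback(drift_inches: float, position_broken: bool) -> Optional[Feedback]:
    """Pick the coaching output for one confirmed event; None means no alert."""
    if position_broken:
        return Feedback.AUDIO   # assumed: a full break outranks drift
    if drift_inches > DRIFT_THRESHOLD_IN:
        return Feedback.HAPTIC
    return None
```

Keeping the thresholds in one place makes it straightforward to tune them per team without touching the inference code.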
Deployment Considerations for ADA-Context Applications
Service dog handler coaching applications operate in a legally and ethically distinctive context. Under the Americans with Disabilities Act as enforced in 2026, businesses may ask only two questions of a service dog team: whether the dog is required due to a disability and what task the dog is trained to perform. Coaching applications that surface task-performance data must be designed so that data cannot be used to create a third-party verification burden on the handler.
At ServiceDog.AI, our design principle is that session data belongs to the handler. The edge model produces coaching signals that stay on the device. Aggregated performance metrics that flow to trainer review via the Training Plus program are opt-in and handler-controlled. No raw video leaves the device. This architecture respects both handler privacy and the ADA framework articulated by the Department of Justice under Title III.
Model updates present a separate deployment consideration. Unlike a cloud model, an on-device model cannot be swapped out server-side, so every update has to be delivered to the device. CoreML model packages and TFLite flatbuffers can be delivered as downloadable assets outside the app binary, allowing model improvements to reach deployed devices without an App Store review cycle. This is important for a domain where canine behavior taxonomy and PAT evaluation criteria may evolve through the work of organizations like the International Association of Assistance Dog Partners and Assistance Dogs International.
Where Edge AI for Service Dog Assessment Is Heading
The hardware trajectory is favorable. Apple's Neural Engine performance has grown dramatically across chip generations. Qualcomm is embedding dedicated AI accelerators deeper into Snapdragon platform designs. MediaTek's NeuroPilot framework is maturing on Dimensity chips. By late 2026, the mid-range device performance floor for quantized pose inference will likely be where flagship performance sits today.
The more interesting frontier is multimodal edge inference. Canine behavior assessment at the level required for rigorous public access evaluation needs more than pose. It needs gaze direction and attentiveness toward the handler, acoustic cue-response latency (did the dog respond to the verbal cue within the task window), harness load analysis via IMU data and environment classification so the model understands that an escalator is a higher-stakes context than a parking lot.
Each of those modalities can be addressed with additional lightweight models fused at the state machine layer without dramatically increasing total inference cost. An acoustic model for cue detection can run five times per second on CPU while the neural accelerator handles pose. IMU processing is trivial in compute terms. Environment classification from a MobileNet-class model adds 8ms at 1fps. The fusion challenge is architectural, not just a matter of raw FLOPS.
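To see why low-rate models add little cost, it helps to amortize each model's latency over a second of video. The sketch below reuses the iPhone 15 Pro pose and classifier latencies from the benchmarks and the environment classifier figures from the paragraph above. It simply sums compute time regardless of which processor runs each model, and it omits the acoustic model because no latency is given for it.

```python
from typing import Dict, Tuple

# (latency per inference in ms, inferences per second)
# Pose and TCN figures are the iPhone 15 Pro benchmark numbers;
# the environment classifier figures come from the paragraph above.
MODELS: Dict[str, Tuple[float, float]] = {
    "pose": (22.0, 30.0),
    "tcn_classifier": (4.0, 30.0),
    "environment": (8.0, 1.0),
}


def busy_ms_per_second(models: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Compute time each model consumes per second of video."""
    return {name: latency * rate for name, (latency, rate) in models.items()}


budget = busy_ms_per_second(MODELS)
total = sum(budget.values())
for name, ms in budget.items():
    print(f"{name:15s} {ms:6.0f} ms/s  ({ms / total:.1%})")
```

Under these numbers the environment classifier accounts for about 1 percent of total compute time, so the harder questions are scheduling and fusing the outputs, not the added inference load.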
Biometric handler-dog team authentication, which TheraPetic®.AI addresses at the clinical AI layer, is also moving toward edge execution. Handler gait signature models and canine gait signature models are small enough to run on-device when quantized to int8. Real-time authentication that verifies a registered team without a network call is a near-term capability.
For trainers working in programs aligned with the TheraPetic® Training Plus curriculum, the near-term practical benefit is session logging that goes far beyond timestamped notes. Every public access training session becomes a structured dataset annotated with behavior events, position deviations and context labels. That data improves both the individual team and the population-level model training sets that make the next generation of assessments more accurate.
Edge inference is not a future capability for canine assessment. It is the present capability that is ready to be deployed, benchmarked and integrated into serious handler coaching programs today.
