Pi-talk: Edge-Only, Adapter-Tuned Multimodal Small Language Model for Safe, Real-Time In-Vehicle Dialogue
Natural-language interaction between passengers and autonomous vehicles is essential for trust, safety, and user experience, but deploying Large Language Models (LLMs) on automotive edge platforms is constrained by limited compute, memory, and energy, and by privacy requirements. We present Pi-talk, an edge-only system that enables real-time passenger-vehicle dialogue using a Small Language Model (SLM) running entirely on embedded hardware. Pi-talk fuses onboard camera, ultrasonic distance, and navigation context through a lightweight encoder-adapter module that aligns the modalities into compact semantic tokens for a pre-trained SLM. The SLM produces context-aware explanations of driving decisions, route options, and situational updates without cloud connectivity. Safety is enforced by a real-time safety envelope that gates responses and actions using distance thresholds and timing constraints. We further adapter-tune the SLM (on-device or offline) and deploy it with INT8 quantization and an Open Neural Network Exchange (ONNX) runtime to achieve efficient batch-size-1 inference on Raspberry Pi-class hardware. We evaluate task quality (evaluation loss), end-to-end latency, CPU utilization, and memory footprint, and include ablations contrasting unimodal and fused inputs. Results show that Pi-talk sustains edge-only inference with latencies of a few seconds while meeting stringent resource and latency limits and maintaining the safety envelope required for autonomous operation. To our knowledge, Pi-talk is among the first edge-only, multimodal passenger-vehicle dialogue systems that both fine-tune and run a small language model entirely on Raspberry Pi-class, CPU-only hardware while enforcing an explicit runtime safety envelope.
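As an illustration of how a safety envelope of this kind might gate responses, the following minimal Python sketch checks a distance threshold and two timing constraints before a candidate SLM response is released. It is not Pi-talk's implementation: the class `SafetyEnvelope`, the function `gate_response`, the fallback message, and all threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyEnvelope:
    """Hypothetical envelope; all default thresholds are illustrative assumptions."""
    min_distance_m: float = 0.5          # assumed minimum obstacle distance
    max_sensor_age_s: float = 0.2        # assumed maximum age of the distance reading
    max_response_latency_s: float = 3.0  # assumed maximum time to produce a response

    def allows(self, distance_m: float, sensor_age_s: float,
               response_latency_s: float) -> bool:
        """Return True only if every distance and timing constraint holds."""
        return (
            distance_m >= self.min_distance_m
            and sensor_age_s <= self.max_sensor_age_s
            and response_latency_s <= self.max_response_latency_s
        )


# Hypothetical fixed message used whenever the envelope rejects a response.
FALLBACK = "I cannot answer right now; the vehicle is prioritizing safety."


def gate_response(envelope: SafetyEnvelope, candidate: str, distance_m: float,
                  sensor_age_s: float, response_latency_s: float) -> str:
    """Release the candidate response only when the envelope is satisfied."""
    if envelope.allows(distance_m, sensor_age_s, response_latency_s):
        return candidate
    return FALLBACK


if __name__ == "__main__":
    env = SafetyEnvelope()
    # Fresh reading, clear path, timely response: the candidate passes through.
    print(gate_response(env, "Slowing down for a pedestrian ahead.", 2.0, 0.05, 1.2))
    # Obstacle closer than the assumed threshold: the fallback is returned.
    print(gate_response(env, "Slowing down for a pedestrian ahead.", 0.3, 0.05, 1.2))
```

Keeping the gate as a pure function of sensor readings and timing, separate from the language model, means the check stays cheap and deterministic regardless of what the SLM generates.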