Motivation

A pain point? Not on my watch

Three months back, I came across a YouTube video where a boss disguised himself as a courier. Within just five minutes of starting the onboarding process, he was already struggling with a system that seemed to ignore the needs of its users.

Enter the pain source: pick by voice system

Retail businesses frequently employ a pick by voice system. These systems act as a link between the worker and a central operational hub. Interactions with this system are typically categorized into two segments: system-to-user and user-to-system.

Pick by voice system: Honeywell Voice SRX2

System-to-user

The system dictates tasks to the operators. The backend holds all the information needed to plan each task for efficiency. At La Sirena, it tells workers which order comes next, which items to retrieve from the fridge, where exactly they are located, and how many to pick up. I appreciate this method of communication: it leaves no room for ambiguity, and everything is planned so that tasks are completed with minimal movement and maximum accuracy.

User-to-system

The problem arises primarily on the user’s side of the communication channel. The automatic speech recognition (ASR) system in use is notably outdated. It requires extensive voice calibration (about 20 minutes of voice data), and it is not robust to variations in voice: when a user is sick, for instance, recognition accuracy can suffer. It can also process only a limited set of commands. At La Sirena, for example, workers must verify each item by saying the last two digits of its serial number in sequence, and they end the interaction by saying “Listo” (Spanish for “ready”). The interaction feels mechanical and unintuitive, like talking to a rigid, unresponsive tool rather than a helpful colleague. It underscores the need for a better solution.

The solution

La Sirena POC demo

Imagine a system that can visually identify the items inside a box, eliminating the need for user-to-system verbal communication entirely. Combined with the centralized decision-making behind the pick by voice system's system-to-user channel, this would greatly improve the efficiency of the order fulfillment process. It would not only save time but also significantly reduce the mental strain workers experience from dealing with a cumbersome communication system.

I am excited to announce a series of blog posts that will explore in detail the project I have been working on. Below is a tentative list of the topics I will be covering. If these catch your interest, be sure to stay tuned!

1. Application structure overview

2. Threads and the Qt framework

3. YOLO, ONNX, and optimizations

4. Object tracker

5. Redis and business logic