A warehouse worker uses voice technology to assist with picking tasks
By Your Edge Contributor | November 01, 2023

Do You Really Need a “Voice-Only Picking Solution”? Or Would “Voice Technology” Better Enable Your Picking Process?

And what’s the difference between the two?

This post was contributed by Alejandro Calero, General Manager, Mobilis, a Zebra Independent Software Vendor (ISV) Partner.


Voice technology can help warehouse operations teams around the globe keep up with growing order volumes, because workers can keep moving products while using voice commands to stay connected to the information systems that guide their movements. Without this ability, workers must stop moving products every time they enter information into or retrieve information from the Warehouse Management System (WMS), and those stops have consequences downstream. With the right voice technology, you can remove data-entry stops from your workflow entirely, and your team can use the time they gain to move more products more efficiently.

Because adopting voice technology in the warehouse makes so much sense, it is important to understand the two approaches on the market today. If you value simplicity and speed, one of them is likely to suit you far better than the other.

1. Voice-Only Picking Solutions

A voice-only picking solution has you build a new picking process on a highly specialized, standalone software platform that’s purpose-built for voice. This is the approach that vendors of legacy voice-only solutions have taken for years. Because the platform stands apart from your backend systems, it must be fed the picking orders to be executed each day, and the resulting transactions must then be fed back into the backend systems for further processing. Let’s look at some of the implications of this approach.
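The round trip is the crux of this approach. The following Python sketch is a deliberately simplified illustration, not any vendor’s actual interface: it models orders exported from the WMS into the platform, picks executed on the platform, and transactions posted back. All class and function names are hypothetical. It shows that the platform holds its own copy of the day’s work, so an order left out of the export simply stays open in the WMS.

```python
from dataclasses import dataclass, field


@dataclass
class PickOrder:
    order_id: str
    location: str
    quantity: int


@dataclass
class VoiceOnlyPlatform:
    """Hypothetical standalone platform holding its own copy of the work."""
    queue: list = field(default_factory=list)
    transactions: list = field(default_factory=list)

    def import_orders(self, orders):
        # Step 1: orders exported from the WMS are copied into the platform.
        self.queue.extend(orders)

    def run_picks(self):
        # Step 2: picks execute here, outside the WMS (every pick assumed complete).
        for order in self.queue:
            self.transactions.append((order.order_id, order.quantity))
        self.queue.clear()


def post_to_wms(open_quantities, transactions):
    """Step 3: post confirmations back; return what the WMS still shows as open."""
    remaining = dict(open_quantities)
    for order_id, picked in transactions:
        remaining[order_id] -= picked
    return {oid: qty for oid, qty in remaining.items() if qty > 0}


wms_open = {"A1": 3, "B2": 5}
platform = VoiceOnlyPlatform()
platform.import_orders([PickOrder("A1", "12-04-0375", 3)])  # B2 missed the export
platform.run_picks()
print(post_to_wms(wms_open, platform.transactions))  # {'B2': 5}
```

Every one of these hand-offs is an interface that has to be designed, scheduled, and tested, which is where the duplicated planning work and long implementation times described below come from.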

Logic repetition. The voice-only picking solution vendor will take you back to square one on logic design and will try to adapt your processes to their barcode-free workflows. Some compromises will have to be made, and some flexibility will be lost. For example, think of how many times a day you currently run waves of orders. The person who runs those waves for WMS orders will now also have to run a separate set of waves for voice-only orders. You’ll end up either maintaining two overlapping workflows or adding headcount to keep up with the extra planning work, and you’ll find yourself hesitant to migrate all of your picking processes to voice-only.

Time to implement. A standalone voice-only picking solution is slow to implement. The flow of information back and forth between the voice platform and the backend systems must be carefully planned and tested, and implementations typically take at least six to eight months.

Hardware duplication. Most standalone voice-only picking solution providers will claim that their platforms are compatible with modern Android devices. In practice, they work best only on the vendor’s own proprietary hardware, so the cost of implementation is heavily driven by the need to buy dedicated hardware from a niche vendor.

Maintenance cost. Niche hardware vendors typically have very limited repair facilities around the world. This translates into high maintenance costs, expensive or scarce parts, and slow turnaround times.

Performance. Standalone voice-only picking solutions perform very well. Response times are practically instantaneous: as soon as the worker finishes speaking, the system is already giving them the next task. Unfortunately, this speed brings up one last aspect to consider, and it is not pretty.

Reliability. Voice is not always the best way to enter data. Insisting on replacing barcode scanning with voice at every step introduces errors, and these errors often don’t surface until later in the month. Vendors’ reporting modules will help identify where a problem occurred, but resolving it still requires manual fixes. That means more money and time spent on issues that a different approach to voice could have avoided in the first place.

So, let’s talk about the alternative.

2. Voice-Enabling a Picking Process

Voice-enabling a picking process implies three things: the picking process already exists; workers already execute it with a barcode-reading device running standard terminal emulation or web browsing software; and the voice technology performs data reading and data entry on that same device with no changes whatsoever to the existing process. Even better, voice data reading and data entry can be applied to any warehouse process, not just picking.

Data reading with voice is based on Text-To-Speech (TTS) conversion: the text displayed on the device’s screen is read aloud in a natural-sounding digital voice. Any relevant piece of information on the screen can be read aloud, either automatically after each screen transition or on request.
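As an illustration of those two triggers, the following Python sketch reads fields from a terminal emulation screen either automatically on each screen transition or on request. It assumes a hypothetical screen layout of `LABEL: value` rows and takes the TTS engine as a plain callback (`speak`). Real terminal emulation screens are usually positional, and a real voice app would be configured per screen.

```python
def parse_screen(rows):
    """Map 'LABEL: value' rows to {LABEL: value}; other rows are ignored."""
    fields = {}
    for row in rows:
        if ":" in row:
            label, value = row.split(":", 1)
            fields[label.strip().upper()] = value.strip()
    return fields


class ScreenReader:
    def __init__(self, auto_fields, speak):
        self.auto_fields = [f.upper() for f in auto_fields]  # read on every new screen
        self.speak = speak  # stand-in for the TTS engine
        self.fields = {}
        self._last_rows = None

    def on_screen(self, rows):
        """Call on every redraw; speaks only when the screen actually changed."""
        if rows == self._last_rows:
            return
        self._last_rows = list(rows)
        self.fields = parse_screen(rows)
        for name in self.auto_fields:
            if name in self.fields:
                self.speak(f"{name.lower()} {self.fields[name]}")

    def on_request(self, name):
        """Read any field on demand, e.g. the SKU or description."""
        value = self.fields.get(name.upper())
        self.speak(f"{name.lower()} {value}" if value else f"no {name.lower()} on this screen")


spoken = []
reader = ScreenReader(auto_fields=["location"], speak=spoken.append)
screen = ["PICK TASK", "LOCATION: 12-04-0375", "SKU: 884512"]
reader.on_screen(screen)
reader.on_screen(screen)  # same screen redrawn: nothing new is spoken
reader.on_request("sku")
print(spoken)  # ['location 12-04-0375', 'sku 884512']
```

Keeping the automatic list short (here, only the location) avoids reading the whole screen aloud, while anything else on the screen stays one request away.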

Data entry with voice is based on Speech-To-Text (STT) conversion: the worker’s voice is recognized and converted to text. That text can then be submitted to the WMS as virtual keystrokes, or it can be used to trigger further TTS instructions.
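The routing decision between those two uses can be sketched in a few lines of Python. This assumes the STT engine has already returned the recognized words as plain text. The command vocabulary (`repeat`, `sku`, `description`) and the use of a carriage return as the Enter keystroke are hypothetical choices for illustration, not any product’s configuration.

```python
ENTER = "\r"  # assumed Enter keystroke for the terminal emulator

WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
COMMANDS = {"repeat", "sku", "description"}  # hypothetical command vocabulary


def route_utterance(text):
    """Classify recognized speech as keystrokes, a TTS command, or a rejection."""
    words = text.lower().split()
    if len(words) == 1 and words[0] in COMMANDS:
        # Drives another TTS read-out instead of typing anything.
        return ("command", words[0])
    if words and all(w in WORD_TO_DIGIT for w in words):
        # Spoken digits become virtual keystrokes followed by Enter.
        return ("keys", "".join(WORD_TO_DIGIT[w] for w in words) + ENTER)
    # Anything else is rejected so the worker can be re-prompted.
    return ("reject", text)


print(route_utterance("three seven five"))  # ('keys', '375\r')
print(route_utterance("SKU"))               # ('command', 'sku')
```

Rejecting anything outside the small vocabulary, rather than guessing, keeps a misrecognition from ever reaching the WMS as keystrokes.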

The concept is simple, but a quick process walkthrough shows its full power. Suppose a WMS is already in place, supporting your many flavors of picking processes. Warehouse workers might spend their mornings processing e-commerce orders and their afternoons picking full-pallet orders, using the same device throughout their shift. Now say you voice-enable the e-commerce picking process by adding STT and TTS capabilities to the terminal emulation software running on the barcode-reading device. The worker can now listen to the location instructions (TTS) while moving from one location to the next.

Once at the location, the worker speaks the last few digits of the barcode label posted there (STT). If the spoken digits match the ending digits of the location displayed on the screen, the voice app enters the full location at the device’s cursor prompt, and the terminal emulation software sends it to the WMS as if it had been scanned.
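The location check amounts to a suffix match. In the minimal Python sketch below, two details are assumptions: the digits are compared after stripping separators from the on-screen location, and at least three digits must be spoken (`min_digits` is a hypothetical parameter). On a match it returns the full location string to type at the cursor prompt; on a mismatch it returns `None`, so nothing is sent.

```python
def confirm_location(spoken_digits, location_on_screen, min_digits=3):
    """Return the full location to type at the cursor prompt, or None on mismatch."""
    if len(spoken_digits) < min_digits or not spoken_digits.isdigit():
        return None  # too short, or not digits at all
    # Compare against the location's digits with separators removed.
    location_digits = "".join(ch for ch in location_on_screen if ch.isdigit())
    if location_digits.endswith(spoken_digits):
        return location_on_screen  # typed in full, exactly as a scan would deliver it
    return None


print(confirm_location("375", "12-04-0375"))  # 12-04-0375
print(confirm_location("376", "12-04-0375"))  # None
```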

The WMS then responds with the quantity of pieces to pick, and the TTS function reads that quantity aloud from the terminal emulation screen. The worker begins transferring the product to the picking cart. While doing this, they can ask for product information, such as the SKU code or product description, significantly reducing the chance of errors, all while continuing to move the products.

When they’re ready, they will confirm the quantity of pieces by voice (STT), submitting the corresponding numeric characters to the WMS host. 

The WMS does not distinguish between locations or quantities confirmed by scanning, by keyboard, or by voice. The sequence of steps was never altered to support voice, and the ability to scan was never removed. Voice was simply an alternative mode of data entry and data reading. This approach carries several advantages.
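To make that point concrete, this Python sketch compares the keystrokes the host would receive from a scan, from the keyboard, and from voice for the same location. It assumes the scanner is configured to append Enter to scanned data and applies the suffix-matching rule described in the walkthrough; the function names are illustrative only.

```python
ENTER = "\r"  # assumed Enter keystroke sent to the host


def keys_from_scan(barcode):
    # Scanner assumed to be configured to append Enter to decoded data.
    return barcode + ENTER


def keys_from_keyboard(typed):
    # Worker types the value and presses Enter.
    return typed + ENTER


def keys_from_voice(spoken_digits, location_on_screen):
    # Voice app expands a matching spoken suffix into the full on-screen value.
    location_digits = "".join(ch for ch in location_on_screen if ch.isdigit())
    if spoken_digits.isdigit() and location_digits.endswith(spoken_digits):
        return location_on_screen + ENTER
    return ""  # mismatch: nothing is sent and the worker is re-prompted


location = "12-04-0375"
payloads = {
    keys_from_scan(location),
    keys_from_keyboard("12-04-0375"),
    keys_from_voice("375", location),
}
print(len(payloads))  # 1: the host sees one identical keystroke string
```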

Because no third-party platform is involved, there is no repetition of workflow logic, no database integration, and no long implementation time. For the same reasons, voice-enablement costs are significantly lower than those of standalone solutions, typically by a ratio of three to one. Resistance among workers is very low or nonexistent because the workflow is one they already know; the only difference is that they move through the steps without holding the device in their hands.

Because this approach adds STT/TTS capabilities to the device’s existing connectivity software, you can reuse your existing Android devices. If you choose to buy new devices for ergonomic reasons, they can also be used in every other scanner-based process when they are not being used for voice.

Maintenance is not a burden, particularly if you rely on packaged services with global support, such as Zebra’s OneCare offerings. The only real requirement is that the WMS connectivity software on your device be compatible with the voice app. Zebra’s ISV program has validated both Zebra’s own Enterprise Browser and a Smart Terminal Emulation application for running voice apps.

Naturally, these advantages come with a tradeoff, and the catch is performance. A voice-enabled process can only run as fast as the host WMS itself. If your WMS responds quickly between screens, your voice-enabled process will run fast; if it responds slowly, voice will not speed it up.
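A back-of-the-envelope model shows why. In this simplified Python sketch, each voice step waits for one WMS screen transition, so host response time is added once per transition to the worker’s own handling time. The parameter values in the example are illustrative assumptions, not measurements, and the model ignores any overlap between host waits and walking.

```python
def pick_cycle_seconds(worker_seconds, screen_transitions, host_response_seconds):
    """Simplified model: each voice step waits for one WMS screen transition."""
    return worker_seconds + screen_transitions * host_response_seconds


# Illustrative values only: 20 s of worker handling and a hypothetical
# 3 screen transitions per pick, with a fast host and a slow host.
fast_host = pick_cycle_seconds(20, 3, 0.5)
slow_host = pick_cycle_seconds(20, 3, 3.0)
print(fast_host, slow_host)  # 21.5 29.0
```

The worker’s time is the same in both runs; only the host’s response time differs, which is why a slow WMS caps how fast a voice-enabled process can go.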

The Takeaway

The next time someone tells you that you need a voice-picking solution, ask them to clarify their recommendation. Are they suggesting a voice-only picking solution (a standalone platform that takes barcode scanning out of the equation)? Or do they want you to integrate voice technology into your existing barcode-based picking processes to complete tasks more efficiently? If it’s not the latter, give me or another trusted warehouse industry veteran in your area a call and ask for a second opinion. Leaning into voice technology is the right move for several reasons, but you have to lean into the right type of voice technology if you want it to benefit your business.

Editor’s Note:

To learn more about Zebra’s ISV partners who support voice-enabling applications, visit Zebra’s Partner Locator.

About the Author:

Alejandro Calero is an industry veteran who has provided terminal emulation software-related services and professional advice to corporate customers, hardware manufacturers, WMS vendors, and the partner community throughout Latin America for the past 20 years. He runs Mobilis, a consulting company out of Mexico City, specializing in advanced applications, such as voice-enablement and lean warehousing applied to WMS mobile workflows. Mobilis is a member of the Zebra ISV program.
