The Robot Times Blog

A curated selection of content and insights from the Brain Corp team

5 Lessons Learned from Scaling the World’s Largest AMR Fleet

Joe Wieciek, Manager of Software Operations  |  6 August 2020

Making Sure Robots Perform Well in the Field

Brain Corp creates groundbreaking AI software technology that our manufacturing partners use to build and sell autonomous robots to retailers, malls, airports, hospitals and more. But our work doesn’t end there.

Once the robots are deployed, our software operations team works diligently to ensure that every BrainOS®-enabled robot performs well in the field, collecting data and insights via the cloud that we use to improve our software and systems, and ultimately create better user experiences. However, managing a handful of robots in the field is drastically different from managing a large global fleet.

We learned that the hard way on our path to powering the world’s largest fleet of autonomous mobile robots (AMRs) operating in commercial indoor public spaces. These are the five most important lessons we’ve learned while scaling our BrainOS-powered fleet from 10 robots to more than 10,000 over the last three years.

1: Build infrastructure early.

We learned early on that we needed the ability to access the robots remotely. When you’re working with just a few robots, it’s easy enough to be hands-on with updates and fixes. But once your fleet passes 50 or so, it quickly becomes impossible to manually keep track of everything happening with each robot. Collecting information and understanding precisely how, when, and where each robot is operating is crucial to providing good service and maintaining a good product. So what’s the solution? Robust infrastructure.

Though we can’t be on the ground with every user, we can keep a close virtual eye on the state of the robots and quickly resolve any issues they are experiencing. With global infrastructure, including proprietary robot performance telemetry, we can monitor every robot in near real-time and can deploy configuration changes or software updates in just a few hours. Proper infrastructure is what makes managing a high-performance fleet possible.
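The monitoring described above can be sketched in a few lines. This is an illustrative example only, not Brain Corp's actual telemetry API: the `RobotStatus` and `find_stale_robots` names are hypothetical, and the idea is simply to flag any robot whose last cloud heartbeat is older than a chosen threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RobotStatus:
    """Hypothetical per-robot record built from cloud telemetry."""
    robot_id: str
    last_heartbeat: datetime  # time of the last telemetry ping received


def find_stale_robots(fleet, now, max_silence=timedelta(minutes=15)):
    """Return IDs of robots that have not reported within max_silence."""
    return [r.robot_id for r in fleet if now - r.last_heartbeat > max_silence]


now = datetime(2020, 8, 6, 12, 0, tzinfo=timezone.utc)
fleet = [
    RobotStatus("robot-001", now - timedelta(minutes=3)),   # healthy
    RobotStatus("robot-002", now - timedelta(hours=2)),     # silent too long
]
print(find_stale_robots(fleet, now))  # ['robot-002']
```

A real fleet would feed alerts like this into dashboards and on-call tooling rather than printing them, but the core check is the same.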

2: Visibility is everything.

In order to detect, investigate, and resolve issues with the robots in the field, as well as determine areas of improvement, we need full visibility at both individual robot and fleet levels. Without reliable performance monitoring tools, we wouldn’t be able to immediately understand how and why robots fail, meaning users would be forced to wait for someone to diagnose the robot in person.

Infrastructure built for visibility also allows us to run analytics at scale and gather data from thousands of robots around the world in near real time. The insights we gain from that data help us continually improve our software and ensure that robot performance gets better with every release.

3: Traceable configuration management can save the day.

The need for visibility extends to our internal processes. As our fleet grows, we need to be able to test how different features or configurations perform without inadvertently causing problems for our users. Manually connecting to individual robots to make updates or edits is not only inefficient, it’s also not transparent. It’s crucial that our infrastructure enables visibility around what, when, where, and by whom configuration changes or software updates are made so that we can track their impact on robot performance.

By incorporating traceability into our infrastructure, we can quickly and easily audit any issues that arise with our users’ robots, trace them back to their source, revert them, and prevent them from happening again. This gives end users a more consistent experience, faster delivery of new features, quick rollback of problematic changes, and robots that can be tuned to their environments for better performance.
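A minimal sketch of what traceable configuration management might look like, assuming a simple in-memory change log; the `ConfigChange` and `ConfigLog` names are invented for the example and stand in for whatever audit store a real fleet would use. Every change records what was set, on which robot, by whom, and when, so it can be audited and reverted:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConfigChange:
    """One auditable configuration change: what, where, who, and when."""
    robot_id: str
    key: str
    old_value: Any
    new_value: Any
    author: str
    timestamp: str


class ConfigLog:
    def __init__(self):
        self.history = []  # ordered audit trail of applied changes

    def apply(self, config, change):
        """Apply a change to a robot's config and record it."""
        config[change.key] = change.new_value
        self.history.append(change)

    def revert_last(self, config):
        """Undo the most recent change, restoring the old value."""
        change = self.history.pop()
        config[change.key] = change.old_value
        return change


log = ConfigLog()
config = {"brush_speed": 3}
log.apply(config, ConfigChange("robot-001", "brush_speed", 3, 5,
                               "jwieciek", "2020-08-06T12:00Z"))
print(config["brush_speed"])  # 5
```

Because every entry carries its author and timestamp, answering "what changed, when, and by whom" becomes a lookup rather than an investigation.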

4: Small, frequent software updates are better than big, occasional ones.

When we were first starting out, we did feature-based releases, meaning we only updated the robots via the cloud when a new feature was ready to ship. This approach was not only frustrating for our developers, who had to wait months to see their work in the field, but also detrimental to our users, who waited just as long for new features, improvements, and bug fixes. Each new release entailed significant changes and, despite rigorous pre-release testing, there was always a chance that those changes could introduce bugs or have unexpected effects on other parts of the system.

Rethinking our release cycle allowed us to minimize that risk and make better iterative improvements. Just as cloud software providers push out constant tweaks, we began releasing minor software updates on a regular basis for the robots to automatically download via our distributed infrastructure. Our users now expect regular software updates that carry very little risk of negatively impacting robot performance. And if there is a bug or a feature doesn’t work well, our infrastructure picks it up and we can deploy a fix in a matter of hours or days, often without the user ever noticing a performance issue. The robots improve incrementally without users needing to do any work.
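One common way to keep small, frequent updates safe at scale is a staged rollout: each robot is deterministically assigned to a cohort, and a release is exposed to a growing percentage of the fleet before reaching everyone. The sketch below shows that general technique, not Brain Corp's actual deployment system; the `in_rollout` function and its parameters are hypothetical.

```python
import hashlib


def in_rollout(robot_id, release, percent):
    """Deterministically decide whether a robot receives this release.

    Hashing release + robot_id gives each robot a stable bucket in
    0..65535, so the same robot always lands in the same cohort for a
    given release, and raising `percent` only ever adds robots.
    """
    digest = hashlib.sha256(f"{release}:{robot_id}".encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # first two bytes: 0..65535
    return bucket < percent * 65536 // 100


# At 100% every robot is included; at 0% none are.
print(in_rollout("robot-001", "release-42", 100))  # True
print(in_rollout("robot-001", "release-42", 0))    # False
```

Because the assignment is a pure function of the robot and release IDs, no central state is needed: every robot can evaluate its own cohort when it checks in for updates.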

5: Robots need to be understandable to be useful.

End users also need visibility into how the robots they rely on are performing. But robots are complex and inaccessible. How can we expect users to trust the robots when they don’t understand how they work?

The only way for robots to actually be useful at scale is if we translate what they do into accessible language. This doesn’t just mean an intuitive user interface (though, of course, that’s a must); they also need clear, easy-to-understand documentation, support tools, and manufacturing guidelines.

The robots’ accessibility impacts their serviceability and reliability for users, so we are constantly working to make the robots more accessible. For example, instead of displaying a coded error message, the robots display a pop-up that states the problem and what steps the user can take to resolve it. Because our robots are easy to understand, they are easier and faster to repair, more trustworthy, and more useful overall.
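The error-translation idea above can be illustrated with a simple lookup table. The fault codes and messages below are invented for the example, not Brain Corp's actual codes: each internal code maps to a plain statement of the problem plus a concrete action the user can take, with a safe fallback for anything unrecognized.

```python
# Hypothetical mapping from internal fault codes to user-facing text:
# (what went wrong, what the user should do about it).
FAULT_MESSAGES = {
    "E-103": ("Vacuum squeegee is raised.",
              "Lower the squeegee, then press Resume."),
    "E-214": ("Path is blocked.",
              "Clear the obstacle in front of the robot, then press Resume."),
}


def describe_fault(code):
    """Turn an internal fault code into a plain-language pop-up message."""
    problem, action = FAULT_MESSAGES.get(
        code, ("An unexpected fault occurred.", "Contact support."))
    return f"{problem} {action}"


print(describe_fault("E-103"))
# Vacuum squeegee is raised. Lower the squeegee, then press Resume.
```

Keeping the problem and the remedy as separate fields also lets the same table drive a UI that shows the action as a button or checklist step rather than raw text.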

These five lessons are just the tip of the iceberg. Over the past decade, it’s safe to say that we’ve learned countless lessons that have allowed us to build better robots that better serve our users. The culmination of everything we’ve learned so far is exemplified by our “people-first” approach to safety and robotics. Our robots support people and improve productivity by taking over labor-intensive, joyless tasks so that human workers can focus on other things. We’re proud of our progress, and we’ll continue to strive to make our robots even better and easier to use.