Common Sense: Coordinating Robot Teams in a shared workspace like a RoboBallet

Thursday, December 04, 2025

Coordinating Robot Teams in a shared workspace like a RoboBallet

Good news!

"In factories, where teams of robotic arms work in tight spaces, their motions are programmed by hand to keep them from interfering with one another. Researchers automated this programming using graph neural networks trained via reinforcement learning. ...

A graph neural network can overcome this limitation by learning to produce synchronized, collision-free motions in large numbers of simulated setups with different robot placements, obstacles, and target positions.

How it works: RoboBallet is a graph neural network that takes as input positions and orientations of robots, obstacles, and targets and generates joint velocities for each arm from its current position to reach a target. The authors trained it entirely in simulation using the TD3 actor-critic algorithm, a reinforcement learning algorithm. They generated about 1 million simulated workspaces, each of which contained a team of 4 or 8 simulated 7-joint Franka Panda robotic arms attached to the sides of a table at random, 30 obstacle blocks placed at random, and 40 target positions/orientations per team. They rejected configurations that started in collision. ...

During training, every 100 milliseconds, the model selected joint velocities of all robots, effectively telling each arm how to move (the actor role in the actor-critic learning algorithm). In parallel, it evaluated how good each prediction was – that is, how much total reward the current action and all actions likely to follow would yield (the critic role).

The authors rewarded the model for arms that touched the target positions and penalized collisions. Because the arms rarely touched the target positions, they used Hindsight Experience Replay, a method that turns failed attempts into useful examples by treating points that the arm reached accidentally as intended goals. The loss encouraged the actor to produce actions that the critic predicted would lead to higher long-term rewards. This helped the model learn to prefer actions that paid off over time rather than maximize immediate rewards. ...

Given new workspaces, the model generated collision-free trajectories for up to 8 Franka Panda robotic arms.

RoboBallet effectively parallelized work. Average time to move robots to 20 target positions dropped from 7.5 seconds with 4 arms to 4.3 seconds with 8 arms.

In a simplified benchmark with four robots and 20 target positions, RoboBallet produced trajectories as quickly as the best hand-optimized baselines, reaching all target poses in the same range of 8 to 11 seconds. ..."
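To make the Hindsight Experience Replay idea in the summary above more concrete, here is a minimal Python sketch. It is not the authors' code. The `Transition` class, the `reach_reward` function, the 1 cm tolerance, and the choice of the generic "final" relabeling strategy (use the last position the arm reached as the pretend goal) are all assumptions made for illustration. Collision penalties are left out to keep it short.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Transition:
    """One step of an episode (hypothetical, simplified record)."""
    achieved: Vec3   # where the end effector actually ended up
    goal: Vec3       # the target the arm was asked to reach
    reward: float    # sparse reward computed against `goal`


def reach_reward(achieved: Vec3, goal: Vec3, tolerance: float = 0.01) -> float:
    """Assumed sparse reward: 1.0 when within `tolerance` metres, else 0.0."""
    return 1.0 if math.dist(achieved, goal) <= tolerance else 0.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Generic HER 'final' strategy: treat the last achieved point as the goal.

    The original episode is left untouched; a relabeled copy is returned so
    both versions can be stored in a replay buffer.
    """
    new_goal = episode[-1].achieved
    return [
        replace(t, goal=new_goal, reward=reach_reward(t.achieved, new_goal))
        for t in episode
    ]


# A failed attempt: the arm never gets near the requested target.
target = (1.0, 1.0, 1.0)
path = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.0), (0.2, 0.1, 0.0)]
failed = [Transition(p, target, reach_reward(p, target)) for p in path]
relabeled = relabel_with_final_goal(failed)
```

In the failed episode every reward is zero, so it carries little learning signal. In the relabeled copy the final step earns a reward, which gives the critic an example of what success looks like.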

From the abstract:

"Modern robotic manufacturing requires collision-free coordination of multiple robots to complete numerous tasks in shared, obstacle-rich workspaces. Although individual tasks may be simple in isolation, automated joint task allocation, scheduling, and motion planning under spatiotemporal constraints remain computationally intractable for classical methods at real-world scales. Existing multiarm systems deployed in industry rely on human intuition and experience to design feasible trajectories manually in a labor-intensive process.

To address this challenge, we propose a reinforcement learning (RL) framework to achieve automated task and motion planning, tested in an obstacle-rich environment with eight robots performing 40 reaching tasks in a shared workspace, where any robot can perform any task in any order.

Our approach builds on a graph neural network (GNN) policy trained via RL on procedurally generated environments with diverse obstacle layouts, robot configurations, and task distributions.

It uses a graph representation of scenes and a graph policy neural network trained through RL to generate trajectories of multiple robots, jointly solving the subproblems of task allocation, scheduling, and motion planning.

Trained on large randomly generated task sets in simulation, our policy generalizes zero-shot to unseen settings with varying robot placements, obstacle geometries, and task poses.

We further demonstrate that the high-speed capability of our solution enables its use in workcell layout optimization, improving solution times.

The speed and scalability of our planner also open the door to capabilities such as fault-tolerant planning and online perception-based replanning, where rapid adaptation to dynamic task sets is required."
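As a rough illustration of what a "graph representation of scenes" could look like, here is a small Python sketch. The abstract does not spell out the node features or the edge layout, so everything here is an assumption: the `Node` class, the `build_scene_graph` helper, and in particular the connectivity (robots connected to each other, and every obstacle and task sending an edge into every robot).

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """A scene entity (hypothetical record)."""
    kind: str                 # "robot", "obstacle", or "task"
    pose: Tuple[float, ...]   # position and orientation; layout is assumed


def build_scene_graph(
    robots: List[Node], obstacles: List[Node], tasks: List[Node]
) -> Tuple[List[Node], List[Tuple[int, int]]]:
    """Return (nodes, directed edges as (source_index, destination_index))."""
    nodes = [*robots, *obstacles, *tasks]
    robot_ids = range(len(robots))

    # Assumed: every ordered pair of distinct robots exchanges messages.
    edges = list(permutations(robot_ids, 2))

    # Assumed: each obstacle and task informs every robot, but not vice versa.
    for source in range(len(robots), len(nodes)):
        edges.extend((source, robot) for robot in robot_ids)

    return nodes, edges


# Tiny scene: 2 robots, 1 obstacle, 3 tasks (poses are placeholder values).
robots = [Node("robot", (0.0, 0.0, 0.0)), Node("robot", (1.0, 0.0, 0.0))]
obstacles = [Node("obstacle", (0.5, 0.5, 0.2))]
tasks = [Node("task", (0.2 * i, 0.3, 0.4)) for i in range(3)]
nodes, edges = build_scene_graph(robots, obstacles, tasks)
```

Nothing in this structure fixes the number of robots, obstacles, or tasks. That is one plausible reason a graph-based policy can be applied to scenes of different sizes without changing the network.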

Meta’s Open 3D Pipeline, World Labs’ Virtual Spaces, Baidu’s Multimodal Models, Coordinating Robot Teams

RoboBallet: Planning for multirobot reaching with graph neural networks and reinforcement learning (no public access)

RoboBallet: Planning for Multi-Robot Reaching with Graph Neural Networks and Reinforcement Learning (preprint, open access)