Stumped by a Lego set? A new machine learning framework can interpret those instructions for you.
Researchers at Stanford University, MIT’s Computer Science and Artificial Intelligence Lab, and the Autodesk AI Lab have collaborated to develop a novel learning-based framework that can interpret 2D instructions to build 3D objects.
The Manual-to-Executable-Plan Network, or MEPNet, was tested on computer-generated Lego sets, real Lego set instructions and Minecraft-style voxel building plans, and the researchers said it outperformed existing methods across the board.
MEPNet’s novel idea
Interpreting 2D instructions isn’t easy for artificial intelligence. The researchers said there are a couple key problems going from visual instructions that, like Lego sets, consist entirely of images: Identifying correspondence between 2D and 3D objects, and dealing with a lot of basic pieces, like Legos.
Basic Lego bricks, the researchers said, are often assembled into complex forms before being added to the main body of the model. This “increases the difficulty for machines to interpret Lego manuals: it requires inferring 3D poses of unseen objects composed of seen primitives,” the researchers said.
Existing methods of parsing manual steps into machine-executable plans mainly consist of two forms, the researchers said: Search-based methods that are simple and accurate but computationally expensive; and learning-based models that are fast but aren’t very good at handling unseen 3D shapes.
MEPNet, the researchers said, combines both.
Starting with a 3D model of the components, the current state of the Lego set, and 2D manual images, MEPNet “predicts a set of 2D keypoints and masks for each component,” the researchers wrote.
Once that’s done, the 2D keypoints “are back-projected to 3D by finding possible connections between the base shape and the new components.” The combination “maintains the efficiency of learning-based models, and generalizes better to unseen 3D components,” the team wrote.
But can it build my Ikea dresser?
In the paper, the researchers said their aim is to create machines that help people assemble complex objects, and they include furniture alongside Lego bricks and voxel worlds in their list of applications.
We’ve asked the researchers behind MEPNet about more potential uses of their new framework, but haven’t heard back yet. In the meantime, it might be reasonable to assume MEPNet could build a bookshelf – at least virtually – given the necessary library of components and instructions.
All a human would have to do would be to interpret MEPNet’s 3D renderings, which would hopefully be easier than flat-pack furniture instructions.
Those who want to test MEPNet, and are familiar with Pytorch, can find its code on Github. ®