I first wrote about Covariant for IEEE Spectrum in 2020, when it was a new robotics startup looking to apply robotics to warehouse picking at scale through the magic of a single end-to-end neural network. At the time, Covariant was focused on that picking use case because it’s an application that can provide immediate value: warehouse companies pay Covariant for robots that pick items in their warehouses. But the exciting part for Covariant is that over the past four years, warehouse picking has generated a ton of real-world operational data, and you can probably guess where this is going.
Today, Covariant is announcing RFM-1, which the company describes as a robotics foundation model that gives robots “human-like reasoning abilities.” That’s from a press release, and I wouldn’t read too much into “human-like” or “reasoning,” but what Covariant is doing here is pretty cool.
“Foundation model” means that RFM-1 is trained on more data and can do more. For now, it’s all about warehouse operations, since that’s what it’s trained on, but its capabilities can be extended by feeding it more data. “Our existing systems are already capable of very fast and variable pick-and-place,” says Covariant co-founder Pieter Abbeel. “But we’re now taking it even further. Whatever the task, whatever the embodiment: that’s the long-term vision, foundation models powering billions of robots around the world.” Covariant’s existing business of deploying large fleets of warehouse automation robots turned out to be the fastest way to collect the tens of millions of trajectories (recordings of how a robot moves during a task) needed to train the 8-billion-parameter RFM-1 model.
“The only way you can do what we’re doing is by deploying robots all over the world collecting massive amounts of data,” Abbeel says. “That allows us to train robot foundation models with unique capabilities.”
There have been other attempts at this sort of thing; the RT-X project is one recent example. But while RT-X depends on research institutions sharing data to build datasets large enough to be useful, Covariant can do it alone, thanks to its fleet of warehouse robots. “RT-X has about a million trajectories of data,” Abbeel says, “but we can surpass that, because we collect a million trajectories every few weeks.”
“By building a valuable picking robot that’s deployed by dozens of customers across 15 countries, we are essentially operating a data collection machine.” —Pieter Abbeel, Covariant
The current version of RFM-1 can be thought of as a prediction engine for suction-based object manipulation in warehouse environments. The model incorporates every modality involved in the kind of robotic manipulation Covariant does, including still images, video, joint angles, force readings, and suction-cup strength. All of these are interconnected within RFM-1, which means that if you feed any of them into one end of the model, you get a prediction out the other end. That prediction can take the form of images, video, or a series of commands for the robot.
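To make that any-to-any interface concrete, here is a minimal sketch, entirely my own: the class and function names, the modality fields, and the routing logic are illustrative assumptions, not Covariant’s actual API. The idea is simply that any subset of input modalities can condition a prediction of any requested output modality.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical container for one multimodal observation. Field names are
# assumptions drawn from the modalities named in the article.
@dataclass
class Observation:
    image: Optional[list] = None         # still camera frame (pixel array)
    video: Optional[list] = None         # short clip (list of frames)
    joint_angles: Optional[list] = None  # per-joint positions, in radians
    forces: Optional[list] = None        # force/torque sensor readings
    suction: Optional[float] = None      # suction-cup strength

def predict(obs: Observation, target: str) -> dict:
    """Toy stand-in for an any-to-any prediction engine: note which input
    modalities are present, then 'predict' the requested output modality.
    A real model would run a single network over tokenized modalities."""
    present = [name for name, value in vars(obs).items() if value is not None]
    if not present:
        raise ValueError("at least one input modality is required")
    return {"target": target, "conditioned_on": present}
```

For example, `predict(Observation(image=frame), "commands")` would stand in for “given this camera image, output robot commands,” while the same observation could just as well request `"video"` as the target.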
What’s important to understand about all of this is that RFM-1 isn’t limited to picking only things it has seen before, or to working only with robots it has direct experience with. That’s the beauty of a foundation model: because it can generalize within its training data and doesn’t need to be retrained for every new picking robot or new item, Covariant has been able to scale its business. The counterintuitive thing about these large models is that they’re actually better at dealing with novel situations than models trained specifically for those situations.
For example, say you want to train a model to drive a car on a highway. The question, Abbeel says, is whether it’s worth the time to train it on other kinds of driving anyway. The answer is yes, because highway driving sometimes isn’t highway driving: accidents and rush-hour traffic can demand unusual maneuvers. Training on city driving as well effectively trains the model on highway edge cases, which will come in handy at some point and boost overall performance. The same idea applies to RFM-1: training on different kinds of manipulation, with different robots and different objects, only improves its ability to perform any single kind of manipulation.
In the context of generalization, Covariant talks about RFM-1’s ability to “understand” its environment. That can be a loaded word when it comes to AI, but the key is to ground what “understanding” means in terms of RFM-1’s capabilities. For example, you don’t need to understand physics to catch a baseball; you just need lots of experience catching baseballs, and that’s where RFM-1 is. You could also catch a baseball with no experience at all if you understood physics well enough to work out how the ball would move, but that is not what RFM-1 does, which is why I hesitate to use the word “understand” in this context.
But this brings us to another interesting capability of RFM-1: even within its constraints, it works as a very effective simulation tool. As a prediction engine whose output can be video, you can ask it to generate what the next couple of seconds of an action sequence will look like, and it produces realistic and accurate results grounded in all of its data. Importantly, RFM-1 can effectively simulate objects that are traditionally hard to simulate, such as floppy ones.
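Generating “the next few seconds” of video from a one-step predictor is, in essence, an autoregressive rollout: each predicted frame is fed back in to predict the frame after it. Here is a minimal sketch of that loop; the stand-in “model” (which just dims each pixel) is purely illustrative, since RFM-1’s real interface is not public.

```python
from typing import Callable, List

Frame = List[List[float]]  # a video frame as a 2-D array of pixel values

def rollout(predict_next: Callable[[Frame], Frame],
            first_frame: Frame, steps: int) -> List[Frame]:
    """Autoregressively roll a one-step video predictor forward:
    feed each predicted frame back in to generate the next one."""
    frames = [first_frame]
    for _ in range(steps):
        frames.append(predict_next(frames[-1]))
    return frames

# Stand-in predictor: slightly dims every pixel at each step, as a
# placeholder for a learned next-frame model.
dim = lambda frame: [[0.9 * px for px in row] for row in frame]
```

Calling `rollout(dim, frame, 30)` would produce a 31-frame clip; swapping in a learned predictor is what turns the same loop into a data-driven simulator.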
Covariant’s Abbeel explains that the “world model” underlying RFM-1’s predictions is effectively a learned physics engine. “Building a physics engine that really covers everything that can happen in the world turns out to be a very difficult task,” Abbeel says. “In complex scenarios it becomes very imprecise very quickly, because you have to make all kinds of approximations to make the physics engine run on a computer. We’re doing a large-scale data version of this with a world model, and it’s getting very good results.”
Abbeel gives the example of asking the model to simulate (or predict) what would happen if a cylinder were placed upright on a conveyor belt. The prediction accurately shows the cylinder toppling and rolling when the belt starts moving, not because the cylinder is being simulated, but because RFM-1 has seen many objects placed on many conveyor belts.
“Five years from now, it’s not unlikely that what we’re building here will be the only kind of simulator anyone uses.” —Pieter Abbeel, Covariant
This works only when the right kind of data exists to train RFM-1 on, so unlike most simulation environments, it can’t currently generalize to completely novel objects or situations. But Abbeel believes that with enough data, useful world simulation is achievable. “Five years from now, it’s not unlikely that what we’re building here will be the only kind of simulator anyone uses. It’s extremely difficult to incorporate all of this into a physics engine, let alone a renderer that makes it look like the real world. We’re taking a shortcut.”
As Covariant works to expand RFM-1’s capabilities toward that long-term vision of a foundation model powering “billions of robots around the world,” the next step is feeding it a lot more data from a variety of robots performing a variety of tasks. “We’ve essentially built a data ingestion engine,” Abbeel says. “If you provide us with other kinds of data, we’ll incorporate that as well.”
“We have a lot of confidence that this kind of model could power all kinds of robots, provided we have data about those kinds of robots and the kinds of situations they can be used in.” —Pieter Abbeel, Covariant
Either way, that path will involve an enormous amount of data beyond what Covariant currently collects with its fleet of warehouse robots. So if you’re, say, a humanoid robot company, what’s your incentive to share all the data you’ve been collecting with Covariant? “The pitch is that we’ll help them get to the real world,” says Covariant co-founder Peter Chen. “I don’t think there are many companies deploying AI that makes robots truly autonomous in production environments. If they want AI that’s robust and powerful and can actually help them get into the real world, we’re their best option.”
Covariant’s central claim here is that while it’s certainly possible for every robotics company to train its own models independently, the performance, at least for anyone attempting manipulation, won’t match that of a model built on the manipulation data Covariant already has in RFM-1. “It’s been our long-term plan to become a foundation model company in robotics,” says Chen. “Until now, we didn’t have enough data, compute, or algorithms to get there, but building a universal AI platform for robots has been Covariant’s goal from the beginning.”