Project | ActGPT

Duration: 04/01/2025 - 03/31/2028

Adaptive robot ConTrol with Generative Pre-trained Transformers

Application fields

Industrie 4.0

Currently, we are seeing rapid progress in AI, driven primarily by advances in reinforcement learning (RL) and transformer-based neural networks. For example, large language models (LLMs) such as those behind ChatGPT have shown impressive results in general-purpose language generation. However, AI should not only exhibit communicative intelligence but also intelligence in interacting with the physical world, as required, e.g., by dynamic robots such as humanoids. Yet current LLMs and other large AI models rarely involve physical interaction with the environment.

At the same time, recent advances in robotics have enabled a new generation of highly dynamic robots that have demonstrated impressive dynamic and even athletic behavior. The most notable of these is the Atlas humanoid robot built by Boston Dynamics, which can walk and run naturally, perform 360-degree jumps and backflips, and dance with an agility close to that of a human. Other examples include the ostrich-inspired humanoid platform Digit (Agility Robotics), the H1 humanoid from Unitree, and various quadrupedal platforms such as the MIT Mini Cheetah (Biomimetic Robotics Laboratory, MIT), Aliengo (Unitree), and Vision 60 (Ghost Robotics). While all these systems achieve impressive results in individual, precisely defined tasks, the link between their motion capabilities, which are mostly based on advanced mechanics and modern control theory, and artificial intelligence is usually missing. Thus, ActGPT pursues the following main objective:

To link the predictive capabilities that can be observed in large language models and large multimodal models to the physical capabilities of complex dynamic robots

Such a link between artificial intelligence and the physical capabilities of dynamic robots, which typically require precise system models, opens up several possibilities: reducing the dependence on expert knowledge and manual engineering when generating robot control strategies, increasing the generalization capabilities of highly dynamic robot systems, and improving their autonomy in dynamically changing environments. To achieve this main objective, ActGPT pursues three sub-goals:

1. To enable large AI models, primarily transformer networks, to generate dynamic robot motions using natural language and images as input.
2. To enable large AI models to synthesize optimal control (OC) problems using natural language and images as input.
3. To improve the robustness and stability of large AI models, which are known to produce unreliable or erroneous outputs at times.

The ultimate goal of ActGPT is to control a humanoid robot using natural language input, linking high-level commands to dynamic robot motions.