Large Action Models - The Latest New AI Term

It seems like just about every week a new AI term takes off. These past few weeks were no different with the rise of “Large Action Models”. The term was catapulted into the news after a company named “Rabbit” launched the Rabbit r1 personal AI assistant device at CES 2024.

In introducing the device, Rabbit highlights the difficulties that arise from a siloed, app-driven world. They explain that with the advancements in natural language processing and their “neuro-symbolic programming to directly learn user interactions with applications”, we can move toward a personal device experience where actions initiated by human voice or text requests are carried out seamlessly across application silos.

Example “Large Action Model” Workflows

We’ve all been there: trying to get a simple task done but stuck going from app to app to stitch it all together. Sometimes we are copying and pasting information across apps, sometimes we’re trying to remember a code, date or name. In any scenario, we humans are doing the glue work. In this respect, I see the benefit of a model which can be trained to understand and act upon human workflow patterns. Below are a few concrete examples, with a rough sketch after the list of how such a workflow could be represented:

- A user could create a “Get ready for my trip” workflow which might include searching their email for the flight information, checking into the flight and booking a ride to the airport (cross-checking Uber and Lyft).

- A user could create a “RSVP to a baby shower” workflow which might include finding the email invite, RSVPing yes with a note, checking the registry for purchasable items and buying the gift.
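
To make the glue work concrete, below is a minimal sketch of how a chained workflow like “Get ready for my trip” could be represented. Everything in it is hypothetical: the `Step` and `Workflow` classes and the hard-coded results simply stand in for real email, airline and rideshare integrations.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    """One intent-level action in a workflow (illustrative only)."""
    description: str
    run: Callable[[dict], dict]  # reads shared context, returns updates to it

@dataclass
class Workflow:
    name: str
    steps: list[Step] = field(default_factory=list)

    def execute(self) -> dict:
        context: dict = {}
        for step in self.steps:
            # Each step can read what earlier steps produced (e.g. the flight
            # number found in email) and adds its own results to the context.
            context.update(step.run(context))
        return context

# Hypothetical "Get ready for my trip" workflow; the lambdas stand in for
# real integrations with an email provider, an airline and rideshare apps.
trip = Workflow(
    name="Get ready for my trip",
    steps=[
        Step("Find flight details in email", lambda ctx: {"flight": "UA 123, 8:40am"}),
        Step("Check into the flight", lambda ctx: {"checked_in": True}),
        Step("Book the cheaper ride to the airport", lambda ctx: {"ride": "Lyft, $24"}),
    ],
)

print(trip.execute())
```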

Rabbit r1 device introduction at CES 2024

How This Works In a Nutshell

The Limitations of LLMs

While Rabbit has not published the specifics of their LAM, they have shared the basic premise on their research page. The idea is that while LLMs are great at explaining things, they don’t take action. If you took the workflows above and asked an LLM how to achieve the task at hand, it could likely guide you through the work via text. That in itself is impressive, and helpful, but not as helpful as it could be. The issue is that LLMs are inherently passive.

OK, Then Let’s Make Our AI Active…

A common approach to taking our AI from passive to active is the use of agents. Agents can be implemented in a variety of ways, but the basic premise is that they can understand a human prompt and take action accordingly, including chained actions.

AI agents leverage large language models like GPT-3 to understand goals, generate tasks, and go about completing them.
— Zapier

While agents are a very effective and impactful way to get things done today, the issue is that they rely on LLMs to translate human asks into API requests. Though this is an incredibly common and effective pattern for higher-volume business workloads, it does not account for the wide range of consumer patterns we display on our personal devices. The problem is that this pattern requires every workflow to have an API that can be called to execute the action, and unfortunately, the majority of consumer features do not offer API functionality at full parity with their user interface.
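
As a rough illustration of that reliance (not any particular framework’s API), the agent pattern boils down to a registry of callable APIs plus an LLM that chooses a tool and its arguments. The `call_llm` function below is a placeholder for whatever model provider an agent would actually use, and the tools are made up for the example.

```python
# A minimal sketch of the API-based agent pattern, with made-up tools.
# `call_llm` is a placeholder: a real agent would send the request and the
# tool schemas to an LLM and parse its structured response.

TOOLS = {
    # Each "tool" wraps a real API the agent is allowed to call.
    "book_ride": lambda pickup, dropoff: f"Ride booked from {pickup} to {dropoff}",
    "check_in_flight": lambda confirmation: f"Checked into flight {confirmation}",
}

def call_llm(user_request: str, tool_names: list[str]) -> dict:
    # Stand-in for an actual LLM call; always picks the ride-booking tool here.
    return {"tool": "book_ride", "args": {"pickup": "Home", "dropoff": "SFO"}}

def run_agent(user_request: str) -> str:
    decision = call_llm(user_request, list(TOOLS))
    tool = TOOLS[decision["tool"]]
    # The agent can only act if an API exists for the requested action,
    # which is exactly the coverage gap described above.
    return tool(**decision["args"])

print(run_agent("Get me a ride to the airport"))
```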

Enter Large Action Models

If our programmatic interfaces don’t give us the coverage we need, and our user interfaces are not easily (and robustly) programmable, then we as users are stuck stitching the user interfaces together ourselves.

And this is right where Rabbit comes in with their neuro-symbolic model. Rabbit claims its model understands user intent as displayed by the actions on the screen: the algorithm is trained on the intended outcomes of high-level user workflows rather than on fragile UI elements. Further, they claim this can be done by training on each workflow only once.
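
Rabbit has not published how the model actually represents a demonstration, so the following is purely illustrative: two hypothetical traces of the same task, one recorded at the intent level and one recorded as the brittle UI selectors a traditional screen-automation script would depend on.

```python
from dataclasses import dataclass

@dataclass
class IntentStep:
    """A purely illustrative intent-level record of one step in a workflow."""
    intent: str   # what the user was trying to accomplish
    app: str      # which application the step happened in
    inputs: dict  # the information that actually mattered (dates, names, ...)

# A demonstrated "RSVP to a baby shower" run, recorded by intent. This kind of
# record can survive a redesign of the apps involved.
intent_trace = [
    IntentStep("find the invitation", "Mail", {"query": "baby shower"}),
    IntentStep("reply yes with a note", "Mail", {"note": "Can't wait!"}),
    IntentStep("buy a gift from the registry", "Browser", {"budget": 50}),
]

# The same task recorded the way a traditional UI script would capture it.
# These selectors break as soon as the page layout or element IDs change.
selector_trace = [
    ("click", "#search-box"),
    ("type", "baby shower"),
    ("click", "div.result:nth-child(1) > a"),
]
```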

Simply Put

While “Large Action Models” is not an agreed-upon, industry-wide term, it is generating a great deal of interest as shorthand for taking LLM interfaces from passive to active without relying on API-based agents. In our example, we covered the Rabbit r1 device, which appears to use a novel neuro-symbolic model that maps human intention to action without relying on a known UI configuration. This LAM paradigm, if successful, could offer very impactful consumer UI assistance.

What’s Next

Most of the recent press surrounding “Large Action Models” has been prompted by the Rabbit device and focuses on their “neuro-symbolic model” as a way around the limitations of API-based agents. However, some articles are already linking the “Large Action Models” concept to “Large Agent Models”. It is possible that Large Action Models simply becomes a term for language models that are able to take action via a variety of interfaces.

Beyond the evolution of the “Large Action Model” term, I’m also curious to see how well the Rabbit model actually functions. I’ve submitted my pre-order and my device batch is set to ship in June, so I’ll keep y’all posted!

Read More

While LAMs have not been written about in volume, below are a few articles and podcasts I would suggest to continue your learning on the subject!
