This New Breed of AI Assistant Wants to Do Your Boring Office Chores

An experimental AI helper attempts to operate a web browser in the same way a human does to take on office admin like processing invoices or screening job applicants.
Image: Collage showing multiple robotic hands reaching out to cursors around a desktop. Photo-illustration: Anjali Nair; Getty Images

This week, OpenAI announced a service that makes it possible for just about anyone to build a custom version of ChatGPT, no coding skills required. The company suggests that users may want to build a bot that knows the rules of all board games, teaches kids about math, or can offer culinary advice. These GPTs, as OpenAI calls them, can also perform simple actions by connecting with internet services, for example searching through emails or ordering products from an online store.

You can’t fault OpenAI for trying to build on the success of its smash hit ChatGPT. But are more chatbots really what we need?

Adept AI, a startup in San Francisco founded by veterans of OpenAI, Google, and DeepMind, is today launching an experimental AI agent, called ACT-2, that automates common chores in a more sophisticated and potentially more powerful way than chatbots like ChatGPT. Instead of being limited to online services that expose APIs for software to call, ACT-2 attempts to use a computer the way a human does: by making sense of the pixels on a display and then taking action to control a browser and online services.

Adept’s demos show how ACT-2 can be used to do things like gathering info from emails and documents to fill out insurance claims, inputting information from emailed invoices into accounts-payable software, and coming up with a walking tour for a city by interacting with Google Maps.

Because ACT-2 attempts to use the same user interfaces that humans do, it promises to be far more capable and to work with many more services than API-bound bots. In theory that approach could allow an agent to do anything a person might do on their phone or computer. But operating that way is also more challenging for algorithms, and for now it makes the agent more error prone.

Under the hood, ACT-2 uses a large language model called Fuyu. It is similar to the models that power many chatbots but, like ChatGPT, it can handle both text and images (making it a “multimodal model”). The model analyzes what it sees on a computer screen and tries to translate the request a user typed into useful actions the bot should take. Adept uses reinforcement learning, a technique used to teach computers tasks including playing board games and video games, to instruct its AI on how to perform different tasks. The training involves having the model watch lots of humans perform specific tasks and then try to match their performance.
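For readers who think in code, the loop described above (look at the screen, pick an action, act, look again) can be sketched roughly as follows. This is a toy illustration, not Adept's implementation: `FakeBrowser`, `Action`, `toy_policy`, and `run_agent` are all hypothetical names, the "screen" is a small dictionary rather than pixels, and a hand-written rule stands in for a multimodal model like Fuyu.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # which on-screen element the action applies to
    text: str = ""     # text to enter, for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser: one form with named fields."""
    fields: Dict[str, str]
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the "screen" is a snapshot dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def toy_policy(request: Dict[str, str], screen: dict) -> Action:
    """Hand-written rule standing in for the model: fill fields, then submit."""
    for name, value in request.items():
        if screen["fields"].get(name) != value:
            return Action("type", target=name, text=value)
    if not screen["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_agent(request: Dict[str, str], browser: FakeBrowser,
              policy: Callable[[Dict[str, str], dict], Action],
              max_steps: int = 10) -> List[Action]:
    """Observe the screen, choose an action, apply it, and repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(request, browser.screenshot())
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


browser = FakeBrowser(fields={"vendor": "", "amount": ""})
actions = run_agent({"vendor": "Acme", "amount": "120.00"}, browser, toy_policy)
print([a.kind for a in actions])
```

The step cap (`max_steps`) is a design choice made for this sketch: an agent that re-reads the screen after every action needs some limit so that a confused policy cannot loop forever.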

David Luan, founder and CEO of Adept and previously VP of engineering at OpenAI, says that while chatbots have wowed everyone with their capabilities, it has proven challenging to get AI agents to work reliably. But he believes Adept and others are getting a lot closer to solving that.

“This year they just weren’t there,” Luan says of today’s agents, including his own. “I think what’s going to happen is next year there’s going to be a giant war around agents that actually work.” Adept is initially designing its agents to perform only a limited number of simple but common office tasks, and it says they are now at least 95 percent reliable, which is sufficient for them to be commercially deployed at a few companies.

Reaching that level of reliability just for the initial, limited tasks ACT-2 is designed for is a major breakthrough. For years, tools have existed to automate office tasks—what’s known as robotic process automation—but these are finicky to build and prone to breaking. If Adept and others can use AI to reliably automate a lot more tasks, it could transform office work and increase productivity.

If Luan is right, then the battle to automate your most tedious chores could make the chatbot wars of 2023 seem relatively tame.