MacOS Agent: an efficient Computer Use Agent for macOS
BIGAI ML Group
Meet the Automation of Computer Use
We built a Computer Use Agent that redefines how you interact with macOS. It can solve diverse, complex tasks across commonly used applications.
Multimodal Understanding
Interprets screen content and task requirements to provide clear progress tracking and next steps.
Long-horizon Reasoning
Executes multi-step tasks through natural interactions with the macOS interface.
Cross-App Workflows
Coordinates tasks across different applications with consistent performance and reliability.
The agent is fully open source: we release the code and prompts, with the data and training code to follow.
System Architecture
MacOS Agent is a hierarchical multi-agent system that connects human intent with application-specific actions. It has three key components:
- ComputerUse Agent: The top-level agent that interfaces with human users. It receives natural language instructions and generates high-level execution plans, which are then forwarded to the MacAgent. This agent operates at an abstract level without direct access to application controls.
- MacAgent: The central coordinator that receives plans from the ComputerUse Agent. It analyzes these plans and determines the optimal execution strategy by:
  - Identifying which app agents are needed
  - Generating executable code to orchestrate these agents
  - Managing the reactive workflow between different app agents
- App Agents: A collection of ten specialized agents that directly interface with macOS applications:
  - Document Processing: Word Agent, TextEdit Agent
  - Data & Presentations: Excel Agent, PowerPoint Agent
  - System & Navigation: Finder Agent, Browser Agent
  - Media & Communication: Preview Agent, QuickTime Agent, WeChat Agent, Calendar Agent
This architecture enables sophisticated task execution through coordinated agent interactions. The MacAgent's ability to dynamically orchestrate app agents allows for complex workflows while maintaining clear separation of concerns between planning and execution.
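The division of labor above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the names `Step`, `AppAgent`, `MacAgent`, `ComputerUseAgent`, and `toy_planner`, as well as the fixed plan, are hypothetical. The app agents only echo their instructions instead of operating real applications, and the MacAgent here simply loops over plan steps rather than generating orchestration code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One high-level plan step: the app to use and what to do in it."""
    app: str
    instruction: str


class AppAgent:
    """Hypothetical app agent; a real one would operate the macOS application."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, instruction: str) -> str:
        # Placeholder: report the instruction instead of driving the app.
        return f"[{self.name}] {instruction}"


class MacAgent:
    """Coordinator: routes each plan step to the matching app agent."""

    def __init__(self, app_agents: Dict[str, AppAgent]) -> None:
        self.app_agents = app_agents

    def execute(self, plan: List[Step]) -> List[str]:
        results = []
        for step in plan:
            agent = self.app_agents.get(step.app)
            if agent is None:
                raise KeyError(f"no app agent registered for {step.app!r}")
            results.append(agent.run(step.instruction))
        return results


class ComputerUseAgent:
    """Top-level planner: turns a request into steps and never touches apps itself."""

    def __init__(self, planner: Callable[[str], List[Step]], mac_agent: MacAgent) -> None:
        self.planner = planner
        self.mac_agent = mac_agent

    def handle(self, request: str) -> List[str]:
        return self.mac_agent.execute(self.planner(request))


APPS = ["Word", "TextEdit", "Excel", "PowerPoint", "Finder",
        "Browser", "Preview", "QuickTime", "WeChat", "Calendar"]


def toy_planner(request: str) -> List[Step]:
    # Stand-in for the language-model planner; always returns the same plan.
    return [
        Step("Browser", "find the 2025 Grammy Album of the Year winner"),
        Step("Word", "write an introduction to the artist"),
        Step("Calendar", "add the most recent concert date"),
    ]


cua = ComputerUseAgent(toy_planner, MacAgent({name: AppAgent(name) for name in APPS}))
for line in cua.handle("Introduce the 2025 Album of the Year artist"):
    print(line)
```

In this sketch the planner never sees app controls, so supporting another application only means registering one more agent in the MacAgent's dictionary.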
Involved Apps: Browser, Word, Calendar.

Agent's Thought:
The agent uses the browser to find out which artist won the 2025 Grammy Award for Album of the Year, and for which work.
Action:
Start the browser again to search for more information about the artist and her work.

Agent's Thought:
The agent uses the browser to gather more information.
Action:
Once enough information has been gathered, the agent moves on to create a .docx file for the introduction.

Agent's Thought:
The agent writes the introduction in the Word file.
Action:
The agent moves on to add the most recent concert date to the calendar.

Agent's Thought:
The agent uses the Calendar app to add the most recent concert date.
Action:
Task completed.
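Each demo on this page is a sequence of Agent's Thought / Action pairs that ends with "Task completed." The loop behind such a trace might look like the minimal sketch below, which assumes a hypothetical `policy` callable standing in for the model. Here `scripted_policy` simply replays a condensed version of the Grammy demo instead of reading the screen; `TraceStep`, `run_episode`, and `max_steps` are illustrative names, not the project's API.

```python
from dataclasses import dataclass
from typing import Callable, List

DONE = "Task completed."


@dataclass
class TraceStep:
    thought: str
    action: str


Policy = Callable[[List[TraceStep]], TraceStep]


def run_episode(policy: Policy, max_steps: int = 10) -> List[TraceStep]:
    """Request one (thought, action) pair at a time until the policy reports completion."""
    trace: List[TraceStep] = []
    for _ in range(max_steps):
        step = policy(trace)  # the policy sees the whole history so far
        trace.append(step)
        if step.action == DONE:
            break
    return trace


# Replays the Grammy demo; a real policy would query the model with a screenshot.
SCRIPT = [
    TraceStep("Find the 2025 Album of the Year winner in the browser.",
              "Start the browser again to research the artist and her work."),
    TraceStep("Gather more information in the browser.",
              "Create the .docx file for the introduction."),
    TraceStep("Write the introduction in Word.",
              "Add the most recent concert date to the calendar."),
    TraceStep("Add the concert date in Calendar.", DONE),
]


def scripted_policy(trace: List[TraceStep]) -> TraceStep:
    return SCRIPT[len(trace)]


for step in run_episode(scripted_policy):
    print(f"Thought: {step.thought}\nAction:  {step.action}\n")
```

The `max_steps` cap is a safety bound for the sketch so that a policy which never reports completion still terminates.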
Involved Apps: Browser, TextEdit, Calendar.

Agent's Thought:
The agent opens Calendar to check for any events scheduled this Friday.
Action:
The agent finds that a movie night is indeed scheduled.

Agent's Thought:
The agent uses the browser to gather information about the latest movies.
Action:
Once enough information has been gathered, the agent moves on to create a text file with movie recommendations.

Agent's Thought:
The agent writes the recommendations in the text file.
Action:
Task completed.
Involved Apps: Browser, Word.

Agent's Thought:
To complete the task, I should first search for the blog 'The Second Half' written by Shunyu Yao. I will use the Browser Agent to find relevant information about the blog.
Action:
Start browser to search for information about 'The Second Half' written by Shunyu Yao

Agent's Thought:
I have found the blog 'The Second Half' written by Shunyu Yao. The blog discusses the author's insights into the field of AI. I will now summarize its content.
Action:
Summarize the blog content

Agent's Thought:
Now I have the summary of the blog. To complete the task, I should open Microsoft Word and type the summary.
Action:
Open Microsoft Word and type the summary
Involved Apps: WeChat, Calendar.

Agent's Thought:
The agent identifies that the user is trying to locate a conversation with a specific contact named '尚子睿'. From the context, it infers that the user intends to open WeChat and find the conversation among recent chats or with the search function.
Action:
open WeChat

Agent's Thought:
The agent observes that the user's WeChat interface displays the recent conversation list, and a chat with '尚子睿' is clearly visible. It infers that the user intends to open this chat to review previous messages.
Action:
click on the chat with 尚子睿

Agent's Thought:
The agent sees that the user is browsing the chat with '尚子睿'. Scanning the recent messages, it detects references to a Tencent Meeting, which appear to include details such as the meeting time and link. The user likely wants to use this information for scheduling.
Action:
read the recent messages to extract Tencent Meeting details

Agent's Thought:
The agent recognizes that the user has obtained the Tencent Meeting start time. The next goal is to create a calendar event at that time so the meeting is properly scheduled.
Action:
add the Tencent Meeting start time to the calendar
Involved Apps: Browser, Calendar.

Agent's Thought:
The agent observes that the user is searching for the weather forecast for the upcoming weekend in Beijing. The user is likely planning an outdoor activity for the weekend.
Action:
search for Beijing weather forecast for April 19 and 20

Agent's Thought:
The agent sees that the forecast for April 19 and 20 is cloudy, which may not suit the user's outdoor plans. The user is likely to look for better weather the following weekend.
Action:
search for Beijing weather forecast for April 25 to April 28

Agent's Thought:
The agent observes that the forecast for April 27 is sunny, which makes it ideal for outdoor activities. Based on this, the user appears to have decided to schedule a trip on that date.
Action:
open calendar and add a travel schedule on April 27 for 'Trip to Beijing'
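The weather demo shows the reactive side of the workflow: the result of one browser search decides whether the agent searches again or moves on to Calendar. Below is a minimal sketch of that decision, assuming the forecasts have already been read into lists of (day, condition) pairs. The function names `first_day_with` and `plan_trip` are hypothetical, and only the days the demo actually reports are included.

```python
from typing import List, Optional, Tuple

# A forecast window: (day label, condition) pairs in chronological order.
Forecast = List[Tuple[str, str]]


def first_day_with(forecast: Forecast, condition: str) -> Optional[str]:
    """Return the first day in the window whose condition matches, else None."""
    for day, cond in forecast:
        if cond == condition:
            return day
    return None


def plan_trip(windows: List[Forecast], condition: str = "sunny") -> Optional[str]:
    """Check each window in turn and stop at the first one with a matching day."""
    for window in windows:
        day = first_day_with(window, condition)
        if day is not None:
            return day
    return None


this_weekend = [("April 19", "cloudy"), ("April 20", "cloudy")]
# Only the day the demo reports is listed; the rest of April 25-28 is omitted.
later = [("April 27", "sunny")]

day = plan_trip([this_weekend, later])
if day is not None:
    print(f"Calendar: add 'Trip to Beijing' on {day}")
```

In the demo the second search happens only after the first window yields no sunny day; the sketch supplies both windows up front for simplicity, but still stops at the first window that contains a match.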
Involved Apps: WeChat, Word, Finder.

Agent's Thought:
The agent determines that the user intends to communicate with '尚子睿' via WeChat. The user navigates the chat list to locate the conversation and start the interaction.
Action:
open WeChat and locate the chat with 尚子睿

Agent's Thought:
The agent observes that the user is reading recent messages from '尚子睿' that contain specific instructions, including file-editing tasks that must be followed precisely.
Action:
read the message from 尚子睿 to understand the file modification request

Agent's Thought:
The agent observes that the user has read a detailed instruction message about editing a document. From this, it infers that the next step is to locate and open the specified file and apply the requested formatting.
Action:
open article.docx and apply the required font formatting to the first 20 characters

Agent's Thought:
The agent sees that the user has finished editing the document in Word. To prepare to send it, the next logical step is to locate the file in Finder and copy it.
Action:
copy the modified article.docx file in Finder

Agent's Thought:
The agent observes that the user has copied the edited file. To deliver it to 尚子睿 as requested, the user now needs to return to the WeChat conversation, paste the file into the chat, and send it.
Action:
paste and send the modified file to 尚子睿 via WeChat
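In this last demo, information flows between apps: the WeChat message determines which file Word edits, and the file copied in Finder is what gets sent back in WeChat. The sketch below models that hand-off by threading a shared context dictionary through one function per app agent. All function names and context keys are hypothetical, and each step is a stub that records what it would do rather than operating the application.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]
StepFn = Callable[[Context], Context]


def read_request(ctx: Context) -> Context:
    # WeChat agent (stub): the real agent reads the request from the chat window.
    return {**ctx, "file": "article.docx",
            "edit": "apply the required font formatting to the first 20 characters"}


def edit_document(ctx: Context) -> Context:
    # Word agent (stub): would open ctx["file"] and apply ctx["edit"].
    return {**ctx, "edited": ctx["file"]}


def copy_in_finder(ctx: Context) -> Context:
    # Finder agent (stub): would locate the edited file and copy it.
    return {**ctx, "clipboard": ctx["edited"]}


def send_in_wechat(ctx: Context) -> Context:
    # WeChat agent (stub): would paste the copied file into the chat and send it.
    return {**ctx, "sent": f"{ctx['clipboard']} -> {ctx['contact']}"}


def run_pipeline(steps: List[StepFn], ctx: Context) -> Context:
    """Run steps in order, threading each step's output into the next."""
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_pipeline(
    [read_request, edit_document, copy_in_finder, send_in_wechat],
    {"contact": "尚子睿"},
)
print(result["sent"])  # article.docx -> 尚子睿
```

Because each step returns a new dictionary instead of mutating the old one, every intermediate state of the hand-off can be inspected if a later step fails.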