MacOS Agent: an efficient Computer Use Agent for macOS

BIGAI ML Group

Meet the Automation of Computer Use

We have built a Computer Use Agent that redefines how you interact with macOS. It can solve diverse, complex tasks across commonly used applications.

Multimodal Understanding

Interprets screen content and task requirements to provide clear progress tracking and next steps.

Long-horizon Reasoning

Executes multi-step tasks through natural interactions with the macOS interface.

Cross-App Workflows

Coordinates tasks across different applications with consistent performance and reliability.

This is a fully open-source agent: the code and prompts are released, with data and training code to follow.

100% Open Source | Research-Friendly | macOS Compatible

System Architecture

Our Computer Use Agent employs a hierarchical structure that seamlessly connects human intent with application-specific actions:

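Figure: three-level architecture diagram (User Level, OS Level, App Level) showing Human, ComputerUse Agent, MacAgent, WindowsAgent, iOSAgent, and xxxAgent, with apps grouped into Productivity Suite, System, and Media & Communication (Word, Excel, PowerPoint, TextEdit, Finder, Preview, Player, Browser, WeChat, Calendar). Legend: Implemented, In Development.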

MacOS Agent is a hierarchical multi-agent system with three key components:

  • ComputerUse Agent: The top-level agent that interfaces with human users. It receives natural language instructions and generates high-level execution plans, which are then forwarded to the MacAgent. This agent operates at an abstract level without direct access to application controls.
  • MacAgent: The central coordinator that receives plans from the ComputerUse Agent. It analyzes these plans and determines the optimal execution strategy by:
    • Identifying which app agents are needed
    • Generating executable code to orchestrate these agents
    • Managing the reactive workflow between different app agents
  • App Agents: A collection of ten specialized agents that directly interface with macOS applications:
    • Document Processing: Word Agent, TextEdit Agent
    • Data & Presentations: Excel Agent, PowerPoint Agent
    • System & Navigation: Finder Agent, Browser Agent
    • Media & Communication: Preview Agent, QuickTime Agent, WeChat Agent, Calendar Agent

This architecture enables sophisticated task execution through coordinated agent interactions. The MacAgent's ability to dynamically orchestrate app agents allows for complex workflows while maintaining clear separation of concerns between planning and execution.
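The planner, coordinator, and app-agent split described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `PlanStep`, `AppAgent`, `MacAgent.execute`, and `computer_use_agent` are illustrative names, not the project's API. The planner is a hard-coded stub rather than an LLM call, and the coordinator runs a fixed dispatch loop instead of generating executable code as the MacAgent is described as doing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PlanStep:
    """One high-level step from the planner (hypothetical schema)."""
    app: str          # name of the app agent that should handle the step
    instruction: str  # natural-language sub-task for that app agent


class AppAgent:
    """Stand-in for an app-specific agent such as a Browser or Word agent."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, instruction: str, context: List[str]) -> str:
        # Placeholder: a real app agent would operate the macOS app here and could
        # use `context` (earlier results); this stub ignores it and reports the step.
        return f"[{self.name}] done: {instruction}"


class MacAgent:
    """Coordinator that routes each plan step to a registered app agent."""

    def __init__(self, agents: Dict[str, AppAgent]) -> None:
        self.agents = agents

    def execute(self, plan: List[PlanStep]) -> List[str]:
        results: List[str] = []
        for step in plan:
            agent = self.agents.get(step.app)
            if agent is None:
                raise KeyError(f"no app agent registered for {step.app!r}")
            # Hand earlier results to the next agent so steps can build on each other.
            results.append(agent.run(step.instruction, context=list(results)))
        return results


def computer_use_agent(request: str) -> List[PlanStep]:
    """Planner stub: a real ComputerUse Agent would derive this plan with an LLM."""
    return [
        PlanStep("Browser", f"search for information about {request}"),
        PlanStep("Word", "open a new document and type the summary"),
    ]


if __name__ == "__main__":
    mac_agent = MacAgent({name: AppAgent(name) for name in ("Browser", "Word", "Calendar")})
    for line in mac_agent.execute(computer_use_agent("'The Second Half' by Shunyu Yao")):
        print(line)
```

In a real system each `AppAgent.run` would drive the corresponding macOS application, and the coordinator could adjust the remaining steps based on each result; the sketch only shows how plan steps are routed to app agents and how results flow forward.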

Involved Apps: Browser, Word, Calendar.

Trace Monitoring
3/20/2024, 10:00:00 AM (0s - 18s)
Screenshot at 2024-03-20T10:00:00Z

Agent's Thought:

The agent uses the browser app to find information about the artist whose work won the Grammy Award for Album of the Year in 2025.

Action:

Start the browser again to search for more information about the artist and her work.

3/20/2024, 10:02:30 AM (18s - 29s)
Screenshot at 2024-03-20T10:02:30Z

Agent's Thought:

The agent uses the browser app to gather more information.

Action:

Enough information has been gathered, so the agent moves on to create a .docx file for the introduction.

3/20/2024, 10:05:15 AM (29s - 35s)
Screenshot at 2024-03-20T10:05:15Z

Agent's Thought:

The agent writes the introduction in the Word file.

Action:

The agent moves on to add the most recent concert date to the calendar.

3/20/2024, 10:05:15 AM (35s - 45s)
Screenshot at 2024-03-20T10:05:15Z

Agent's Thought:

The agent uses the Calendar app to add the most recent concert date.

Action:

Task completed.
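Each trace step pairs a screenshot timestamp and a replay time range with the agent's thought and action. As a rough sketch of how such a trace could be represented and sanity-checked, the Python below models one entry per step and checks that the time ranges leave no gaps. `TraceStep`, `header`, `is_contiguous`, and `total_seconds` are hypothetical names, not the project's trace format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TraceStep:
    """One trace entry (hypothetical schema, not the project's data format)."""
    screenshot_at: datetime  # when the step's screenshot was captured
    start_s: int             # replay offset where the step begins, in seconds
    end_s: int               # replay offset where the step ends, in seconds
    thought: str             # the agent's reasoning at this step
    action: str              # the action the agent chose


def header(step: TraceStep) -> str:
    """Format a step header like '3/20/2024, 10:00:00 AM (0s - 18s)'."""
    t = step.screenshot_at
    return f"{t.month}/{t.day}/{t.year}, {t:%I:%M:%S %p} ({step.start_s}s - {step.end_s}s)"


def is_contiguous(steps: List[TraceStep]) -> bool:
    """True if no step runs backwards and each starts where the previous one ended."""
    if any(s.end_s < s.start_s for s in steps):
        return False
    return all(prev.end_s == cur.start_s for prev, cur in zip(steps, steps[1:]))


def total_seconds(steps: List[TraceStep]) -> int:
    """Replay length covered by the trace (assumes a non-empty, contiguous trace)."""
    return steps[-1].end_s - steps[0].start_s


# The first demo trace, with thoughts and actions abbreviated.
demo = [
    TraceStep(datetime(2024, 3, 20, 10, 0, 0), 0, 18, "research the artist", "start browser"),
    TraceStep(datetime(2024, 3, 20, 10, 2, 30), 18, 29, "gather more information", "create docx"),
    TraceStep(datetime(2024, 3, 20, 10, 5, 15), 29, 35, "write the introduction", "add concert date"),
    TraceStep(datetime(2024, 3, 20, 10, 5, 15), 35, 45, "add event in Calendar", "task completed"),
]

if __name__ == "__main__":
    for step in demo:
        print(header(step))
    print(is_contiguous(demo), total_seconds(demo))
```

Storing start and end offsets separately (rather than only durations) makes gaps or overlaps between steps easy to detect when traces are replayed.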

Involved Apps: Browser, TextEdit, Calendar.

Trace Monitoring
3/20/2024, 10:00:00 AM (0s - 3s)
Screenshot at 2024-03-20T10:00:00Z

Agent's Thought:

The agent opens Calendar to check for any events scheduled this Friday.

Action:

The agent finds that a movie night is indeed scheduled.

3/20/2024, 10:02:30 AM (3s - 8s)
Screenshot at 2024-03-20T10:02:30Z

Agent's Thought:

The agent uses the browser app to gather more information about the latest movies.

Action:

Enough information has been gathered, so the agent moves on to create a text file with movie recommendations.

3/20/2024, 10:05:15 AM (8s - 10s)
Screenshot at 2024-03-20T10:05:15Z

Agent's Thought:

The agent writes the recommendations in the text file.

Action:

Task completed.

Involved Apps: Browser, Word.

Trace Monitoring
4/23/2025, 10:00:00 AM (0s - 12s)
Screenshot at 2025-04-23T10:00:00Z

Agent's Thought:

To complete the task, I should first search for the blog post 'The Second Half' written by Shunyu Yao. I will use the browser agent to find relevant information about it.

Action:

Start the browser to search for information about 'The Second Half' written by Shunyu Yao

4/23/2025, 10:00:30 AM (12s - 13s)
Screenshot at 2025-04-23T10:00:30Z

Agent's Thought:

I have found the blog post 'The Second Half' written by Shunyu Yao. It discusses the author's insights on the field of AI. I will now summarize its content.

Action:

Summarize the blog content

4/23/2025, 10:01:10 AM (13s - 28s)
Screenshot at 2025-04-23T10:01:10Z

Agent's Thought:

Now I have the summary of the blog. To complete the task, I should open Microsoft Word and type the summary.

Action:

Open Microsoft Word and type the summary

Involved Apps: WeChat, Calendar.

Trace Monitoring
3/20/2024, 10:00:00 AM (0s - 7s)
Screenshot at 2024-03-20T10:00:00Z

Agent's Thought:

The agent identifies that the user is trying to locate a conversation with a specific contact named '尚子睿'. From the context, it infers that the user intends to open WeChat and either browse recent chats or use the search function to find the conversation.

Action:

open WeChat

3/20/2024, 10:02:30 AM (7s - 11s)
Screenshot at 2024-03-20T10:02:30Z

Agent's Thought:

The agent observes that the WeChat interface displays the recent conversation list, in which the chat with '尚子睿' is clearly visible. It infers that the user intends to open this chat to review previous messages.

Action:

click on the chat with 尚子睿

3/20/2024, 10:05:15 AM (11s - 16s)
Screenshot at 2024-03-20T10:05:15Z

Agent's Thought:

The agent sees that the user is browsing the chat with '尚子睿'. Scanning the recent messages, it detects references to a Tencent Meeting that appear to include details such as the meeting time and link. The user likely wants this information for scheduling.

Action:

read the recent messages to extract Tencent Meeting details

3/20/2024, 10:05:15 AM (16s - 23s)
Screenshot at 2024-03-20T10:05:15Z

Agent's Thought:

The agent recognizes that the user has obtained the Tencent Meeting start time. The goal now is to create a new calendar event at that time so the meeting is properly scheduled.

Action:

add the Tencent Meeting start time to the calendar

Involved Apps: Browser, Calendar.

Trace Monitoring
4/19/2024, 10:00:00 AM (0s - 5s)
Screenshot at 2024-04-19T10:00:00Z

Agent's Thought:

The agent observes that the user is searching for the weather forecast for the upcoming weekend in Beijing. The user is likely planning an outdoor activity for the weekend.

Action:

search for Beijing weather forecast for April 19 and 20

4/19/2024, 11:00:00 AM (5s - 10s)
Screenshot at 2024-04-19T11:00:00Z

Agent's Thought:

The agent sees that the forecast for April 19 and 20 shows cloudy weather, which may not suit the user's intended outdoor plans. The user is likely to check whether the weather next weekend is better.

Action:

search for Beijing weather forecast for April 25 to April 28

4/19/2024, 12:00:00 PM (10s - 15s)
Screenshot at 2024-04-19T12:00:00Z

Agent's Thought:

The agent observes that the forecast for April 27 is sunny, which is ideal for outdoor activities. Based on this, the user appears to have decided to schedule a trip on that date.

Action:

open Calendar and add a travel event titled 'Trip to Beijing' on April 27

Involved Apps: WeChat, Word, Finder.

Trace Monitoring
3/21/2024, 10:00:00 AM (0s - 5s)
Screenshot at 2024-03-21T10:00:00Z

Agent's Thought:

The agent determines that the user intends to communicate with '尚子睿' via WeChat. The user navigates the chat list to locate the conversation and start the interaction.

Action:

open WeChat and locate the chat with 尚子睿

3/21/2024, 11:00:00 AM (5s - 7s)
Screenshot at 2024-03-21T11:00:00Z

Agent's Thought:

The agent observes that the user is reading recent messages from '尚子睿' that contain specific instructions, including file-editing tasks that must be followed precisely.

Action:

read the message from 尚子睿 to understand the file modification request

3/21/2024, 12:00:00 PM (7s - 11s)
Screenshot at 2024-03-21T12:00:00Z

Agent's Thought:

The agent observes that the user has read a detailed instruction message about document editing. Based on this, it infers that the next step is to locate and open the specified file to apply the requested formatting.

Action:

open article.docx and apply the required font formatting to the first 20 characters

3/21/2024, 12:00:00 PM (11s - 14s)
Screenshot at 2024-03-21T12:00:00Z

Agent's Thought:

The agent sees that the user has finished editing the document in Word. To prepare to send it, the next logical step is to find the file in Finder and copy it.

Action:

copy the modified article.docx file in Finder

3/21/2024, 12:00:00 PM (14s - 26s)
Screenshot at 2024-03-21T12:00:00Z

Agent's Thought:

The agent observes that the user has copied the edited file. To deliver it to 尚子睿 as requested, the user now needs to return to the WeChat conversation and paste the file into the chat.

Action:

paste and send the modified file to 尚子睿 via WeChat