AutoGLM: Autonomous Foundation Agents for GUIs

NEW 2025.12.8 Open-AutoGLM Released - Open-Source Phone Agent with Downloadable Model

Xiao Liu12, Bo Qin1†, Dongzhu Liang1†, Guang Dong1†, Hanyu Lai12*†, Hanchen Zhang12*†,
Hanlin Zhao1†, Iat Long Iong12*†, Jiadai Sun1†, Jiaqi Wang1†, Junjie Gao1†, Junjun Shan1†,
Kangning Liu1†, Shudan Zhang12*†, Shuntian Yao1*†, Siyi Cheng1*†, Wentao Yao12*†,
Wenyi Zhao1†, Xinghan Liu12*†, Xinyi Liu1†, Xinying Chen1†, Xinyue Yang1†, Yang Yang1†,
Yifan Xu12*†, Yu Yang1†, Yujia Wang1†, Yulin Xu1†, Zehan Qi12*†, Yuxiao Dong2, Jie Tang2
Z.AI1 Tsinghua University2
* Work done while these authors interned at Z.AI.
† These authors are listed alphabetically by first name.
Open-AutoGLM is an open-source phone agent framework built on the ChatGLM family. It enables autonomous task completion on Android devices through natural language commands. The model, AutoGLM-Phone-9B, is now publicly available for download on Hugging Face and ModelScope. It supports mainstream apps in English and Chinese, including Gmail, Google Maps, WeChat, Taobao, Meituan, and more.
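For readers who want the weights locally, here is a minimal sketch of fetching them with the huggingface_hub client. The repository id zai-org/AutoGLM-Phone-9B is an assumption inferred from the model name above, not a confirmed listing.

# Minimal sketch: download the released weights with huggingface_hub.
# Assumption: the repo id below is inferred from the model name on this
# page and may differ from the actual Hugging Face listing.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="zai-org/AutoGLM-Phone-9B")
print(f"Weights downloaded to {local_dir}")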

Open-AutoGLM Demo 2025.12.8

Open-source phone agent framework demonstration

(a) Find the top-rated cinema nearby and navigate me there by foot on Google Maps

(b) Like the top 3 posts from Andrej Karpathy on the X app, and summarize them for me

Open-AutoGLM Features

Fully Open Source

Complete framework and model weights available. Deploy locally with full control.

Mainstream Apps

Works with popular apps in English & Chinese: Gmail, Google Maps, WeChat, Taobao, and more.

Multimodal Understanding

Visually perceives screen content and intelligently plans actions to complete tasks.

Simple Python API

Easy-to-use API for integration. Just a few lines to automate phone tasks (see the sketch after this feature list).

Remote ADB Support

Control devices over WiFi. No USB cable required after initial setup.

Safe Operations

Built-in confirmation for sensitive actions. Human takeover for login/captcha.
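To make the API, remote-ADB, and safety points above concrete, here is a minimal sketch of the control loop such a phone agent runs. Only the adb commands are real Android tooling; plan_action is a placeholder standing in for the actual Open-AutoGLM model call, whose interface may differ.

# Sketch of a phone-agent control loop over ADB-over-WiFi. Only the adb
# commands are stock Android tooling; plan_action is a placeholder for
# the real Open-AutoGLM model call.
#
# One-time setup (real adb commands, run while the phone is on USB):
#   adb tcpip 5555
#   adb connect 192.168.1.42:5555   # then the cable can be unplugged
import subprocess

DEVICE = "192.168.1.42:5555"  # assumed LAN address of the phone

def adb(*args: str) -> bytes:
    # Run an adb command against the WiFi-attached device.
    return subprocess.run(
        ["adb", "-s", DEVICE, *args], check=True, capture_output=True
    ).stdout

def screenshot() -> bytes:
    # "exec-out screencap -p" streams a PNG of the current screen.
    return adb("exec-out", "screencap", "-p")

def tap(x: int, y: int) -> None:
    adb("shell", "input", "tap", str(x), str(y))

def plan_action(png: bytes, task: str) -> tuple[int, int] | None:
    # Placeholder: the real agent feeds the screenshot and task to
    # AutoGLM-Phone-9B and gets a structured action back; sensitive
    # actions trigger a confirmation prompt, and login/captcha screens
    # hand control back to the user.
    raise NotImplementedError("model call goes here")

def run(task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        action = plan_action(screenshot(), task)
        if action is None:  # the model judged the task complete
            return
        tap(*action)

# run("Find the top-rated cinema nearby and navigate me there by foot")

In the actual framework the loop, the confirmation hook, and the action schema are provided for you; this sketch only shows where each feature above plugs in.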

AutoGLM Initial Release 2024.10.28

Original AutoGLM demonstration videos (integrated product version)

(a) AutoGLM demonstration on Phone (integrated version).


(b) AutoGLM demonstration on Web (integrated version).

Abstract (AutoGLM Paper)

We present AutoGLM, a new series in the ChatGLM family (GLM et al., 2024), designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs). While foundation models excel at acquiring human knowledge, they often struggle with decision-making in dynamic real-world environments, limiting their progress toward artificial general intelligence. This limitation underscores the importance of developing foundation agents capable of learning through autonomous environmental interactions by reinforcing existing models. Focusing on Web Browser and Android as representative GUI scenarios, we have developed AutoGLM as a practical foundation agent system for real-world GUI interactions. Our approach integrates a comprehensive suite of techniques and infrastructures to create deployable agent systems suitable for user delivery. Through this development, we have derived two key insights: First, the design of an appropriate "intermediate interface" for GUI control is crucial, enabling the separation of planning and grounding behaviors, which require distinct optimization for flexibility and accuracy respectively. Second, we have developed a novel progressive training framework that enables self-evolving online curriculum reinforcement learning with AutoGLM. Our evaluations demonstrate AutoGLM's effectiveness across multiple domains. For web browsing, AutoGLM achieves a 55.2% success rate on VAB-WebArena-Lite (improving to 59.1% with a second attempt) and 96.2% on OpenTable evaluation tasks. In Android device control, AutoGLM attains a 36.2% success rate on AndroidLab (VAB-Mobile) and 89.7% on common tasks in popular Chinese apps.
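As a concrete reading of the first insight, the sketch below shows one way a natural-language "intermediate interface" can separate planning from grounding. The names and action schema are illustrative assumptions, not the paper's exact design.

# Illustrative planner/grounder split behind a natural-language
# "intermediate interface". Names and schema are assumptions, not the
# paper's exact design.
from dataclasses import dataclass

@dataclass
class GroundedAction:
    kind: str          # e.g. "tap", "type", "scroll"
    x: int = 0
    y: int = 0
    text: str = ""

def plan(screenshot: bytes, task: str, history: list[str]) -> str:
    # Planner: optimized for flexibility. Emits an abstract step in
    # natural language, e.g. "tap the search box at the top".
    ...

def ground(screenshot: bytes, step: str) -> GroundedAction:
    # Grounder: optimized for accuracy. Resolves the abstract step to
    # concrete pixels/widgets on the current screenshot.
    ...

Because the interface between the two is plain language, the planner can be improved with outcome-driven reinforcement learning while the grounder is trained separately for pixel-accurate localization.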

Phone Use (Real Speed Recording)


(a) [Gmail] Write an email inquiring about project progress, with subject "hi", to harry66@gmail.com, scheduled to send on Oct. 30 at 8:00 AM

(b) [Google Maps] Find the nearest top-rated coffee shop and direct me there on foot

(c) [Temu] Add two pairs of the best-selling women's running shoes in size 7.5 to my cart

(d) [X] Help me find AK's homepage URL

(e) [Meituan] Order a standard Americano from Luckin Coffee, half sugar

(f) [Dianping] Write a five-star review for Quanjude's Tsinghua Science Park branch

(g) [WeChat] Like my boss's latest Moments post and leave the comment "Deeply inspiring"

(h) [Ctrip] Book the best-reviewed hotel near Shanghai Disneyland for Nov. 5-10

Web Browser Use (Real Speed Recording)


(a) Secure a table on OpenTable for 2 people at Saffron Fine Indian Cuisine on Nov. 6, 2024 at 7:30 PM

(b) Check my issues and create an issue called "excellent engineer wanted" for project Zhipu AI on GitLab.

(c) Show me the "chairs" listings by ascending price on OneStopShop.

(d) Reserve a table for my parents and me at Megan's Kitchen on Oct. 23, 2024 at 7:30 PM

(e) Set all reviews with keyword "sweet" to approved on Client Management System.

(f) Get driving durations, first from MIT to Harvard and then from Harvard to Boston Airport

(g) On Xiaohongshu, find the most popular illustrated guide to traveling in Rome and summarize which must-visit attractions it mentions

(h) Summarize DeepSpeed's strategies for saving GPU memory, based on the most upvoted articles

(i) Search for the latest academic journal publications on knowledge graphs, limited to journals in the Peking University Core index

Citation

If you find our work helpful, please cite the following papers:

@article{liu2024autoglm,
  title={AutoGLM: Autonomous Foundation Agents for GUIs},
  author={Liu, Xiao and Qin, Bo and Liang, Dongzhu and Dong, Guang and Lai, Hanyu and Zhang, Hanchen and Zhao, Hanlin and Iong, Iat Long and Sun, Jiadai and Wang, Jiaqi and others},
  journal={arXiv preprint arXiv:2411.00820},
  year={2024}
}

@article{xu2025mobilerl,
  title={MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents},
  author={Xu, Yifan and Liu, Xiao and Liu, Xinghan and Fu, Jiaqi and Zhang, Hanchen and Jing, Bohao and Zhang, Shudan and Wang, Yuting and Zhao, Wenyi and Dong, Yuxiao},
  journal={arXiv preprint arXiv:2509.18119},
  year={2025}
}

@article{zhang2025agentrl,
  title={AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework},
  author={Zhang, Hanchen and Liu, Xiao and Lv, Bowen and Sun, Xueqiao and Jing, Bohao and Iong, Iat Long and Hou, Zhenyu and Qi, Zehan and Lai, Hanyu and Xu, Yifan and others},
  journal={arXiv preprint arXiv:2510.04206},
  year={2025}
}

@article{lai2025computerrl,
  title={ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents},
  author={Lai, Hanyu and Liu, Xiao and Zhao, Yanxiao and Xu, Han and Zhang, Hanchen and Jing, Bohao and Ren, Yanyu and Yao, Shuntian and Dong, Yuxiao and Tang, Jie},
  journal={arXiv preprint arXiv:2508.14040},
  year={2025}
}