TY - GEN
T1 - Identifying User Goals From UI Trajectories
AU - Berkovitch, Omri
AU - Caduri, Sapir
AU - Kahlon, Noam
AU - Efros, Anatoly
AU - Caciularu, Avi
AU - Dagan, Ido
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/5/23
Y1 - 2025/5/23
N2 - Identifying underlying user goals and intents has been recognized as valuable in various personalization-oriented settings, such as personalized agents, improved search responses, advertising, user analytics, and more. In this paper, we propose a new task, goal identification from observed UI trajectories, aiming to infer the user's detailed intentions when performing a task within UI environments. To support this task, we also introduce a novel evaluation methodology designed to assess whether two intent descriptions can be considered paraphrases within a specific UI environment. Furthermore, we demonstrate how this task can leverage datasets designed for the inverse problem of UI automation, utilizing Android and web datasets for our experiments. To benchmark this task, we compare the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro, using our proposed metric. The results reveal that both Gemini and GPT underperform relative to human performance, underscoring the challenge of the proposed task and the significant room for improvement. This work highlights the importance of goal identification within UI trajectories, providing a foundation for further exploration and advancement in this area.
AB - Identifying underlying user goals and intents has been recognized as valuable in various personalization-oriented settings, such as personalized agents, improved search responses, advertising, user analytics, and more. In this paper, we propose a new task, goal identification from observed UI trajectories, aiming to infer the user's detailed intentions when performing a task within UI environments. To support this task, we also introduce a novel evaluation methodology designed to assess whether two intent descriptions can be considered paraphrases within a specific UI environment. Furthermore, we demonstrate how this task can leverage datasets designed for the inverse problem of UI automation, utilizing Android and web datasets for our experiments. To benchmark this task, we compare the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro, using our proposed metric. The results reveal that both Gemini and GPT underperform relative to human performance, underscoring the challenge of the proposed task and the significant room for improvement. This work highlights the importance of goal identification within UI trajectories, providing a foundation for further exploration and advancement in this area.
KW - Intent Understanding
KW - Multimodal Interaction
KW - Personalized Agents
KW - Proactive Agents
UR - http://www.scopus.com/inward/record.url?scp=105009212332&partnerID=8YFLogxK
U2 - 10.1145/3701716.3717525
DO - 10.1145/3701716.3717525
M3 - Conference contribution
T3 - WWW Companion 2025 - Companion Proceedings of the ACM Web Conference 2025
SP - 2381
EP - 2390
BT - WWW Companion 2025 - Companion Proceedings of the ACM Web Conference 2025
T2 - 34th ACM Web Conference, WWW Companion 2025
Y2 - 28 April 2025 through 2 May 2025
ER -