Towards Next-Generation AI Agents for Web Automation with Large Foundation Models (LFMs)

About

With the advancement of web technologies, the web has become deeply embedded in people's lives, supporting a wide range of work and daily activities. Despite its importance, many tasks performed on the web are repetitive and time-consuming, which reduces productivity and lowers overall quality of life. The central role of the web, combined with the time and effort that everyday web tasks require, naturally raises a question: ‘Can a superintelligent AI assistant be developed to automatically handle these repetitive and time-consuming tasks?’ Recently, Large Foundation Models (LFMs), containing billions of parameters, have exhibited human-like language understanding and reasoning capabilities, offering promising opportunities for the development of such assistants. To fully harness the potential of LFMs, WebAgents have emerged to complete complex web tasks according to user instructions, greatly enhancing the convenience of daily life.

Our Survey Paper: A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models


TARGET AUDIENCE AND PREREQUISITES FOR THE TUTORIAL

This tutorial is intended for college students, researchers in academic institutions, and practitioners in industrial AI labs who are interested in Large Foundation Models (LFMs) and WebAgents. Attendees are expected to have basic knowledge of artificial intelligence, foundation models, and agent techniques. The tutorial will nonetheless be presented at the college junior/senior level, so that academic researchers and industrial practitioners who are interested in this emerging field but not yet familiar with it can follow comfortably. After attending, participants should have a comprehensive understanding of WebAgents and some insight into potential research directions in this field.

Tutorial Syllabus

The topics of this tutorial include (but are not limited to) the following:

  • WebAgents
  • Large Foundation Models
  • Pre-training
  • Fine-tuning
  • Reinforcement Learning
  • Trustworthiness

    The tutorial outline is shown below:

  • Introduction to WebAgents (15 minutes)
  • Architecture of WebAgents and Main Modules (30 minutes)
    • WebAgents architecture overview
    • Perception in WebAgents
    • Planning and Reasoning in WebAgents
    • Execution in WebAgents
  • Coffee Break (20 minutes)
  • Training Approach of WebAgents (30 minutes)
    • Data Used for Training
    • Training Strategies in WebAgents
  • Advanced WebAgents (30 minutes)
    • Safety and Robustness in WebAgents
    • Privacy in WebAgents
    • Generalizability in WebAgents
  • Challenges and Future Directions of WebAgents (15 minutes)
    • Personalized WebAgents
    • Domain-Specific WebAgents
    • Trustworthy WebAgents
    • Datasets and Benchmarks for WebAgents
  • Q&A (10 minutes)

Organization


    Tutorial Tutors

    Wenqi Fan

    Assistant Professor

    The Hong Kong Polytechnic University (PolyU)

    Liangbo Ning

    PhD Student

    The Hong Kong Polytechnic University

    Ziran Liang

    PhD Student

    The Hong Kong Polytechnic University

    Zhuohang Jiang

    PhD Student

    The Hong Kong Polytechnic University

    Haohao Qu

    PhD Student

    The Hong Kong Polytechnic University

    Yujuan Ding

    Research Assistant Professor

    The Hong Kong Polytechnic University

    Xiao-yong Wei

    Visiting Professor

    The Hong Kong Polytechnic University

    Hui Liu

    Assistant Professor

    Michigan State University

    Philip S. Yu

    Professor

    University of Illinois at Chicago

    Qing Li

    Professor

    The Hong Kong Polytechnic University