ICML 2024 Workshop
Track II - Frontiers in Trustworthy Agents
Future Frontiers in Trustworthy Foundation Model Agents: Environments, Benchmarks, and Solutions
Introduction
The evolution of Artificial Intelligence (AI) reflects its increasing sophistication and integration into daily life, advancing from simple automated agents to Multimodal Large Language Models (MLLMs) such as GPT-4 and GPT-4o, and ultimately toward ethically aligned, trustworthy agents. This progression highlights both AI's potential to enhance human capabilities and the importance of managing the associated risks.
Why organize this track?
Trustworthy evaluation of AI agents should extend beyond assessing the safety, factuality, and robustness of backend models to include reliable and truthful interactions within the entire agent system. Research on the trustworthiness of multi-modal agents is currently in its early stages, hindered by inadequate experimental environments and benchmarks. Key questions and potential threats in trustworthy AI agents remain unclear, necessitating better definitions and forecasting of safety threats. This track aims to encourage researchers to offer constructive solutions and insights into trustworthy AI agents.
Early exploration of trustworthy agents has shown that this is a brand-new research field. Moreover, in contrast to traditional trustworthiness research on Multimodal Foundation Models (MFMs), MFM-based agents still lack appropriate environments and benchmarks for further exploration and evaluation. This difference is mainly reflected in the following desired subject areas of this track:
- More complex environments [1,7]: Agents operate autonomously, using tools [2,3] or interacting with real or simulated environments [4], which may involve extensive external data. In contrast, environments for vanilla MFMs only leverage human-collected or self-instructed data for simulation, without complex interaction.
- More diverse benchmarks for trustworthy AI agent evaluation [8]: Evaluating the trustworthiness of multi-modal AI agents requires more diverse benchmarks because of their complex macro-architecture. A comprehensive safety benchmark should assess each module's ability to withstand adversarial attacks as well as how the modules collaborate to resist attack procedures (a minimal sketch of this module-wise vs. end-to-end evaluation appears after this list). Traditional evaluations focus only on inputs and outputs, so more advanced and comprehensive assessment methods are needed for agent safety [5].
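To make the module-level vs. end-to-end distinction concrete, below is a minimal, hypothetical Python sketch of how a benchmark harness might report per-module robustness alongside a whole-pipeline score. The module names, `attack` function, and `is_safe` judge are placeholder assumptions for illustration only, not part of any existing benchmark.

```python
# Minimal sketch (not an official harness) of module-wise vs. end-to-end
# robustness evaluation for an agent pipeline. All module names, attack
# functions, and safety judges are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentModule:
    name: str                  # e.g., "perception", "planner", "tool_executor"
    run: Callable[[str], str]  # simplified interface: text in, text out


def module_robustness(module: AgentModule,
                      cases: List[str],
                      attack: Callable[[str], str],
                      is_safe: Callable[[str], bool]) -> float:
    """Fraction of adversarially perturbed inputs the module alone handles safely."""
    safe = sum(is_safe(module.run(attack(c))) for c in cases)
    return safe / max(len(cases), 1)


def pipeline_robustness(modules: List[AgentModule],
                        cases: List[str],
                        attack: Callable[[str], str],
                        is_safe: Callable[[str], bool]) -> float:
    """End-to-end score: the attacked input is propagated through all modules in order."""
    safe = 0
    for c in cases:
        out = attack(c)
        for m in modules:
            out = m.run(out)
        safe += is_safe(out)
    return safe / max(len(cases), 1)


def evaluate(modules: List[AgentModule],
             cases: List[str],
             attack: Callable[[str], str],
             is_safe: Callable[[str], bool]) -> Dict[str, float]:
    """Report per-module robustness scores alongside the whole-pipeline score."""
    report = {m.name: module_robustness(m, cases, attack, is_safe) for m in modules}
    report["pipeline"] = pipeline_robustness(modules, cases, attack, is_safe)
    return report
```

A real benchmark would of course pair such scores with concrete attack suites, per-scenario breakdowns, and human audits; the sketch only illustrates why module-level and pipeline-level scores can diverge and should both be reported.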
Research Topics and Subject Areas
Supported research topics include (but are not limited to) the following:
Recommended subject areas include (but are not limited to) the following:
Example Ideas
To provide clearer guidance for submissions, we present example ideas for the following topics:
Time Schedule (AoE timezone)
| Time (tentative) | Event |
| --- | --- |
| May 28th, 00:00 (stage 1 starts) | Challenge Track 3 starts and the submission entry opens. During stage 1, participants can submit their proposals in the form of research proposal papers, which will be reviewed by an expert committee. |
| June 16th, 23:59 (stage 1 ends, stage 2 starts) | Submission deadline of Track 3; stage 2 starts. Paper assignment begins, followed by review of all proposals. |
| June 27th, 23:59 | End of the review period. At the end of stage 2, we will select the top 5 outstanding workshop proposals as oral workshop papers and award the winning prizes during the workshop. |
| June 30th | Release decisions and send decision notification emails. |
| July 4th, 23:59 | Camera-ready submission deadline. |
| July 27th | Date of the workshop. |
| After the workshop (stage 3) | Stage 3 starts. For outstanding proposals, we will provide essential support and collaboration to realize the new methodologies proposed in the submissions. |
How Will The Submissions Be Evaluated
| Criterion | Description |
| --- | --- |
| Clearly illustrate the motivation. | Your submission should clearly state its motivation, i.e., which limitation affects the trustworthiness of AI agents, or which issue most critically restricts their trustworthiness. |
| Highlight the significance of the proposed platform / benchmark / training and inference framework / novel opinions. | We recommend that authors highlight the significance (including, but not limited to, novelty) of the proposed ideas and methods. |
| Fine-grained. | We encourage authors to propose diverse and critical thinking that can be adopted across various research areas regarding AI agents. |
| Appropriate level of difficulty, transferability, and generalizability. | We hope that the proposal is not merely a toy project. Instead, we encourage authors to propose more insightful and constructive work; newly proposed ideas or solutions may be more difficult than existing counterparts but should offer greater transferability and generalizability across a wider range of realistic scenarios. |
| Feasibility. | We hope that the proposed ideas and methods are practical to implement. If preliminary experimental results exist, we recommend including them in your submission. |
Example Format
I. Title and Abstract
II. Introduction (Proposal Description and Motivation)
- In the introduction, the most important thing is to clarify the motivation of your research proposal for trustworthy agents: which phenomenon or observation motivates the proposal, and what is the intuition behind it?
- Then summarize your proposal. Please provide a brief, concrete description of the proposal regarding trustworthy AI agents. Is it an environment, a dataset, or a specific application of trustworthy agents? What are the inputs and outputs of the system in your proposal? And what is its goal?
- We also recommend that authors explain how the proposal could shape the research direction of the AI agent community regarding trustworthy agents. In the example ideas, we have prepared several categories of research that generally help to assess or reduce the risks. Nevertheless, submissions will be judged according to their relevance to risks from AI agents and are not limited to these categories.
III. Related Work (previous attempts regarding AI agents and trustworthy AI)
- In this section, the authors could review previous attempts regarding both AI agents and trustworthy foundation models (e.g., attack, defense, monitoring, and governance), then explain how the proposal (a system or a benchmark) is similar to or different from previous counterparts. Good research proposals often tie into existing work to gain widespread adoption while inspiring novel future research.
IV. Technical Details (Recommended)
- In this section, the authors should describe the implementation details of the proposal and show that it is practical to carry out.
- If your proposal focuses on a synthetic environment / platform / playground for future trustworthiness research, we recommend demonstrating details including (but not limited to) the number and diversity of supported scenarios,
- If your proposal focuses on a trustworthy benchmark (either newly constructed or based on an existing agent environment), we recommend providing a comprehensive analysis (e.g., a leaderboard) of existing agent solutions / applications.
V. Preliminary or Major Experimental Results (Optional, if available)
- If you have already experimented with your proposal for trustworthy agents, we strongly encourage you to show the key preliminary or major results of your proposed agent environment / benchmark / solution.
- Both quantitative and qualitative results are encouraged. For trustworthy AI agent applications / solutions, one can provide quantitative comparisons to show the effectiveness of the proposed methods. For trustworthy agent benchmarks, one can show quantitative results for existing baseline methods. For trustworthy agent environments, one can present critical qualitative results to show the effectiveness of the proposed environments.
VI. Conclusion and Relevance to Future Work
- Current tractability analysis. Benchmarks should currently or soon be tractable for existing models while posing a meaningful challenge. A significant research effort should be required to achieve near-maximum performance.
- Performance ceiling. Provide an estimate of maximum performance. For example, what would expert human-level performance be? Is it possible to achieve superhuman performance?
- Barriers to entry. List factors that might make it harder for researchers to use your benchmark. Keep in mind that if barriers to entry trade off against relevance, you should generally prioritize the latter.
1. How large do models need to be to perform well?
2. How much context is required to understand the task?
3. How difficult is it to adapt current training architectures to the dataset?
4. Is third-party software (e.g., games, modeling software, simulators) or a complex set of dependencies required for training and/or evaluation?
5. Is unusual hardware required (e.g., robotics, multi-GPU training setups)?
6. Do researchers need to learn a new program or programming language to use the dataset (e.g., Coq, AnyLogic)?
VII. References
- As usual.
Submission Guideline
Authors are invited to submit short papers of up to 5 pages, with an unlimited number of pages for references and appendices (after the bibliography); however, reviewers are not required to read the appendix. We will select the top 5 outstanding challenge proposals as oral presentations and award the winning prizes. Outstanding papers will be published publicly on OpenReview.
The reviewing process will be double-blind. Please submit an anonymized version of your paper that contains no identifying information about author identities or affiliations. Submitted papers must be new work that has not yet been published in a peer-reviewed conference or journal. During submission, we will provide a checkbox to indicate whether at least one author can join the onsite workshop. Such submissions will receive expedited reviews to allow enough time for visa applications.
Submission format: ICML Style Files (during the review stage, please use the anonymous version; for camera-ready proposals, please use the final version and list the authors).
Submission link: OpenReview
Policies
Participants:
You are eligible to submit as an individual, on behalf of an organization, or from a for-profit or not-for-profit entity; we are impartial as to your affiliation (or lack thereof).
Deadlines:
Proposal submission deadlines are strict. In no circumstances will extensions be given.
Double-Blind Review (For Agent Trustworthy Track):
All submissions must be anonymized and may not contain any information with the intention or consequence of violating the double-blind reviewing policy, including (but not limited to) citing previous works of the authors or sharing links in a way that could reveal any author's identity or institution, or any other action that reveals the identities of the authors to potential reviewers.
Authors are allowed to post versions of their work on preprint servers such as arXiv. They are also allowed to give talks to restricted audiences on the work(s) submitted to our challenge during the review. If you have posted or plan to post a non-anonymized version of your paper online before the ICML decisions are made, the submitted version must not refer to the non-anonymized version.
Dual Submission:
It is not appropriate to submit research proposals that are identical (or substantially similar) to versions that have been previously published, accepted for publication, or submitted in parallel to other conferences or journals. Such submissions violate our dual submission policy, and the organizers have the right to reject such submissions, or to remove them from the proceedings. Note that submissions that have been or are being presented at workshops do not violate the dual-submission policy, as long as there’s no associated archival publication.
Reviewing Criteria:
Each proposal will be evaluated by the judges according to the criteria outlined below. Prizes will be awarded to the proposals which score the best according to the aggregate evaluations of the judges. Accepted research proposals must be based on original research and must contain novel results of significant interest to the machine learning community. Results can be either theoretical or empirical. Results will be judged on the degree to which they have been objectively established and/or their potential for scientific and technological impact. Reproducibility of results and easy availability of code will be taken into account in the decision-making process whenever appropriate.
Ethics:
See the general policy of this challenge for details.
References:
- [1] CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- [2] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework
- [3] AutoGPT
- [4] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- [5] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast. arXiv:2402.08567
- [6] PaLM-E: An Embodied Multimodal Language Model
- [7] Octopus: Embodied Vision-Language Programmer from Environmental Feedback
- [8] SmartPlay: A Benchmark for LLMs as Intelligent Agents
Frequently Asked Questions
Will the submission be public?
Yes. After acceptance, your submission will be public as a workshop paper.
What information should I include in the paper?
See the topic and example format section.
Can I participate in a team?
Of course. We encourage researchers to collaborate in this work. For winning teams, prizes will be divided evenly among the lead authors unless requested otherwise.
May I submit a paper that has been published in conferences or journals?
No.
Other questions?
Please contact tifaattack9@gmail.com with your questions; we will respond in a timely manner. New (non-duplicate) questions and their responses will be added to the FAQ.