ICML 2024 Workshop

Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)

ICML 2024 @ Vienna, Austria, Jul 27 Sat
The final decisions on paper acceptance have been released. Please check your OpenReview account.


Description/Call for Papers

Advanced Multi-modal Foundation Models (MFMs) and AI Agents, equipped with diverse modalities [1, 2, 3, 4, 15] and an increasing number of available affordances [5, 6] (e.g., tool use, code interpreters, API access), have the potential to accelerate and amplify their predecessors’ impact on society [7].
MFMs include multi-modal large language models (MLLMs) and multi-modal generative models (MMGMs). MLLMs refer to LLM-based models that can receive, reason over, and output information in multiple modalities, including but not limited to text, images, audio, and video; examples include LLaVA [1], Reka [8], Qwen-VL [9], and LAMM [36]. MMGMs refer to a class of MFMs that can generate new content across multiple modalities, such as generating images from text descriptions or creating videos from audio and text inputs; examples include Stable Diffusion [2], Sora [10], and Latte [11]. AI agents, or systems with a higher degree of agenticness, refer to systems that can achieve complex goals in complex environments with limited direct supervision [12]. Understanding and preempting the vulnerabilities of these systems [13, 35] and the harms they may induce [14] has become unprecedentedly crucial.
Building trustworthy MFMs and AI Agents goes beyond the adversarial robustness of such models: it also requires proactive risk assessment, mitigation, safeguards, and comprehensive safety mechanisms throughout the lifecycle of system development and deployment [16, 17]. This approach demands a blend of technical and socio-technical strategies, incorporating AI governance and regulatory insights.
Topics include but are not limited to:
  • Adversarial attack and defense, poisoning, hijacking and security [18, 13, 19, 20, 21]
  • Robustness to spurious correlations and uncertainty estimation
  • Technical approaches to privacy, fairness, accountability and regulation [12, 22, 28]
  • Truthfulness, factuality, honesty and sycophancy [23, 24]
  • Transparency, interpretability and monitoring [25, 26]
  • Identifiers of AI-generated material, such as watermarking [27]
  • Technical alignment / control, such as scalable oversight [29], representation control [26] and machine unlearning [30]
  • Model auditing, red-teaming and safety evaluation benchmarks [31, 32, 33, 16]
  • Measures against malicious model fine-tuning [34]
  • Novel safety challenges with the introduction of new modalities

Submission Guide

Submission Instructions

  • Submission site: Submissions should be made on OpenReview.
  • Submissions are non-archival: We welcome submissions that are also under peer review elsewhere at the time of submission, but we will not accept submissions that have already been published or accepted for publication at peer-reviewed conferences or journals. Papers presented or to be presented at other non-archival venues (e.g., other workshops) may be submitted. No formal workshop proceedings will be published.
  • Social Impact Statement: Authors are required to include a "Social Impact Statement" that highlights the "potential broader impact of their work, including its ethical aspects and future societal consequences".
  • Submission Length and Format: Submissions should be anonymised papers of up to 5 pages, excluding references and the Social Impact Statement (appendices can be added to the main PDF). You must format your submission using the ICML_2024_LaTeX_style_file.
  • Paper Review: Reviews are double-blind, with at least two reviewers assigned to each paper.
  • Camera-Ready Instructions: The camera-ready version consists of a main body of up to 6 pages, followed by unlimited pages for the Social Impact Statement, references, and an appendix, all in a single file. Authors should upload the camera-ready versions of accepted submissions to the OpenReview page for the corresponding submissions. Camera-ready versions will be publicly visible to everyone on the Camera-Ready Deadline listed below.

Key Dates

Submissions Open: May 11, 2024
Submission Deadline: May 30, 2024
Acceptance Notification: June 19, 2024 (updated from June 17, 2024)
Camera-Ready Deadline: July 7, 2024
Workshop Date: July 27, 2024
All deadlines are 23:59 AoE (Anywhere on Earth).

Frequently Asked Questions

Can we submit a paper that will also be submitted to NeurIPS 2024?

Yes.

Can we submit a paper that was accepted at ICLR 2024?

No. ICML prohibits papers published at the main conference from also appearing at its workshops.

Will the reviews be made available to authors?

Yes.

I have a question not addressed here, whom should I contact?

Email the organizers at icmltifaworkshop@gmail.com.

References

[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2024). Visual instruction tuning. Advances in neural information processing systems, 36.

[2] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

[3] OpenAI. (2023). GPT-4 with vision (GPT-4v) system card.

[4] Z. Yang, L. Li, K. Lin, J. Wang, C.-C. Lin, Z. Liu, and L. Wang. The dawn of lmms: Preliminary explorations with gpt-4v(ision), 2023.

[5] Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., ... & Sun, M. (2023). ToolLLM: Facilitating large language models to master 16000+ real-world APIs.

[6] C. Zhang, Z. Yang, J. Liu, Y. Han, X. Chen, Z. Huang, B. Fu, and G. Yu. Appagent: Multimodal agents as smartphone users, 2023.

[7] T. Eloundou, S. Manning, P. Mishkin, and D. Rock. Gpts are gpts: An early look at the labor market impact potential of large language models, 2023.

[8] Ormazabal, Aitor, et al. "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models." arXiv preprint arXiv:2404.12387 (2024).

[9] Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., ... & Zhou, J. (2023). Qwen-vl: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966.

[10] Sora: Creating video from text. (n.d.). https://openai.com/sora

[11] Ma, X., Wang, Y., Jia, G., Chen, X., Liu, Z., Li, Y. F., ... & Qiao, Y. (2024). Latte: Latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048.

[12] Shavit, Y., Agarwal, S., Brundage, M., Adler, S., O’Keefe, C., Campbell, R., ... & Robinson, D. G. (2023). Practices for Governing Agentic AI Systems. Research Paper, OpenAI, December.

[13] N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramer, and L. Schmidt. Are aligned neural networks adversarially aligned?, 2023.

[14] A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Z. He, Y. Duan, M. Carroll, M. Lin, A. Mayhew, K. Collins, M. Molamohammadi, J. Burden, W. Zhao, S. Rismani, K. Voudouris, U. Bhatt, A. Weller, D. Krueger, and T. Maharaj. Harms from increasingly agentic algorithmic systems. In 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23. ACM, June 2023. doi: 10.1145/3593013.3594033. URL http://dx.doi.org/10.1145/3593013.3594033.

[15] Gemini Team. Gemini: A family of highly capable multimodal models, 2023.

[16] T. Shevlane, S. Farquhar, B. Garfinkel, M. Phuong, J. Whittlestone, J. Leung, D. Kokotajlo, N. Marchal, M. Anderljung, N. Kolt, L. Ho, D. Siddarth, S. Avin, W. Hawkins, B. Kim, I. Gabriel, V. Bolina, J. Clark, Y. Bengio, P. Christiano, and A. Dafoe. Model evaluation for extreme risks, 2023.

[17] L. Weidinger, M. Rauh, N. Marchal, A. Manzini, L. A. Hendricks, J. Mateos-Garcia, S. Bergman, J. Kay, C. Griffin, B. Bariach, I. Gabriel, V. Rieser, and W. Isaac. Sociotechnical safety evaluation of generative ai systems, 2023.

[18] L. Bailey, E. Ong, S. Russell, and S. Emmons. Image hijacks: Adversarial images can control generative models at runtime, 2023.

[19] Jain, N., Schwarzschild, A., Wen, Y., Somepalli, G., Kirchenbauer, J., yeh Chiang, P., ... & Goldstein, T. (2023). Baseline defenses for adversarial attacks against aligned language models.

[20] Robey, A., Wong, E., Hassani, H., & Pappas, G. J. (2023). SmoothLLM: Defending large language models against jailbreaking attacks.

[21] B. Wang, W. Chen, H. Pei, C. Xie, M. Kang, C. Zhang, C. Xu, Z. Xiong, R. Dutta, R. Schaeffer, et al. Decodingtrust: A comprehensive assessment of trustworthiness in gpt models. 2023.

[22] Chan, A., Ezell, C., Kaufmann, M., Wei, K., Hammond, L., Bradley, H., ... & Anderljung, M. (2024). Visibility into AI Agents. arXiv preprint arXiv:2401.13138.

[23] Huang, Q., Dong, X., Zhang, P., Wang, B., He, C., Wang, J., Lin, D., Zhang, W., & Yu, N. (2023). Opera: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation.

[24] Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., ... & Perez, E. (2023). Towards understanding sycophancy in language models.

[25] Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in GPT.

[26] A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowski, S. Goel, N. Li, M. J. Byun, Z. Wang, A. Mallen, S. Basart, S. Koyejo, D. Song, M. Fredrikson, J. Z. Kolter, and D. Hendrycks. Representation engineering: A top-down approach to ai transparency, 2023.

[27] Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning.

[28] Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., ... & Lee, K. (2023). Scalable extraction of training data from (production) language models.

[29] S. R. Bowman, J. Hyun, E. Perez, E. Chen, C. Pettit, S. Heiner, K. Lukošiūtė, A. Askell, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, C. Olah, D. Amodei, D. Amodei, D. Drain, D. Li, E. Tran-Johnson, J. Kernion, J. Kerr, J. Mueller, J. Ladish, J. Landau, K. Ndousse, L. Lovitt, N. Elhage, N. Schiefer, N. Joseph, N. Mercado, N. DasSarma, R. Larson, S. McCandlish, S. Kundu, S. Johnston, S. Kravec, S. E. Showk, S. Fort, T. Telleen-Lawton, T. Brown, T. Henighan, T. Hume, Y. Bai, Z. Hatfield-Dodds, B. Mann, and J. Kaplan. Measuring progress on scalable oversight for large language models, 2022.

[30] Yao, Y., Xu, X., & Liu, Y. (2023). Large language model unlearning. arXiv preprint arXiv:2310.10683.

[31] S. Casper, C. Ezell, C. Siegmann, N. Kolt, T. L. Curtis, B. Bucknall, A. Haupt, K. Wei, J. Scheurer, M. Hobbhahn, L. Sharkey, S. Krishna, M. V. Hagen, S. Alberti, A. Chan, Q. Sun, M. Gerovitch, D. Bau, M. Tegmark, D. Krueger, and D. Hadfield-Menell. Black-box access is insufficient for rigorous ai audits, 2024.

[32] M. Bhatt, S. Chennabasappa, C. Nikolaidis, S. Wan, I. Evtimov, D. Gabi, D. Song, F. Ahmad, C. Aschermann, L. Fontana, S. Frolov, R. P. Giri, D. Kapil, Y. Kozyrakis, D. LeBlanc, J. Milazzo, A. Straumann, G. Synnaeve, V. Vontimitta, S. Whitman, and J. Saxe. Purple llama cyberseceval: A secure coding benchmark for language models, 2023.

[33] D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El-Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, D. Hernandez, T. Hume, J. Jacobson, S. Johnston, S. Kravec, C. Olsson, S. Ringer, E. Tran-Johnson, D. Amodei, T. Brown, N. Joseph, S. McCandlish, C. Olah, J. Kaplan, and J. Clark. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned, 2022.

[34] Henderson, P., Mitchell, E., Manning, C., Jurafsky, D., & Finn, C. (2023). Self-destructing models: Increasing the costs of harmful dual uses of foundation models. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’23.

[35] Y. Dong, H. Chen, J. Chen, Z. Fang, X. Yang, Y. Zhang, Y. Tian, H. Su, and J. Zhu. How robust is google’s bard to adversarial image attacks?, 2023.

[36] Yin, Z., Wang, J., Cao, J., Shi, Z., Liu, D., Li, M., Sheng, L., Bai, L., Huang, X., Wang, Z., & others (2023). LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark. arXiv preprint arXiv:2306.06687.