ICML 2024 Workshop

Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)

ICML 2024 @ Vienna, Austria, Jul 27 Sat
Straus 1


Schedule

Jul 27, 07:30 | Opening Remarks | 10 mins
    Jing Shao (Shanghai AI Lab)

Jul 27, 07:40 | Keynote Talk | 30 mins
    A data-centric view on reliable generalization
    Ludwig Schmidt (University of Washington; Anthropic)

Jul 27, 08:10 | Keynote Talk | 30 mins
    Robust Alignment and Control with Representation Engineering
    Matt Fredrikson (Carnegie Mellon University; Gray Swan AI)

Jul 27, 08:40 | Coffee Break | 10 mins

Jul 27, 08:50 | Panel Discussion | 50 mins
    Theme: Security and Safety of AI Agents
    Panelists:
    - Alan Chan (Centre for the Governance of AI; Mila - Quebec AI Institute)
    - Tomek Korbak (UK AI Safety Institute)
    - Ivan Evtimov (Meta AI)
    - Kai Greshake (NVIDIA)
    - Matt Fredrikson (Carnegie Mellon University; Gray Swan AI)
    Moderator: Daniel Paleka (ETH Zurich)

Jul 27, 09:40 | Contributed Talk | 20 mins
    The Safety in Large Language Models
    Yisen Wang (Peking University)

Jul 27, 10:00 | Outstanding Paper Talk | 10 mins
    Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Jul 27, 10:10 | Outstanding Paper Talk | 10 mins
    Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques

Jul 27, 10:20 | Lunch Break | 70 mins

Jul 27, 11:30 | Keynote Talk | 30 mins
    Agent Governance
    Alan Chan (Centre for the Governance of AI; Mila - Quebec AI Institute)

Jul 27, 12:00 | Keynote Talk | 30 mins
    UK AI Safety Institute: Overview & Agents Evals
    Herbie Bradley (UK AI Safety Institute)

Jul 27, 12:30 | Contributed Talk | 20 mins
    Summary and Prospect of TiFA Challenge
    Lijun Li (Shanghai AI Lab) & Bowen Dong (Shanghai AI Lab)

Jul 27, 12:50 | Break | 20 mins

Jul 27, 13:10 | Paper Lightning Talks | 40 mins
    - Games for AI-Control: Models of Safety-Evaluations of AI Deployment Protocols (Outstanding paper; remote)
    - Decomposed evaluations of geographic disparities in text-to-image models (Outstanding paper; remote)
    - WebCanvas: Benchmarking Web Agents in Online Environments (Dehan Kong)
    - Can Editing LLMs Inject Harm? (Shiyang Lai)
    - MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants (John Heibel)
    - Models That Prove Their Own Correctness (Orr Paradise)
    - Bias Begets Bias: the Impact of Biased Embeddings on Diffusion Models (Marvin Li & Jeffrey Wang)

Jul 27, 13:50 | Poster Session + Social | 60 mins

Jul 27, 14:50 | End of Program

Description/Call for Papers

Advanced Multi-modal Foundation Models (MFMs) and AI Agents, equipped with diverse modalities [1, 2, 3, 4, 15] and an increasing number of available affordances [5, 6] (e.g., tool use, code interpreters, API access), have the potential to accelerate and amplify their predecessors’ impact on society [7].
MFMs include multi-modal large language models (MLLMs) and multi-modal generative models (MMGMs). MLLMs are LLM-based models that can receive, reason over, and output information in multiple modalities, including but not limited to text, images, audio, and video; examples include LLaVA [1], Reka [8], Qwen-VL [9], and LAMM [36]. MMGMs are MFMs that can generate new content across multiple modalities, such as images from text descriptions or videos from audio and text inputs; examples include Stable Diffusion [2], Sora [10], and Latte [11]. AI agents, or systems with a higher degree of agenticness, are systems that can achieve complex goals in complex environments with limited direct supervision [12]. Understanding and preempting the vulnerabilities of these systems [13, 35] and the harms they induce [14] has become more crucial than ever.
Building trustworthy MFMs and AI Agents requires more than the adversarial robustness of such models: it also calls for proactive risk assessment, mitigation, safeguards, and comprehensive safety mechanisms throughout the lifecycle of the systems’ development and deployment [16, 17]. This demands a blend of technical and socio-technical strategies that incorporates AI governance and regulatory insights.
Topics include but are not limited to:
• Adversarial attack and defense, poisoning, hijacking, and security [18, 13, 19, 20, 21]
• Robustness to spurious correlations and uncertainty estimation
• Technical approaches to privacy, fairness, accountability, and regulation [12, 22, 28]
• Truthfulness, factuality, honesty, and sycophancy [23, 24]
• Transparency, interpretability, and monitoring [25, 26]
• Identifiers of AI-generated material, such as watermarking [27]
• Technical alignment/control, such as scalable oversight [29], representation control [26], and machine unlearning [30]
• Model auditing, red-teaming, and safety evaluation benchmarks [31, 32, 33, 16]
• Measures against malicious model fine-tuning [34]
• Novel safety challenges introduced by new modalities

Submission Guide

Submission Instructions

• Submission site: Submissions should be made on OpenReview.
• Submissions are non-archival: we welcome submissions that are also undergoing peer review elsewhere at the time of submission, but we will not accept submissions that have already been published or accepted for publication at peer-reviewed conferences or journals. Papers presented or to be presented at other non-archival venues (e.g., other workshops) may be submitted. No formal workshop proceedings will be published.
• Social Impact Statement: authors are required to include a "Social Impact Statement" that highlights the "potential broader impact of their work, including its ethical aspects and future societal consequences".
• Submission Length and Format: Submissions should be anonymised papers of up to 5 pages, excluding references and the Social Impact Statement (appendices can be added to the main PDF). You must format your submission using the ICML_2024_LaTeX_style_file; a minimal skeleton is sketched after this list.
• Paper Review: Review is double-blind, with at least two reviewers assigned to each paper.
• Camera-Ready Instructions: The camera-ready version consists of a main body of up to 6 pages, followed by unlimited pages for the Social Impact Statement, references, and an appendix, all in a single file. Authors should upload the camera-ready versions of all accepted submissions to the OpenReview page of the corresponding submission. Camera-ready versions will be publicly visible to everyone as of the Camera-Ready Deadline listed below.
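
For convenience, the sketch below shows a minimal submission skeleton built on the ICML 2024 style file. It is illustrative only: the title, author, affiliation, email, and bibliography file names are placeholders, and the example paper shipped with the official style bundle remains the authoritative reference for the exact macros and options.

    \documentclass{article}
    \usepackage{graphicx}
    \usepackage{hyperref}

    % Anonymised version for review; switch to \usepackage[accepted]{icml2024}
    % for the de-anonymised camera-ready version.
    \usepackage{icml2024}

    \icmltitlerunning{Your Paper Title (placeholder)}

    \begin{document}

    \twocolumn[
    \icmltitle{Your Paper Title (placeholder)}

    % Author information may be included; the style file hides it in the
    % anonymised (non-accepted) mode.
    \begin{icmlauthorlist}
    \icmlauthor{Firstname Lastname}{inst}
    \end{icmlauthorlist}
    \icmlaffiliation{inst}{Placeholder Institution, City, Country}
    \icmlcorrespondingauthor{Firstname Lastname}{first.last@example.com}
    \icmlkeywords{Trustworthy AI, Multi-modal Foundation Models, AI Agents}

    \vskip 0.3in
    ]
    \printAffiliationsAndNotice{}

    \begin{abstract}
    Abstract goes here.
    \end{abstract}

    \section{Introduction}
    Main body: up to 5 pages at submission time, up to 6 pages for the camera-ready.

    \section*{Social Impact Statement}
    Required statement on broader impact, ethical aspects, and societal
    consequences; it does not count toward the page limit.

    \bibliography{references}  % placeholder name for your .bib file
    \bibliographystyle{icml2024}

    \end{document}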

Key Dates

Submissions Open:        May 11, 2024
Submission Deadline:     May 30, 2024
Acceptance Notification: June 19, 2024 (updated from June 17, 2024)
Camera-Ready Deadline:   July 7, 2024
Workshop Date:           July 27, 2024
All deadlines are 23:59 AoE (Anywhere on Earth).

Program Committee

Tianhao Shen
Nora Belrose
Chunhui Zhang
Rudolf Laine
Apoorva Nitsure
Ollie Liu
Herbie Bradley
Kuofeng Gao
Nevan Wichers
Fengqing Jiang
Serah Sessi Akojenu
Sandy Tanwisuth
Canyu Chen
Peiyan Zhang
Zijing Shi
Guozheng Ma
Zhongyu Ouyang
Xi Li
Jane Pan
Alexis Roger
Anisa Halimi
Qi She
Cindy Wu
Toyib Ogunremi
Max Kaufmann
Brooklyn Sheppard
Yuchen Zhang
Boyuan Chen
Xinyu Yang
Benjamin Bucknall
Oliver Jaffe
Samuel E Kwok
Yawen Zhang
Javier Rando
Kevin Wei
Shiqi Chen
Aengus Lynch
Edem Wornyo
Muhammad Ali
Zonghao Ying
Xiao Li
Tom Sühr
Messi H.J. Lee
Dongping Chen
Chulin Xie
Robert Wu
Alex Goldie
Jerry Huang
Dongyoung Go
Xiaoyuan Guo
Zekun Wu
Jiayi Kelsey Wang
Zeming Wei
Kunlin Cai
Shaina Raza
Jiaying Lu
Melissa Hall
Zhiqiu Jiang
Abigail Goldsteen
Pengwei Li
Tanya Akumu
Diji Yang
Roy Siegelmann
Xianjun Yang
Ole Kristian Jorgensen
Adit Magotra
Khaoula Chehbouni
Aidan O'Gara
Hao Yu
Yibo Wang
Jiaxin Zhang
Jinmeng Rao
Jianfeng Chi
Kyle A. Kilian
Haoyu Wang
Jinghuai Zhang
Aileen Nielsen
David Reber
Wendi Cui
Euan Ong
Lennart Heim
Carson Ezell
Hugo Laurence Fry
Peng Cui

Frequently Asked Questions

Can we submit a paper that will also be submitted to NeurIPS 2024?

Yes.

Can we submit a paper that was accepted at ICLR 2024?

No. ICML prohibits main conference publications from appearing concurrently at its workshops.

Will the reviews be made available to authors?

Yes.

I have a question not addressed here; whom should I contact?

Email the organizers at icmltifaworkshop@gmail.com.

References

[1] Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2024). Visual instruction tuning. Advances in Neural Information Processing Systems, 36.

[2] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] OpenAI. (2023). GPT-4 with vision (GPT-4V) system card.

[4] Z. Yang, L. Li, K. Lin, J. Wang, C.-C. Lin, Z. Liu, and L. Wang. The dawn of LMMs: Preliminary explorations with GPT-4V(ision), 2023.

[5] Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., ... & Sun, M. (2023). ToolLLM: Facilitating large language models to master 16000+ real-world APIs.

[6] C. Zhang, Z. Yang, J. Liu, Y. Han, X. Chen, Z. Huang, B. Fu, and G. Yu. AppAgent: Multimodal agents as smartphone users, 2023.

[7] T. Eloundou, S. Manning, P. Mishkin, and D. Rock. GPTs are GPTs: An early look at the labor market impact potential of large language models, 2023.

[8] Ormazabal, Aitor, et al. "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models." arXiv preprint arXiv:2404.12387 (2024).

[9] Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., ... & Zhou, J. (2023). Qwen-VL: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966.

[10] Sora: Creating video from text. (n.d.). https://openai.com/sora

[11] Ma, X., Wang, Y., Jia, G., Chen, X., Liu, Z., Li, Y. F., ... & Qiao, Y. (2024). Latte: Latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048.

[12] Shavit, Y., Agarwal, S., Brundage, M., Adler, S., O’Keefe, C., Campbell, R., ... & Robinson, D. G. (2023). Practices for Governing Agentic AI Systems. Research Paper, OpenAI, December.

[13] N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramer, and L. Schmidt. Are aligned neural networks adversarially aligned?, 2023.

[14] A. Chan, R. Salganik, A. Markelius, C. Pang, N. Rajkumar, D. Krasheninnikov, L. Langosco, Z. He, Y. Duan, M. Carroll, M. Lin, A. Mayhew, K. Collins, M. Molamohammadi, J. Burden, W. Zhao, S. Rismani, K. Voudouris, U. Bhatt, A. Weller, D. Krueger, and T. Maharaj. Harms from increasingly agentic algorithmic systems. In 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23. ACM, June 2023. doi: 10.1145/3593013.3594033.

[15] Gemini Team. Gemini: A family of highly capable multimodal models, 2023.

[16] T. Shevlane, S. Farquhar, B. Garfinkel, M. Phuong, J. Whittlestone, J. Leung, D. Kokotajlo, N. Marchal, M. Anderljung, N. Kolt, L. Ho, D. Siddarth, S. Avin, W. Hawkins, B. Kim, I. Gabriel, V. Bolina, J. Clark, Y. Bengio, P. Christiano, and A. Dafoe. Model evaluation for extreme risks, 2023.

[17] L. Weidinger, M. Rauh, N. Marchal, A. Manzini, L. A. Hendricks, J. Mateos-Garcia, S. Bergman, J. Kay, C. Griffin, B. Bariach, I. Gabriel, V. Rieser, and W. Isaac. Sociotechnical safety evaluation of generative AI systems, 2023.

[18] L. Bailey, E. Ong, S. Russell, and S. Emmons. Image hijacks: Adversarial images can control generative models at runtime, 2023.

[19] Jain, N., Schwarzschild, A., Wen, Y., Somepalli, G., Kirchenbauer, J., Chiang, P.-y., ... & Goldstein, T. (2023). Baseline defenses for adversarial attacks against aligned language models.

[20] Robey, A., Wong, E., Hassani, H., & Pappas, G. J. (2023). SmoothLLM: Defending large language models against jailbreaking attacks.

[21] B. Wang, W. Chen, H. Pei, C. Xie, M. Kang, C. Zhang, C. Xu, Z. Xiong, R. Dutta, R. Schaeffer, et al. DecodingTrust: A comprehensive assessment of trustworthiness in GPT models, 2023.

[22] Chan, A., Ezell, C., Kaufmann, M., Wei, K., Hammond, L., Bradley, H., ... & Anderljung, M. (2024). Visibility into AI Agents. arXiv preprint arXiv:2401.13138.

[23] Huang, Q., Dong, X., Zhang, P., Wang, B., He, C., Wang, J., Lin, D., Zhang, W., & Yu, N. (2023). OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation.

[24] Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., ... & Perez, E. (2023). Towards understanding sycophancy in language models.

[25] Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in GPT.

[26] A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowski, S. Goel, N. Li, M. J. Byun, Z. Wang, A. Mallen, S. Basart, S. Koyejo, D. Song, M. Fredrikson, J. Z. Kolter, and D. Hendrycks. Representation engineering: A top-down approach to AI transparency, 2023.

[27] Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning.

[28] Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., ... & Lee, K. (2023). Scalable extraction of training data from (production) language models.

[29] S. R. Bowman, J. Hyun, E. Perez, E. Chen, C. Pettit, S. Heiner, K. Lukošiūtė, A. Askell, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, C. Olah, D. Amodei, D. Amodei, D. Drain, D. Li, E. Tran-Johnson, J. Kernion, J. Kerr, J. Mueller, J. Ladish, J. Landau, K. Ndousse, L. Lovitt, N. Elhage, N. Schiefer, N. Joseph, N. Mercado, N. DasSarma, R. Larson, S. McCandlish, S. Kundu, S. Johnston, S. Kravec, S. E. Showk, S. Fort, T. Telleen-Lawton, T. Brown, T. Henighan, T. Hume, Y. Bai, Z. Hatfield-Dodds, B. Mann, and J. Kaplan. Measuring progress on scalable oversight for large language models, 2022.

[30] Yao, Y., Xu, X., & Liu, Y. (2023). Large language model unlearning. arXiv preprint arXiv:2310.10683.

[31] S. Casper, C. Ezell, C. Siegmann, N. Kolt, T. L. Curtis, B. Bucknall, A. Haupt, K. Wei, J. Scheurer, M. Hobbhahn, L. Sharkey, S. Krishna, M. V. Hagen, S. Alberti, A. Chan, Q. Sun, M. Gerovitch, D. Bau, M. Tegmark, D. Krueger, and D. Hadfield-Menell. Black-box access is insufficient for rigorous AI audits, 2024.

[32] M. Bhatt, S. Chennabasappa, C. Nikolaidis, S. Wan, I. Evtimov, D. Gabi, D. Song, F. Ahmad, C. Aschermann, L. Fontana, S. Frolov, R. P. Giri, D. Kapil, Y. Kozyrakis, D. LeBlanc, J. Milazzo, A. Straumann, G. Synnaeve, V. Vontimitta, S. Whitman, and J. Saxe. Purple Llama CyberSecEval: A secure coding benchmark for language models, 2023.

[33] D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, A. Jones, S. Bowman, A. Chen, T. Conerly, N. DasSarma, D. Drain, N. Elhage, S. El-Showk, S. Fort, Z. Hatfield-Dodds, T. Henighan, D. Hernandez, T. Hume, J. Jacobson, S. Johnston, S. Kravec, C. Olsson, S. Ringer, E. Tran-Johnson, D. Amodei, T. Brown, N. Joseph, S. McCandlish, C. Olah, J. Kaplan, and J. Clark. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned, 2022.

[34] Henderson, P., Mitchell, E., Manning, C., Jurafsky, D., & Finn, C. (2023). Self-destructing models: Increasing the costs of harmful dual uses of foundation models. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’23.

[35] Y. Dong, H. Chen, J. Chen, Z. Fang, X. Yang, Y. Zhang, Y. Tian, H. Su, and J. Zhu. How robust is Google’s Bard to adversarial image attacks?, 2023.

[36] Yin, Z., Wang, J., Cao, J., Shi, Z., Liu, D., Li, M., Sheng, L., Bai, L., Huang, X., Wang, Z., et al. (2023). LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark. arXiv preprint arXiv:2306.06687.