GreenFOMO@ECCV2024

Introduction

Foundation models generalize well to various downstream tasks thanks to their web-scale pre-training, and have become a de facto tool for pushing the frontiers of computer vision research. Despite this exciting progress, developing foundation models requires large compute resources and incurs heavy environmental costs. For instance, CLIP reports the use of hundreds of GPUs, and the LLaMA 2 family led to emissions of 539 tCO2eq, which would take about 27,000 trees one year to capture.

To reduce computation and improve data efficiency, the computer vision community has explored efficient ways of adapting large foundation models to downstream tasks. For example, recent methods tune only a small set of learnable tokens (or prompts) while keeping the weights of the pre-trained model frozen. Compression methods, such as pruning and distillation, also play a crucial role in making foundation models more efficient. Very recent methods have pioneered a new training-free paradigm that leverages existing foundation models, or an assembly of them, to address computer vision tasks such as image classification, 3D scene understanding, and visual reasoning without fine-tuning. These advances are exciting and encouraging, as they enable faster adaptation to downstream tasks without excessive computation while mitigating the carbon footprint associated with resource-intensive processes.

The Green FOundation MOdels (GreenFOMO) workshop aims to accelerate momentum around these emerging research topics, foster an inclusive research and innovation ecosystem involving small- and medium-sized practitioners in both academia and industry, and collectively make a green impact on society. GreenFOMO promotes novel methodologies for the efficient exploitation of foundation models and encourages applications of FOMOs in domains with green impact, such as biodiversity, agriculture, and food security, among others.

Potential topics include, but are not limited to:

Training-free methods

Parameter efficient fine-tuning

Model compression and pruning

Token reduction

Light-weight models

Personalization of vision & language models

Uni-(multi-)modal reasoning

Continual/transfer learning

Efficient generative models

Novel applications for sustainability and biodiversity

Invited Speakers

Dr. Ranjay Krishna is an Assistant Professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research lies at the intersection of computer vision and human-computer interaction. His work has received best paper awards, outstanding paper awards, and oral presentations at CVPR, ACL, CSCW, NeurIPS, UIST, and ECCV, and has been covered by Science, Forbes, the Wall Street Journal, and PBS NOVA. His research has been supported by Google, Amazon, Cisco, Toyota Research Institute, NSF, ONR, and Yahoo. He holds bachelor's degrees in Electrical & Computer Engineering and in Computer Science from Cornell University, and a master's degree and a Ph.D. in Computer Science from Stanford University. His recent work covers instruction tuning for addressing complex visual tasks with a low computational budget.

Dr. Elisabetta Farella is a researcher and head of the Energy Efficient Embedded Digital Architecture (E3DA) Research Unit at the Fondazione Bruno Kessler (FBK) in Trento, Italy. Her research focuses on smart sensing, embedded systems, and TinyML, emphasising energy-efficient and scalable machine-learning solutions for resource-constrained devices. The resource-aware, scalable AI solutions developed in her unit contribute to innovation across various national and international projects, engaging both industrial and academic sectors in advancing smart and energy-efficient technologies. Her publication record also reflects her contributions to the IoT, wearable computing, and HCI domains, driving sustainable and innovative technological applications. Elisabetta's recent work on energy-efficient embedded systems and TinyML is of great interest to the community.

Dr. Jose M. Alvarez is a research director at NVIDIA, leading the Autonomous Vehicle Applied Research team. His team maximizes the impact of the latest research advances on the AV product. Jose's research interests include model-centric and data-centric deep learning toward more efficient and scalable systems. Jose completed his Ph.D. in computer science in Barcelona, specializing in road-scene understanding for autonomous driving at a time when datasets were very limited. He also worked as a postdoctoral researcher at NYU under Yann LeCun. Jose's recent research on pruning and efficient deep learning will contribute to the workshop discussion.

Dr. Eric Schulz is an incoming professor at LMU and the director of the Institute of Human Centered AI at Helmholtz Munich. He finished his PhD at UCL in 2017, working on generalization and exploration in reinforcement learning. From 2017 to 2019, he was a Data Science Postdoctoral Fellow at Harvard University, where he worked on computational models of learning and decision making. From 2020 to 2023, he was a Max Planck Independent Group Leader at the MPI for Biological Cybernetics. His recent studies on LLMs will bring valuable insights to our workshop from a cognitive perspective.

Dr. Judy Hoffman is an Assistant Professor in the School of Interactive Computing at Georgia Tech and a member of the Machine Learning Center. Her research interests include computer vision, machine learning, domain adaptation, robustness, and fairness. Prior to joining Georgia Tech, Dr. Hoffman was a Visiting Research Scientist at Facebook AI Research and a postdoctoral scholar at Stanford University and UC Berkeley. She received her PhD in EECS from UC Berkeley in 2016, where she was a member of BAIR and BDD. Her recent work focuses on FOMO efficiency, e.g., Token Merging (ToMe) and Binary Vision Transformers (BiViT), to reduce the computational burden.