Publication
Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model
Duy Minh Ho Nguyen; An Thai Le; Trung Nguyen; Nghiem Tuong Diep; Tai Nguyen; Duy Duong-Tran; Jan Peters; Li Shen; Mathias Niepert; Daniel Sonntag
In: Vu Nguyen; Hsuan-Tien Lin (Eds.). Asian Conference on Machine Learning, 5-8 December 2024, Hanoi, Vietnam. Asian Conference on Machine Learning (ACML), Pages 687-702, Proceedings of Machine Learning Research, Vol. 260, PMLR, 2024.
Abstract
Prompt learning methods are gaining increasing attention due to their ability to adapt large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs and often struggle with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we leverage Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, since mass preservation no longer restricts the transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.
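
To make the UOT-based alignment concrete, below is a minimal sketch of entropy-regularized unbalanced optimal transport between prompt embeddings and visual tokens, using the standard Sinkhorn-style iterations with KL-relaxed marginals (Chizat et al.). It is an illustrative assumption of how such an alignment can be computed, not the paper's released implementation; the names `prompt_emb`, `visual_tokens`, and the hyperparameter values are hypothetical.

```python
# Minimal sketch: entropic Unbalanced Optimal Transport (UOT) between
# prompt embeddings and visual tokens. Illustrative only; not the
# authors' code. Hyperparameters (eps, tau) are assumed values.
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.05, tau=0.5, n_iters=200):
    """Entropic UOT via Sinkhorn-like iterations with KL-relaxed marginals.

    C   : (n, m) cost matrix, e.g. cosine distances between embeddings
    a,b : source/target mass distributions (softly enforced marginals)
    eps : entropic regularization strength
    tau : marginal relaxation; smaller tau lets more mass be discarded
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    fi = tau / (tau + eps)                # exponent from the KL relaxation
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]    # transport plan T

# Toy usage: align 4 prompt embeddings with 16 visual tokens.
rng = np.random.default_rng(0)
prompt_emb = rng.normal(size=(4, 64))     # hypothetical prompt features
visual_tokens = rng.normal(size=(16, 64)) # hypothetical image patch tokens

# Cosine-distance cost between L2-normalized embeddings.
P = prompt_emb / np.linalg.norm(prompt_emb, axis=1, keepdims=True)
V = visual_tokens / np.linalg.norm(visual_tokens, axis=1, keepdims=True)
C = 1.0 - P @ V.T

a = np.full(4, 1 / 4)                     # uniform mass on prompts
b = np.full(16, 1 / 16)                   # uniform mass on visual tokens
T = unbalanced_sinkhorn(C, a, b)
# Since marginals are only softly enforced, noisy or irrelevant tokens
# can receive little mass instead of forcing a full matching.
print(T.shape, T.sum())
```

Because the KL relaxation replaces the hard marginal constraints of balanced OT, the plan `T` need not transport all mass, which is what allows the partial matching described in the abstract: irrelevant visual tokens simply end up with near-zero transported mass.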
