TY - JOUR
AU - Qiu, Xipeng
AB - Abstract: Most downstream adaptation methods tune all or part of the parameters of pre-trained models (PTMs) through gradient descent, where the tuning cost increases linearly with the growth of the model size. By contrast, gradient-free methods only require the forward computation of the PTM to tune the prompt, retaining the benefits of efficient tuning and deployment. However, past work on gradient-free tuning often introduces gradient descent to seek a good initialization of the prompt and lacks versatility across tasks and PTMs. In this paper, we present BBTv2, an improved version of Black-Box Tuning, to drive PTMs for few-shot learning. We prepend continuous prompts to every layer of the PTM and propose a divide-and-conquer gradient-free algorithm to optimize the prompts at different layers alternately. Extensive experiments across various tasks and PTMs show that BBTv2 can achieve performance comparable to full model tuning and state-of-the-art parameter-efficient methods (e.g., Adapter, LoRA, BitFit, etc.) under few-shot settings while maintaining far fewer tunable parameters.
TI - BBTv2: Towards a Gradient-Free Future with Large Language Models
JF - Computing Research Repository
DO - 10.48550/arxiv.2205.11200
DA - 2022-05-23
UR - https://www.deepdyve.com/lp/arxiv-cornell-university/bbtv2-towards-a-gradient-free-future-with-large-language-models-aSVGmKlICW
VL - 2023
IS - 2205
DP - DeepDyve
ER -