The escalating costs associated with LLM APIs necessitate efficient strategies to manage and optimize their usage without compromising performance. The following strategies, derived from the "FrugalGPT" paper, can serve as an excellent guide to saving on LLM API usage costs.
Strategy 1: Prompt Adaptation
The first strategy, Prompt Adaptation, involves reducing the prompt size to decrease costs. It encompasses two main tactics:
Prompt Selection: Choose only the most essential examples within a prompt to keep it small and cheap while maintaining task efficacy.

Query Concatenation: Bundle multiple queries into a single prompt, eliminating repetitive prompt processing and thereby saving on overall cost.
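Here is a minimal sketch of both tactics. The example pool, the prompt layout, and the word-overlap similarity are illustrative assumptions, not details from the paper; a real system would more likely rank examples with embeddings.

```python
def similarity(a: str, b: str) -> float:
    """Crude lexical overlap between two strings (Jaccard on words)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select_examples(query: str, example_pool: list[str], k: int = 2) -> list[str]:
    """Prompt selection: keep only the k examples most relevant to the query."""
    return sorted(example_pool, key=lambda ex: similarity(ex, query), reverse=True)[:k]

def concatenated_prompt(examples: list[str], queries: list[str]) -> str:
    """Query concatenation: pay for the shared examples once, then append
    several queries to be answered in a single API call."""
    header = "\n".join(examples)
    numbered = "\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(queries))
    return f"{header}\n\nAnswer each question:\n{numbered}"
```

Both helpers shrink the number of prompt tokens billed per query: the first by dropping low-relevance examples, the second by amortizing the shared examples across several questions.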
Strategy 2: LLM Approximation
When LLM API usage proves too costly, LLM Approximation strategies such as completion caches and model fine-tuning are employed.
Completion Cache: Stores LLM API responses so that similar future queries can be answered from the cache instead of incurring additional API calls.

Model Fine-Tuning: A more affordable model is fine-tuned with responses from a costlier LLM API. This not only cuts costs but can also improve response times due to the use of shorter prompts.
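A rough sketch of the completion cache follows. `call_llm_api` is a hypothetical stand-in for any paid completion endpoint, and the whitespace/case normalization is an assumed matching policy; production caches often match on semantic similarity rather than exact text.

```python
import hashlib

class CompletionCache:
    """Stores LLM responses keyed by a normalized prompt, so a repeated
    query is answered from memory instead of a paid API call."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalizing whitespace and case lets trivially different phrasings
        # of the same prompt hit the same cache entry (an assumed policy).
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def complete(self, prompt: str, call_llm_api) -> str:
        key = self._key(prompt)
        if key not in self._store:            # cache miss: one paid call
            self._store[key] = call_llm_api(prompt)
        return self._store[key]               # cache hit: no API cost
```

Fine-tuning applies the same idea at training time: collect the expensive model's responses, then use them as supervision for a cheaper model that no longer needs lengthy in-context examples.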
Strategy 3: LLM Cascade
LLM Cascade utilizes a sequence of LLM APIs to process queries efficiently, stopping when a satisfactory response is obtained.
Generation Scoring Function: Determines the reliability of a response, informing the decision on whether to continue querying the next LLM API.

LLM Router: Selects the order of LLM APIs, optimizing the quality of responses within a specified budget.

Compositions of Strategies

Employing a combination of the aforementioned strategies can lead to greater efficiency. For instance, combining prompt selection with LLM cascade can further reduce costs while ensuring satisfactory task performance.
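To make the cascade concrete, here is a minimal sketch. The model names, `call_model`, the threshold, and the length-based scorer are all illustrative assumptions; in FrugalGPT the scoring function is a learned model and the router chooses the ordering under a budget, whereas this sketch hard-codes both.

```python
# Cheapest model first; names and per-call behavior are invented here.
CASCADE = ["small-model", "mid-model", "large-model"]

def score(query: str, answer: str) -> float:
    """Generation scoring function: reliability of an answer in [0, 1].
    FrugalGPT trains this as a model; this placeholder merely penalizes
    very short answers to keep the sketch self-contained."""
    return min(len(answer.split()) / 20.0, 1.0)

def cascade_answer(query: str, call_model, threshold: float = 0.8) -> str:
    """Try APIs in order, stopping once an answer scores above threshold."""
    answer = ""
    for model in CASCADE:
        answer = call_model(model, query)
        if score(query, answer) >= threshold:  # reliable enough: stop paying
            return answer
    return answer  # fell through: return the strongest model's answer
```

Composed with the earlier tactics, `cascade_answer` could be fed a prompt already trimmed by `select_examples`, pairing prompt selection with the cascade as described above.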
For a better understanding of these strategies, the full paper is available here: