
Practical Prompt Engineering, by Cameron R. Wolfe

Author: Cameron R. Wolfe

Wolfe, Cameron R. “Practical Prompt Engineering.” Deep (Learning) Focus, 1 May 2023, cameronrwolfe.substack.com/p/practical-prompt-engineering-part.


Tips and tricks for successful prompting with LLMs...


Due to their text-to-text format, large language models (LLMs) are capable of solving a wide variety of tasks with a single model. Such a capability was originally demonstrated via zero and few-shot learning with models like GPT-2 and GPT-3 [5, 6]. When fine-tuned to align with human preferences and instructions, however, LLMs become even more compelling, enabling popular generative applications such as coding assistants, information-seeking dialogue agents, and chat-based search experiences.


Due to the applications that they make possible, LLMs have seen a quick rise to fame both in research communities and popular culture. During this rise, we have also witnessed the development of a new, complementary field: prompt engineering. At a high level, LLMs operate by i) taking text (i.e., a prompt) as input and ii) producing textual output from which we can extract something useful (e.g., a classification, summarization, translation, etc.). The flexibility of this approach is beneficial. At the same time, however, we must determine how to properly construct our input prompt such that the LLM has the best chance of generating the desired output.


Prompt engineering is an empirical science that studies how different prompting strategies can be used to optimize LLM performance. Although a variety of approaches exist, we will spend this overview building an understanding of the general mechanics of prompting, as well as a few fundamental (but incredibly effective!) prompting techniques like zero/few-shot learning and instruction prompting. Along the way, we will learn practical tricks and takeaways that can immediately be adopted to become a more effective prompt engineer and LLM practitioner.


understanding LLMs. Due to its focus upon prompting, this overview will not explain the history or mechanics of language models. To gain a better general understanding of language models (which is an important prerequisite for deeply understanding prompting), check out the variety of overviews I have written on the topic, listed below (in order of importance):

  • Language Modeling Basics (GPT and GPT-2) [link]
  • The Importance of Scale for Language Models (GPT-3) [link]
  • Modern [link] and Specialized [link] LLMs
  • PaLM, T5 (Part One and Two), LLaMA (Part One and Two)

Prompting at a Glance


Language models can solve a variety of tasks using their generic, text-to-text format (from [1])

Given the current hype around LLMs, we might ask ourselves: what are the fundamental strengths of LLMs that make them so powerful? Although there’s not a single answer to this question (e.g., model scale, massive pre-training data, human feedback, etc.), one major strength of LLMs is their generic, text-to-text format. These models are experts at next-token prediction, and so many different tasks can be solved by properly tuning and leveraging this skill!


To solve a task, all we need to do is i) provide textual input to the model that contains relevant information and ii) extract output from text returned by the model. Such a unified approach can be used for translation, summarization, question answering, classification, and more. However, the story is not (quite) that simple. Namely, the wording and structure of the prompt (i.e., the inputted text) provided to the LLM can significantly impact the model’s accuracy. In other words, prompt engineering is a huge deal.
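
To make this concrete, here is a minimal sketch of that text-in/text-out loop. It assumes the OpenAI Python client (as it existed around the time of writing) and an API key configured in the environment; the model name and exact client interface are assumptions about your setup, not requirements of the approach.

```python
# A minimal sketch of the text-in/text-out loop described above, assuming the
# OpenAI Python client (circa 2023) and an API key in the environment. The model
# name and the exact client interface are assumptions about your setup.
import openai

def run_llm(prompt: str) -> str:
    # i) provide textual input (the prompt) to the model
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # deterministic output is easier to parse
    )
    # ii) extract the useful output from the text returned by the model
    return response["choices"][0]["message"]["content"]

# The same function covers many tasks just by changing the prompt.
print(run_llm("Translate to French: I love machine learning."))
print(run_llm("Classify the sentiment (positive/negative): The food was terrible."))
```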


What is prompt engineering?


“Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use LMs for a wide variety of applications and research topics.” – from [2]


Given that properly crafting the contents of our prompt is important to achieving useful results with an LLM, prompt engineering has gained a lot of interest in recent months. However, it’s an empirical science—discovering the best-possible prompts is typically heuristic-based and requires experimentation. We can discover better prompts by tracking and versioning our prompts over time and testing different ideas to see what works.


Prompting an LLM with instructions

components of a prompt. There are a variety of options for how a prompt can be created. However, most prompts are composed of the same few (optional) components, listed below; a sketch of a prompt that combines them follows the list:

  • Input Data: this is the actual data that the LLM is expected to process (e.g., the sentence being translated or classified, the document being summarized, etc.).
  • Exemplars: one of the best ways to demonstrate the correct behavior to an LLM is to provide a few concrete examples of input-output pairs inside of the prompt.
  • Instruction: instead of showing concrete exemplars of correct behavior in the prompt, we could just textually describe what to do via an instruction; see above.
  • Indicators: providing input to an LLM in a fixed and predictable structure is helpful, so we might separate different parts of our prompt by using indicators; see below.
  • Context: Beyond the components described above, we may want to provide extra “context” or information to the LLM in some way.
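
To make these components concrete, here is a minimal sketch of a prompt that combines context, an instruction, exemplars, indicators, and input data. The task and the exemplars are hypothetical choices for illustration, not a prescribed format.

```python
# A single prompt that combines the components above: context, an instruction,
# a few exemplars, indicators that separate the pieces, and the input data.
# The task and exemplars here are hypothetical, not a prescribed format.
context = "You are helping a customer support team triage incoming messages."
instruction = "Classify the sentiment of the message as Positive or Negative."
exemplars = [
    ("The checkout process was quick and painless.", "Positive"),
    ("My order arrived two weeks late and damaged.", "Negative"),
]
input_data = "The support agent resolved my issue in minutes."

prompt = context + "\n\n" + instruction + "\n\n"
for message, label in exemplars:
    prompt += f"Message: {message}\nSentiment: {label}\n\n"  # "Message:" / "Sentiment:" act as indicators
prompt += f"Message: {input_data}\nSentiment:"               # the model completes the final label

print(prompt)
```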


Indicators can be used to structure prompts in a variety of ways

general tips. The details of prompt engineering differ a lot depending on the model being used and what task we are trying to solve. However, there are a few generally-accepted principles for prompt engineering that are helpful to keep in mind [1, 3]; a short before-and-after example follows the list below.

  • Start simple: start with a simple prompt, then slowly modify the prompt while tracking empirical results.
  • Be direct: if we want the LLM to match a specific style or format, we should state this clearly and directly. Stating exactly what you want gets the message across.
  • Specificity: ambiguity is the enemy of every prompt engineer. We should make the prompt detailed and specific without going overboard and providing an input that is too long (i.e., there are limitations to how long the prompt can be!).
  • Exemplars are powerful: if describing what we want is difficult, it might be useful to provide concrete examples of correct output or behavior for several different inputs.
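
As a small illustration of these principles, below is a hypothetical first-draft prompt next to a more direct and specific revision; in practice, each revision like this should be validated against tracked results.

```python
# Hypothetical first draft vs. a more direct, specific revision of the same prompt.
# In practice, each change like this should be validated against tracked results.
simple_prompt = "Summarize this article."

specific_prompt = (
    "Summarize the article below in exactly three bullet points.\n"
    "Each bullet should be at most 20 words and focus on the main findings.\n"
    "Write the summary in plain, non-technical language.\n\n"
    "Article: {article_text}"  # placeholder to be filled in (e.g., via .format)
)
```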


Visualizing the context window for a language model

the context window. As we consider different prompting tips and approaches, we need to remember that we can only include a limited amount of information in our prompt. All LLMs have a pre-defined context window that sets a limit on the total number of tokens (i.e., words or sub-words in a textual sequence) that can be processed at a time. Context window size differs between models, but there is currently a strong push towards increasing context window sizes. For example, GPT-4 has a context window of 32K tokens, which is 4X bigger than any prior model from OpenAI.
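
Because the context window is measured in tokens rather than characters or words, it helps to count tokens before sending a prompt. Below is a minimal sketch using the tiktoken library; the model name and token limit are illustrative assumptions.

```python
# Count tokens in a prompt before sending it, so we know whether it fits in the
# model's context window. The model name and token limit are illustrative.
import tiktoken

def fits_in_context(prompt: str, model: str = "gpt-4", limit: int = 8192) -> bool:
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = len(encoding.encode(prompt))
    print(f"Prompt uses {num_tokens} tokens (limit {limit}).")
    # Leave room for the model's generated output as well, not just the prompt.
    return num_tokens < limit

fits_in_context("Translate to German: The weather is nice today.")
```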



Common Prompting Techniques


The emergence of zero and few-shot learning (from [4, 5, 6])

Although LLMs have seen a recent explosion due to popular models like ChatGPT, prompting has been around for a while. Originally, models like GPT [4] were fine-tuned to solve downstream tasks. With the proposal of GPT-2 [5], we saw researchers start to use zero-shot learning to solve multiple downstream tasks with a single foundation model. Finally, GPT-3 showed us that language models become really good at few-shot learning as they grow in size. In this section, we will walk through these ideas to gain a better idea of how zero and few-shot learning work, as well as provide details on a few more complex prompting techniques.


Zero-Shot Learning


(from [6])

The idea behind zero-shot learning is quite simple. We just feed a description of the task being solved and the relevant input data to an LLM and let it generate a result; see above. Due to the massive amount of pre-training data they observe, LLMs are often pretty capable of solving tasks in this way. Namely, they can leverage their knowledge base to solve a (relatively) large number of tasks; see the examples below (produced with GPT-3.5).
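
In code, a zero-shot prompt is nothing more than a task description followed by the input data. The sketch below is hypothetical; no exemplars are included.

```python
# Zero-shot prompting: a task description plus the input data, with no exemplars.
# The task and wording here are hypothetical.
task_description = "Identify the language that the following text is written in."
input_data = "¿Dónde está la biblioteca?"

zero_shot_prompt = f"{task_description}\n\nText: {input_data}\nLanguage:"
print(zero_shot_prompt)
# The prompt would then be passed to the LLM (e.g., via the run_llm sketch above).
```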


Zero-shot learning with GPT-3.5

Zero-shot learning was explored extensively by models like GPT-2 and performs well in some cases. However, what should we do if zero-shot learning does not solve our task? In many cases, we can drastically improve the performance of an LLM by providing more specific and concrete information. In particular, we can start adding examples of desired output to the prompt, allowing the model to replicate patterns from data seen in the prompt.


Few-Shot Learning


Beyond just a task description, we can augment our prompt with high-quality input-output examples. This technique forms the basis of few-shot learning, which attempts to improve LLM performance by providing explicit examples of correct behavior. If used properly and applied to the correct model, few-shot learning is incredibly effective, as demonstrated by the breakthrough capabilities of LLMs like GPT-3 [6]; see below.
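
Building on the zero-shot sketch above, a few-shot prompt simply places a handful of input-output exemplars before the test input. The exemplars below are hypothetical.

```python
# Few-shot prompting: the same task as before, but with a few input-output
# exemplars placed before the test input. The exemplars are hypothetical.
task_description = "Identify the language that the following text is written in."
exemplars = [
    ("Je voudrais un café, s'il vous plaît.", "French"),
    ("Ich habe keine Ahnung.", "German"),
    ("Grazie mille per il tuo aiuto.", "Italian"),
]
input_data = "¿Dónde está la biblioteca?"

few_shot_prompt = task_description + "\n\n"
for text, language in exemplars:
    few_shot_prompt += f"Text: {text}\nLanguage: {language}\n\n"
few_shot_prompt += f"Text: {input_data}\nLanguage:"
print(few_shot_prompt)
```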


(from [3])

However, learning how to properly leverage the few-shot learning capabilities of LLMs can be complicated. What examples should we include in the prompt? Is there a correct way to structure the prompt? Do changes to the prompt significantly affect the LLM?


Most LLMs are sensitive to the manner in which the prompt is constructed, making prompt engineering both difficult and important. Although recent models like GPT-4 seem to be less sensitive to small perturbations in the prompt [2], the research community [7] has provided us with some tips for properly using few-shot learning that are still helpful to understand:

  • Exemplar ordering is important, and permuting few-shot examples can drastically change LLM performance. Including more few-shot examples does not solve this problem.
  • The distribution of labels in the few-shot examples matters and should match the actual distribution of data in the wild. Surprisingly, the correctness of labels is not as important.
  • LLMs tend to be biased towards repeating the last of the few-shot examples (i.e., recency bias).
  • Exemplars that are included in the prompt should be diverse and randomly ordered.

optimal data sampling. Selecting examples that are diverse, randomly-ordered, and related to the test example is best. Beyond these basic intuitions, however, a significant amount of research has been done to determine how to select optimal exemplars for a prompt. For example, few-shot learning samples can be chosen via diversity selection [8], uncertainty-based selection [9], or even selection based on similarity to the test example [10].
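
As a rough illustration of similarity-based selection, the sketch below ranks a pool of candidate exemplars by lexical overlap with the test example and keeps the most similar ones; real implementations (e.g., [10]) typically rely on learned embeddings rather than this simple heuristic.

```python
# A rough sketch of similarity-based exemplar selection: rank a candidate pool by
# word overlap with the test example and keep the top-k. Real systems typically
# use learned embeddings instead of this simple lexical heuristic.
def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)  # Jaccard similarity

candidate_pool = [
    ("The battery dies within an hour.", "Negative"),
    ("Battery life easily lasts two days.", "Positive"),
    ("The screen scratched on day one.", "Negative"),
    ("Setup was simple and the manual was clear.", "Positive"),
]
test_example = "The battery barely lasts through a morning."

ranked = sorted(candidate_pool, key=lambda ex: overlap(ex[0], test_example), reverse=True)
selected_exemplars = ranked[:2]  # the most similar exemplars go into the prompt
print(selected_exemplars)
```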


few-shot learning vs. fine-tuning. Prior to moving on, I want to address a notable point of confusion. Few-shot learning is not fine-tuning. Few-shot learning presents examples to the LLM inside of the prompt, which can then be used as relevant context for generating the correct output. This process is referred to as “in-context learning”; see above. The model’s parameters are not modified by few-shot learning. In contrast, fine-tuning explicitly trains the model (i.e., updates its weights via backpropagation) over a chosen dataset.


Instruction Prompting


Using an instruction tuned language model as a coding assistant (from [15])

Few-shot learning is incredibly powerful, but it has a notable drawback: exemplars consume a lot of tokens. Given that the context window of an LLM is limited, we might want to explore prompting methods that do not consume as many tokens. For example, can we textually explain the correct behavior to an LLM? The short answer is yes! This technique, which just includes a written instruction as part of the prompt, is known as instruction prompting, and it performs best with a particular type of LLM.


instruction tuning and alignment. Recent development of language models has heavily focused upon improving instruction following capabilities. Pre-trained LLMs are not good at following instructions out-of-the-box. However, teaching these models how to follow instructions makes them a lot better at accomplishing what the user wants (i.e., improves human alignment). Instruction following LLMs power a variety of useful applications from information seeking dialogue agents (e.g., ChatGPT) to coding assistants (e.g., Codex [13]); see below.


(from [13] and [14])

As has been discussed extensively in prior posts, the first step in creating an LLM is pre-training the model using a language modeling objective over a large, unlabeled corpus of text. During this process, the model gains information and learns to accurately perform next-token prediction. However, the model’s output is not always interesting, compelling, or helpful, and the model usually struggles to comply with complex instructions. To encourage such behavior, we need to go beyond basic pre-training.


creating instruction-following LLMs. There are a couple of different approaches for teaching an LLM how to follow instructions. For example, we can perform instruction tuning [12], or fine-tune the LLM over examples of dialogues that include instructions. Several notable models adopt this approach, such as LLaMA (and its variants) [15], all FLAN models [12], OPT-IML [16], and more. Alternatively, we could use the three-step approach composed of supervised fine-tuning (SFT), reward model training, and reinforcement learning from human feedback (RLHF); see below. This methodology has led to the creation of incredible models such as ChatGPT, GPT-4, Sparrow [17], and more.
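
For intuition, instruction tuning is just supervised fine-tuning over (instruction, response) pairs with the usual language modeling loss. The sketch below uses Hugging Face transformers with GPT-2 as a stand-in model and a toy, hypothetical dataset; it is a minimal illustration, not the actual recipe used by the models cited above.

```python
# Minimal sketch of instruction tuning: supervised fine-tuning over
# (instruction, response) pairs with a standard language modeling loss.
# GPT-2 and the toy dataset are stand-ins, not the setup of the models above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy instruction-following examples (hypothetical).
pairs = [
    ("Summarize: The cat slept on the warm windowsill all afternoon.",
     "A cat napped on a windowsill all afternoon."),
    ("Translate to French: Good morning.", "Bonjour."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for instruction, response in pairs:
    text = f"Instruction: {instruction}\nResponse: {response}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt")
    # Labels equal the input ids, so the loss is next-token prediction over the pair.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```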


Aligning LLMs based on human feedback (from [13])

crafting useful instructions. If we have access to an LLM that has been trained to follow instructions, we can accomplish a lot by prompting the model with useful and informative instructions. Here are some key tips and ideas for using instruction prompting (a concrete sketch follows the figure below):

  • Just like the rest of our prompt, the instruction should be specific and detailed.
  • We should avoid telling the LLM to not do something in the prompt. Rather, we should focus on telling the LLM what to do.
  • Using an input structure with indicators that clearly identify the instruction within the prompt is helpful; see below.

Different formats for instruction prompting
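
Putting these tips together, here is one hypothetical way to format an instruction prompt, using indicators to clearly separate the instruction from the input data.

```python
# One hypothetical way to format an instruction prompt, using indicators to
# separate the instruction from the input data.
instruction = (
    "Rewrite the email below so that it is polite and professional. "
    "Keep it under 100 words and preserve all factual details."
)
email = "hey, where is my refund?? it's been 3 weeks. fix this now."

prompt = f"### Instruction ###\n{instruction}\n\n### Input ###\n{email}\n\n### Output ###\n"
print(prompt)
```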

role prompting. Another interesting prompting technique that is tangentially related to instruction prompting is role prompting, which assigns a “role” or persona to the model. This role is assigned within the prompt via a textual snippet such as:

  • You are a famous and brilliant mathematician.
  • You are a doctor.
  • You are a musical expert.

Interestingly, recent LLMs are able to assume and maintain such roles quite well throughout a dialogue [18]; see below.


Role prompting with LaMDA (from [18])

Going further, role prompting isn’t just a fun trick. Providing a role to the LLM can actually improve performance (e.g., role prompting GPT-3 as a “brilliant mathematician” can improve performance on arithmetic-based questions). However, role prompting only improves performance in certain cases.
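
Here is a minimal sketch of role prompting for an arithmetic-style question. With chat-style models, the role is commonly placed in the system message; with completion-style models, it can simply be prepended to the prompt. The wording is illustrative.

```python
# Role prompting for an arithmetic question. With chat-style models the role is
# often placed in the system message; with completion-style models it can simply
# be prepended to the prompt. The wording here is illustrative.
role = "You are a famous and brilliant mathematician."
question = "A train travels 60 miles per hour for 2.5 hours. How far does it go?"

# Chat-style formulation (a messages list to pass to a chat completions API).
messages = [
    {"role": "system", "content": role},
    {"role": "user", "content": question},
]

# Completion-style formulation: the role is just prepended to the prompt text.
completion_prompt = f"{role}\n\nQuestion: {question}\nAnswer:"
print(completion_prompt)
```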


“When assigning a role to the AI, we are giving it some context. This context helps the AI understand the question better. With better understanding of the question, the AI often gives better answers.” – from learnprompting.org


instruction prompting in the real world. Prompting LLMs with instructions is an incredibly powerful tool that we can use for a variety of applications. To understand how to leverage this technique, we need look no further than the recent release of ChatGPT plugins, which included an open-source information retrieval API. Inside of this API, there are two specific modules provided for extracting metadata from documents and filtering personally identifiable information (PII). Interestingly, these services are entirely LLM-based and use the prompts shown below.


Prompts for metadata extraction and PII detection in the ChatGPT information retrieval API

Within these prompts, the LLM is provided with specific and detailed instructions regarding how to perform its desired task. Some notable aspects of the instructions are listed below, followed by a hypothetical prompt written in a similar style:

  • The desired output format (either json or true/false) is explicitly stated.
  • The instruction uses a structured format (i.e., bullet-separated list) to describe important information.
  • The task of the LLM (i.e., identifying PII or extracting metadata) is explicitly stated in the prompt.
  • Interestingly, these prompts tell the model what not to do on multiple occasions, which is typically advised against.
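
To ground these observations, here is a hypothetical prompt written in a similar style (it is not the actual prompt used by the retrieval plugin): the task and output format are stated explicitly, a structured list describes the requirements, and the model is even told what not to do.

```python
# A hypothetical prompt in the same style as the ones above (NOT the actual
# prompt from the ChatGPT retrieval plugin): the task and output format are
# stated explicitly, and the model is told what not to do.
pii_prompt_template = (
    "You are a system that detects personally identifiable information (PII).\n"
    "- Respond with true if the document below contains any PII, such as names,\n"
    "  email addresses, phone numbers, or physical addresses.\n"
    "- Respond with false otherwise.\n"
    "- Respond with only the single word true or false.\n"
    "- Do not explain your answer.\n\n"
    "Document:\n{document}"
)
print(pii_prompt_template.format(document="Contact Jane Doe at jane@example.com."))
```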

Trusting an LLM to accurately perform critical tasks like PII detection might not be the best idea given their limitations. Nonetheless, such an approach demonstrates the incredible potential of instruction prompting. Instead of writing an entire program or service, we may be able to quickly solve a lot of tasks by just writing a prompt.



Takeaways


“Writing a really great prompt for a chatbot persona is an amazingly high-leverage skill and an early example of programming in a little bit of natural language” – Sam Altman


If we learn nothing else from this overview, we should know that constructing the correct prompt (i.e., prompt engineering) is a large part of successfully leveraging LLMs in practice. Language models, due to their text-to-text structure, are incredibly generic and can be used to solve a variety of tasks. However, we must provide these models with detailed and appropriate context for them to perform well. Although optimal prompting techniques differ depending on the model and tasks, there are many high-level takeaways that we can leverage to maximize chances of success.


from zero to few-shot learning. Given their extensive pre-training (and, these days, fine-tuning) datasets, LLMs contain a ton of information and are capable of solving a variety of tasks out-of-the-box. To do this, we need only provide the model with a task description and relevant input data; the model is then expected to generate the correct output. However, zero-shot learning can only perform so well due to the limited context provided to the model. To improve upon the performance of zero-shot learning, we should leverage few-shot learning by inserting exemplars in the prompt.


instruction-following LLMs. Although it performs well, few-shot learning typically consumes a lot of tokens, which is a problem given the limited context window of most LLMs. To work around this, we can adopt an instruction prompting approach that provides a precise, textual description of the LLM’s desired behavior as opposed to capturing this behavior with concrete examples of correct output. Instruction prompting is powerful, but it requires a specific form of LLM that has been fine-tuned (e.g., via instruction tuning or RLHF) to work well. Pre-trained LLMs are not great at following instructions out of the box.


tips and tricks. Prompt engineering comes with a variety of tricks and best practices that we can adopt. Typically, such techniques fluctuate with each new model release (e.g., GPT-4 is much better at handling unstructured prompts compared to prior models [2]), but a few principles have remained applicable for quite some time. First, we should always start with a simple prompt, then gradually add complexity. As we develop our prompt, we should aim to be specific and detailed while avoiding excessive verbosity (due to the limited context window). Finally, to truly maximize LLM performance, we usually need to leverage few-shot learning, instruction prompting, or a more complex approach.
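
The sketch below illustrates this iterative workflow on a made-up summarization task: start with the simplest possible prompt, inspect the output, and add one specific constraint at a time rather than writing a long, verbose prompt up front.

article = "..."  # the text we want summarized

# Version 1: start as simple as possible and look at what comes back.
prompt_v1 = f"Summarize the following article:\n\n{article}"

# Version 2: the output was too long, so add a length constraint.
prompt_v2 = f"Summarize the following article in three sentences:\n\n{article}"

# Version 3: specify the audience and tone as well, but nothing more,
# to keep the prompt from eating up the context window.
prompt_v3 = (
    "Summarize the following article in three sentences for a reader with "
    "no technical background. Use plain language and avoid jargon.\n\n"
    f"{article}"
)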

New to the newsletter?

Hello! I am Cameron R. Wolfe, Director of AI at Rebuy and PhD student at Rice University. I study the empirical and theoretical foundations of deep learning. This is the Deep (Learning) Focus newsletter, where I help readers build a better understanding of deep learning research via understandable overviews that explain relevant topics from the ground up. If you like this newsletter, please subscribe, share it, or follow me on Twitter!

Bibliography

[1] Raffel, Colin, et al. “Exploring the limits of transfer learning with a unified text-to-text transformer.” The Journal of Machine Learning Research 21.1 (2020): 5485-5551.

[2] Saravia, Elvis, et al. “Prompt Engineering Guide.” https://github.com/dair-ai/Prompt-Engineering-Guide (2022).

[3] Weng, Lilian. “Prompt Engineering.” Lil’Log, Mar 2023, https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/.

[4] Radford, Alec, et al. “Improving language understanding by generative pre-training.” (2018).

[5] Radford, Alec, et al. “Language Models are Unsupervised Multitask Learners.” (2019).

[6] Brown, Tom, et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.

[7] Zhao, Tony Z., et al. “Calibrate Before Use: Improving Few-Shot Performance of Language Models.” ICML (2021).

[8] Su, Hongjin, et al. “Selective annotation makes language models better few-shot learners.” arXiv preprint arXiv:2209.01975 (2022).

[9] Diao, Shizhe, et al. “Active Prompting with Chain-of-Thought for Large Language Models.” arXiv preprint arXiv:2302.12246 (2023).

[10] Liu, Jiachang, et al. “What Makes Good In-Context Examples for GPT-3?” arXiv preprint arXiv:2101.06804 (2021).

[11] Wei, Jason, et al. “Chain of thought prompting elicits reasoning in large language models.” arXiv preprint arXiv:2201.11903 (2022).

[12] Wei, Jason, et al. “Finetuned language models are zero-shot learners.” arXiv preprint arXiv:2109.01652 (2021).

[13] Chen, Mark, et al. “Evaluating large language models trained on code.” arXiv preprint arXiv:2107.03374 (2021).

[14] Ouyang, Long, et al. “Training language models to follow instructions with human feedback.” Advances in Neural Information Processing Systems 35 (2022): 27730-27744.

[15] Touvron, Hugo, et al. “Llama: Open and efficient foundation language models.” arXiv preprint arXiv:2302.13971 (2023).

[16] Iyer, Srinivasan, et al. “OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization.” arXiv preprint arXiv:2212.12017 (2022).

[17] Glaese, Amelia, et al. “Improving alignment of dialogue agents via targeted human judgements.” arXiv preprint arXiv:2209.14375 (2022).

[18] Thoppilan, Romal, et al. “Lamda: Language models for dialog applications.” arXiv preprint arXiv:2201.08239 (2022).
