December 7, 2023

Top 5 Learnings from Applying Prompt Engineering in Building AI
The top programming language of the future is going to be natural language, e.g. English.

2023 has shown us that AI is shaping up to be the next big thing - specifically Large Language Models (LLMs) like ChatGPT. 

People have done wondrous things with ChatGPT, from building new tools via its API to 10x-ing their productivity. And all that using just the English language. If LLMs are provided with optimized instructions, their outputs can be controlled and molded to a diverse range of use cases. The art of doing this is known as prompt engineering.

The concept of prompt engineering is often overlooked, yet it's a crucial skill for effectively utilizing AI tools. That said, let's be real: the buzz around prompt engineering as the next big career move is likely overhyped. Still, mastering it can enhance the way we interact with AI, eliciting more accurate and useful responses from AI systems.

We’ve experienced firsthand how useful it can be when building our AI chatbot. We hope that our learnings from going through that trial & error process in this greenfield space will help any other aspiring builders working with LLMs. 

A Brief Overview of Prompts & Prompt Engineering

Before we delve deeper into the topic, let's rewind a little and cover what prompts are. Generative AI models like ChatGPT interface with users via textual input. Users tell the model what to do and it will attempt to accomplish the task. In a very broad sense, what users tell the model to do is called a prompt.

Prompts have been likened to magic spells, collections of words which achieve impossible effects, but only if you follow bizarre and complex rules. One of the key reasons for this is that it’s not very clearly understood how prompts translate to results for different LLMs.

An example of a prompt that uses all the key elements below

However, the most effective prompts tend to contain similar elements (a sketch combining them follows this list):

  1. Instructions - Self-explanatory: what the LLM is instructed to do
  2. Chain-of-Thought Prompting - Forces the LLM to work through a step-by-step reasoning process when generating an answer
  3. Persona - The LLM is told to act as an expert capable of giving a good answer to the instructions given, e.g. "You are a fitness coach with 15 years of experience training athletes."
  4. Emotional Support - I’m not joking, it’s a thing
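
To make this concrete, here's a rough sketch of a prompt that combines all four elements. The scenario and wording are our own illustration (echoing the fitness-coach persona above), not the exact prompt from the image:

```python
# An illustrative prompt combining the four elements above; the scenario and
# wording are a sketch, not one of our production prompts.
persona = "You are a fitness coach with 15 years of experience training athletes."
instructions = "Design a 4-week training plan for a beginner preparing for a 10km run."
chain_of_thought = "Think through the plan step by step before writing the final schedule."
emotional_support = "This race means a lot to me, so please take your time and do your best."

prompt = "\n".join([persona, instructions, chain_of_thought, emotional_support])
print(prompt)
```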

Prompt engineering is the act of optimizing these prompts to get the best possible answer from the LLM. There are various techniques that prompt engineers use to get the best performance from their LLMs. These can range from the aforementioned emotional support approach to more common ones like chain-of-thought prompting.

This is just a brief overview of prompts & prompt engineering. There are many articles out there, like this and this, that cover the topic in greater detail. However, the intent of this article is to cover our learnings from using prompt engineering during the development of our AI chatbot.

To begin with, prompt engineering was the first approach we tried when building our AI chatbot. The good news is that it was very effective as a first-pass technique and helped us identify roadblocks very quickly. However, we ran into quite a few limitations with this approach, as the next sections will cover.

Our Top 5 Learnings Building An AI Chatbot

Intricate Rules

One of the first things we discovered was that prompt engineering was complex and had to follow rules which were not always easy to understand. Given the way LLMs are built, it’s not easy to track down why a prompt would result in a specific outcome.

Be prepared to invest significant time in this area due to the intricate rules. There are a lot of tricks and hacks that can help get more performance from the model. We'll share the two that gave us the most improvement, which were ironically the simplest:

  1. Zero-shot prompting - Giving the LLM a task without prior examples and expecting it to understand and execute based on its pre-existing training.
  2. Few-shot prompting - Providing a few examples to guide the AI in understanding and responding to a task.
Source: Wei et al. 2022
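
To illustrate the difference, here are minimal sketches of the two styles. The task and wording are our own examples, not our production prompts:

```python
# Zero-shot: the task alone, relying on the model's pre-existing training.
zero_shot = (
    "Q: A cafe sells 14 coffees in the morning and 23 in the afternoon. "
    "How many coffees did it sell in total?\nA:"
)

# Few-shot: a worked example (or several) placed before the actual question.
few_shot = """Q: A shop sells 3 apples and 5 oranges. How many fruits did it sell in total?
A: 8

Q: A cafe sells 14 coffees in the morning and 23 in the afternoon. How many coffees did it sell in total?
A:"""
```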

We combined the above approaches with chain-of-thought prompting, where the LLM is encouraged to reason step by step. These may seem rather simple, what with techniques like adversarial prompting or tree-of-thoughts prompting being researched, but they were the most practical and results-driven options for us.
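
As a rough sketch, combining few-shot with chain of thought simply means the worked examples also spell out their reasoning (again, the wording here is illustrative):

```python
# Few-shot + chain of thought: the example answer shows its reasoning steps,
# nudging the model to do the same for the new question.
few_shot_cot = """Q: A shop sells 3 apples and 5 oranges. How many fruits did it sell in total?
A: It sells 3 apples and 5 oranges, so 3 + 5 = 8 fruits in total. The answer is 8.

Q: A cafe sells 14 coffees in the morning and 23 in the afternoon. How many coffees did it sell in total?
A: Let's think step by step."""
```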

The key takeaway here is that while complex approaches to prompt engineering may yield results, they also take a lot of time to figure out. When developing AI products, turnaround time is often a key constraint, which makes simple techniques that enable quick experimentation especially useful.

Prompts are Sensitive

Prompts are very sensitive to even minor alterations, e.g. rearranging words in a sentence or replacing a single word with a synonym. Anticipate the need to adjust your prompts when transitioning between different LLMs.

This was personally quite painful for us as we often encountered cases where we spent hours designing a prompt only to find it breaking or becoming unusable once we tried to tweak it to account for other scenarios in our use case.

Case in point: we merely moved one sentence ahead of another and it significantly changed the output

Log Prompts & Responses

Given the sensitivity of LLM outputs to minor prompt changes, you'd be surprised how rarely teams record prompts & responses when doing prompt engineering.

Logging prompts & responses is a critical part of the LLM app development process, and our team at SUPA has benefited greatly from doing this. It helps a lot, especially when trying to zero in on which change caused response accuracy to drop.

There are quite a few tools out there that can facilitate this process. Trust us when we say it's worth the investment to do so. At SUPA, we use LLMonitor to document all the prompts that we experiment with.

We’re not sponsored, just happy customers
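
Even before adopting a dedicated tool, a minimal local log goes a long way. Here's a generic sketch of what we mean by logging prompts & responses; it's our own illustration, not LLMonitor's API:

```python
import json
import time
import uuid

def log_interaction(prompt: str, response: str, model: str,
                    path: str = "prompt_log.jsonl") -> None:
    """Append one prompt/response pair to a local JSONL log file."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model": model,
        "prompt": prompt,
        "response": response,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Call this around every LLM request during prompt experiments, so you can
# later diff prompt versions against the responses they produced.
log_interaction("Summarize this support ticket: ...", "The customer reports ...",
                model="gpt-3.5-turbo")
```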

Not a Solution to Hallucination

Hallucinations refer to cases where LLMs generate false or misleading information that is nonetheless confidently presented. As many AI products use LLMs as the base, this issue can be especially problematic given some of the use cases they're applied in, e.g. price information being provided wrongly by a customer support product.

It is our view that hallucination is currently one of the biggest roadblocks to shipping to production. We initially attempted a lot of prompt engineering to find the most optimized way to create a solution with low to zero hallucinations. Unfortunately, prompt engineering did not pay many dividends.

In a nutshell, hallucination happens mainly because the LLM's training corpus lacks context on the information it's being asked to convey. No amount of prompt engineering will solve this. The contemporary solution is retrieval-augmented generation (RAG), an approach which attaches an outside source of up-to-date information to the LLM, along with a model specialized in retrieving that information.

High-level RAG architecture

However, one way prompt engineering can assist here is the addition of this particular clause: “You will avoid speculation and not diverge into unrelated topics, consistently using the context information as the source of truth”. This ensures that the LLM doesn't speculate on information that is not provided in the retrieved context.
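
Putting the two together, here's a simplified sketch of how retrieved context and that clause end up in the final prompt. The retrieval step itself (e.g. a vector store lookup) is assumed, and the function is illustrative rather than our exact implementation:

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt from retrieved context, the anti-speculation clause, and the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n"
        "You will avoid speculation and not diverge into unrelated topics, "
        "consistently using the context information as the source of truth.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```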

Not Great for Adopting a Specific Behavior

Prompt engineering is also not great if you want the LLM to adopt a specific behavior, e.g. give all answers in JSON or output all solutions in SQL. While it might be technically possible, you would expend a lot of valuable time finding the optimal prompt and waste money on the extra tokens.

Instead, what we found was that supervised fine-tuning is a far more efficient approach to doing this. We won’t go into too much detail for the purposes of this article, but you can find some resources on it here and here.
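
For a sense of what that looks like in practice, below is a hedged sketch of supervised fine-tuning data for an "always answer in JSON" behavior, using the chat-style JSONL format that several fine-tuning APIs accept. The examples and file name are illustrative:

```python
import json

# A couple of illustrative training examples; in practice you would want
# dozens to hundreds that all demonstrate the desired output behavior.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You always respond with valid JSON."},
            {"role": "user", "content": "What are your store's opening hours?"},
            {"role": "assistant", "content": '{"opening_hours": "9am-6pm", "days": "Mon-Sat"}'},
        ]
    },
]

with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```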

Learning Prompt Engineering: Should You Take a Course?

Given all that we just discussed, it seems that prompt engineering is a powerful tool in the early stages of building an AI product, but it's not a cure-all. Whether you see merit in taking a course very much depends on your reasons for learning prompt engineering, whether that's to build your own AI product or just to optimize the way you utilize LLMs.

If it’s the latter, there are many other options out there that might not necessarily mandate a full course. There are free resources available for those who want to get a 101 on basic optimization techniques for prompts. These resources would likely serve you well if the plan is just to learn how to optimize use of ChatGPT.

But what if you're using it to build an AI product? The best way we've found to learn prompt engineering for building is to try it on your own use case. It's a powerful tool for experimentation and it helped us in the early stages to identify roadblocks in our AI product.

However, it's our view that prompt engineering cannot stand on its own. As we shared earlier, it needs to be coupled with other approaches like RAG and fine-tuning to get an optimal solution that can be pushed into production. Besides, if your problem can easily be solved via prompt engineering alone, it's quite likely that your unique selling point isn't that strong, putting you at risk of disruption by others.

Closing Thoughts: Will Prompt Engineering Become a Long-Term Job?

No, likely not. In the long term, we expect prompt engineering to become obsolete for 2 main reasons:

  1. Speed of development: Future generations of AI will get more intuitive and better at understanding natural language, negating the need for extensive prompt engineering
  2. Automation by AI: Research is already being done into how prompt engineering can be automated, ironically by AI itself

However, in the short term, we believe there will still be a need for prompt engineers, as the above 2 reasons might take a while to come to fruition. To future-proof yourself, it would be good to learn other aspects of AI beyond just prompt engineering, e.g. applying RAG and fine-tuning.

Prompt engineering is one small facet of building AI products. Here at SUPA, we’re leveraging our expertise from 7 years in the AI/ML industry to build solutions in CX. Check us out here.