Top Related Projects
Examples and guides for using the OpenAI API
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
This repo includes ChatGPT prompt curation to use ChatGPT better.
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
🦜🔗 Build context-aware reasoning applications
Quick Overview
PromptBase is a repository created by Microsoft that serves as a collection of high-quality prompts for large language models (LLMs) like GPT-3. The prompts cover a wide range of use cases, from creative writing to task-oriented applications, and are designed to help users get the most out of their LLM-powered projects.
Pros
- Diverse Prompt Collection: PromptBase offers a vast and diverse collection of prompts, catering to various use cases and domains.
- High-Quality Prompts: The prompts are carefully crafted by experts to ensure they are effective and produce high-quality outputs.
- Ongoing Maintenance: The repository is actively maintained, with new prompts being added and existing ones being updated regularly.
- Community Contribution: The project encourages community involvement, allowing users to contribute their own prompts and provide feedback.
Cons
- Limited Customization: While the prompts are designed to be flexible, users may still need to do some additional customization to fit their specific use cases.
- Dependency on LLMs: The effectiveness of the prompts is heavily dependent on the capabilities of the underlying LLM, which may vary across different models and versions.
- Potential Bias: As with any AI-powered system, the prompts may reflect biases present in the training data or the model itself.
- Intellectual Property Concerns: Users should be mindful of any potential intellectual property or licensing issues when using the prompts in their projects.
Getting Started
To get started with PromptBase, you can follow these steps:
- Clone the repository to your local machine:
git clone https://github.com/microsoft/promptbase.git
- Navigate to the cloned repository:
cd promptbase
- Explore the available prompts in the prompts directory. Each prompt is stored in a separate file, with a brief description and instructions on how to use it.
- Choose a prompt that fits your use case and copy the prompt text into your application or project.
- Customize the prompt as needed, adjusting the parameters or adding additional context to suit your specific requirements.
- Use the prompt with your LLM of choice to generate the desired output (see the sketch after this list).
- Provide feedback or contribute your own prompts to the repository by following the guidelines in the CONTRIBUTING.md file.
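A minimal sketch of the choose/customize/run workflow, assuming the OpenAI Python SDK; the prompt file name, placeholder, and model are hypothetical stand-ins for whatever prompt you actually pick:

```python
# Minimal sketch: load a prompt from the cloned repo, customize it, and run it
# against an LLM. The file name and placeholder below are hypothetical.
from pathlib import Path

from openai import OpenAI  # pip install openai

# Read a prompt template from the repository's prompts directory (hypothetical file).
template = Path("prompts/summarize_article.txt").read_text(encoding="utf-8")

# Customize the prompt by filling in your own context.
prompt = template.replace("{article}", "Long article content here...")

# Use the prompt with your LLM of choice.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
    temperature=0.3,
)
print(response.choices[0].message.content)
```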
Competitor Comparisons
Examples and guides for using the OpenAI API
Pros of openai-cookbook
- More comprehensive and diverse set of examples and use cases
- Regularly updated with new features and API changes
- Includes examples for various OpenAI models beyond just GPT
Cons of openai-cookbook
- Focuses solely on OpenAI's offerings, limiting its applicability to other AI platforms
- Less structured approach to organizing prompts and techniques
Code Comparison
openai-cookbook:
import openai  # legacy Completions API (openai<1.0)

text = "Hello, how are you?"
response = openai.Completion.create(
    model="text-davinci-002",
    prompt=f"Translate the following English text to French: '{text}'",
    temperature=0.3,
    max_tokens=60,
)
print(response["choices"][0]["text"].strip())
promptbase:
from promptbase import Prompt
translator = Prompt("translate_english_to_french")
french_text = translator.run(text="Hello, how are you?")
The openai-cookbook example directly uses the OpenAI API, while promptbase provides a higher-level abstraction for managing and executing prompts. promptbase offers a more structured approach to organizing and reusing prompts, which can be beneficial for larger projects or teams working with multiple prompts.
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
Pros of Prompt-Engineering-Guide
- More comprehensive and structured content, covering a wide range of prompt engineering techniques and best practices
- Regularly updated with new information and examples from the rapidly evolving field of AI and language models
- Includes interactive notebooks and practical exercises for hands-on learning
Cons of Prompt-Engineering-Guide
- Less focused on specific use cases or industries compared to Promptbase
- May be overwhelming for beginners due to the extensive amount of information provided
- Lacks a standardized format for prompt templates, which Promptbase offers
Code Comparison
Prompt-Engineering-Guide example:
text = "Hello, how are you?"
prompt = f"""
Translate the following English text to French:
'{text}'
"""
response = get_completion(prompt)  # helper that wraps a chat-completion call
print(response)
Promptbase example:
from promptbase import Prompt
translator = Prompt("translation")
result = translator.run(text="Hello, world!", target_language="French")
print(result)
Both repositories provide valuable resources for prompt engineering, but they cater to different needs. Prompt-Engineering-Guide offers a more comprehensive and educational approach, while Promptbase focuses on practical implementation with standardized templates for specific use cases.
This repo includes ChatGPT prompt curation to use ChatGPT better.
Pros of awesome-chatgpt-prompts
- Larger collection of prompts, covering a wide range of topics and use cases
- Community-driven, with frequent updates and contributions from users
- Includes prompts in multiple languages
Cons of awesome-chatgpt-prompts
- Less structured organization compared to promptbase
- May include lower quality or less vetted prompts due to open contributions
- Lacks a standardized format for prompt descriptions
Code Comparison
awesome-chatgpt-prompts:
# Act as a Linux Terminal
I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd
promptbase:
def get_prompt(input_text):
    return f"""You are an AI assistant named Claude. You are helpful, harmless, and honest.

{input_text}"""
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
Pros of Awesome-Prompt-Engineering
- More comprehensive collection of prompt engineering resources
- Regularly updated with community contributions
- Includes practical examples and use cases
Cons of Awesome-Prompt-Engineering
- Less structured organization compared to Promptbase
- Lacks specific tools or frameworks for prompt development
- May be overwhelming for beginners due to the large amount of information
Code Comparison
Promptbase example:
from promptbase import Prompt
prompt = Prompt("Translate the following English text to French: {text}")
result = prompt.run(text="Hello, world!")
print(result)
Awesome-Prompt-Engineering doesn't provide specific code examples, but rather focuses on curating resources and techniques for prompt engineering.
Summary
Awesome-Prompt-Engineering is a community-driven repository that offers a wide range of resources for prompt engineering, including articles, tutorials, and best practices. It's regularly updated but may lack the structure and specific tools provided by Promptbase. Promptbase, on the other hand, offers a more focused approach with a dedicated framework for prompt development, which may be more suitable for developers looking for a ready-to-use solution.
🦜🔗 Build context-aware reasoning applications
Pros of LangChain
- More comprehensive framework for building LLM applications
- Larger community and ecosystem with extensive documentation
- Supports multiple LLM providers and integrations
Cons of LangChain
- Steeper learning curve due to its extensive features
- Can be overkill for simple prompt engineering tasks
- Requires more setup and configuration
Code Comparison
PromptBase example:
from promptbase import PromptTemplate
template = PromptTemplate("Summarize the following text: {text}")
prompt = template.format(text="Long article content here...")
LangChain example:
from langchain import PromptTemplate
template = PromptTemplate(
input_variables=["text"],
template="Summarize the following text: {text}"
)
prompt = template.format(text="Long article content here...")
Both repositories aim to simplify working with language models, but LangChain offers a more comprehensive toolkit for building LLM-powered applications. PromptBase focuses primarily on prompt engineering and management, making it potentially easier to use for specific prompt-related tasks. LangChain's broader scope includes features like chains, agents, and memory, which can be advantageous for complex projects but may introduce unnecessary complexity for simpler use cases.
README
promptbase

promptbase is an evolving collection of resources, best practices, and example scripts for eliciting the best performance from foundation models like GPT-4. We currently host scripts demonstrating the Medprompt methodology, including examples of how we further extended this collection of prompting techniques ("Medprompt+") into non-medical domains:
| Benchmark | GPT-4 Prompt | GPT-4 Results | Gemini Ultra Results |
|---|---|---|---|
| MMLU | Medprompt+ | 90.10% | 90.04% |
| GSM8K | Zero-shot | 95.3% | 94.4% |
| MATH | Zero-shot | 68.4% | 53.2% |
| HumanEval | Zero-shot | 87.8% | 74.4% |
| BIG-Bench-Hard | Few-shot + CoT | 89.0% | 83.6% |
| DROP | Zero-shot + CoT | 83.7% | 82.4% |
| HellaSwag | 10-shot | 95.3% | 87.8% |
In the near future, promptbase will also offer further case studies and structured interviews around the scientific process we follow for prompt engineering. We'll also offer deep dives into the specialized tooling that accentuates the prompt engineering process. Stay tuned!
Medprompt and The Power of Prompting

"Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine" (H. Nori, Y. T. Lee, S. Zhang, D. Carignan, R. Edgar, N. Fusi, N. King, J. Larson, Y. Li, W. Liu, R. Luo, S. M. McKinney, R. O. Ness, H. Poon, T. Qin, N. Usuyama, C. White, E. Horvitz, 2023)

Paper link: https://arxiv.org/abs/2311.16452

@article{nori2023can,
  title={Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine},
  author={Nori, Harsha and Lee, Yin Tat and Zhang, Sheng and Carignan, Dean and Edgar, Richard and Fusi, Nicolo and King, Nicholas and Larson, Jonathan and Li, Yuanzhi and Liu, Weishung and others},
  journal={arXiv preprint arXiv:2311.16452},
  year={2023}
}
In a recent study, we showed how the composition of several prompting strategies into a method that we refer to as Medprompt can efficiently steer generalist models like GPT-4 to achieve top performance, even when compared against models specifically finetuned for medicine. Medprompt composes three distinct strategies -- dynamic few-shot selection, self-generated chain of thought, and choice-shuffle ensembling -- to elicit specialist-level performance from GPT-4. We briefly describe these strategies here:
- Dynamic Few Shots: Few-shot learning -- providing several examples of the task and response to a foundation model -- enables models to quickly adapt to a specific domain and learn to follow the task format. For simplicity and efficiency, the few-shot examples applied in prompting for a particular task are typically fixed; they are unchanged across test examples. This requires that the selected few-shot examples be broadly representative of, and relevant to, a wide distribution of test examples. One approach to meeting these requirements is to have domain experts carefully hand-craft exemplars. Even so, this approach cannot guarantee that the curated, fixed few-shot examples will be appropriately representative of every test example. With enough available data, however, we can select different few-shot examples for different task inputs. We refer to this approach as employing dynamic few-shot examples. The method uses a mechanism to identify examples based on their similarity to the case at hand. For Medprompt, we did the following to identify representative few-shot examples: given a test example, we choose the k training examples that are most semantically similar using k-NN in the embedding space. Specifically, we first use OpenAI's text-embedding-ada-002 model to embed candidate exemplars for few-shot learning. Then, for each test question x, we retrieve its nearest k neighbors x1, x2, ..., xk from the training set (according to distance in the embedding space of text-embedding-ada-002). These examples -- the ones most similar in embedding space to the test question -- are ultimately included in the prompt (a sketch of this retrieval step follows this list).
- Self-Generated Chain of Thought (CoT): Chain-of-thought (CoT) prompting uses natural language statements, such as "Let's think step by step," to explicitly encourage the model to generate a series of intermediate reasoning steps. The approach has been found to significantly improve the ability of foundation models to perform complex reasoning. Most approaches to chain-of-thought center on the use of experts to manually compose few-shot examples with chains of thought for prompting. Rather than rely on human experts, we pursued a mechanism to automate the creation of chain-of-thought examples. We found that we could simply ask GPT-4 to generate a chain of thought for each training example, with appropriate guardrails to reduce the risk of hallucination via incorrect reasoning chains.
- Majority Vote Ensembling: Ensembling refers to combining the outputs of several algorithms to yield better predictive performance than any individual algorithm. Frontier models like GPT-4 benefit from ensembling of their own outputs. A simple technique is to use a variety of prompts, or a single prompt with varied temperature, and report the most frequent answer among the ensemble constituents. For multiple-choice questions, we employ a further trick that increases the diversity of the ensemble, called choice-shuffling, where we shuffle the relative order of the answer choices before generating each reasoning path. We then select the most consistent answer, i.e., the one that is least sensitive to choice shuffling, which increases the robustness of the final answer (see the choice-shuffle sketch after the next paragraph).
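The dynamic few-shot retrieval step can be pictured with a short sketch. The snippet below embeds candidate exemplars with text-embedding-ada-002 and picks the k nearest neighbors of a test question by cosine similarity; the exemplar data structure and helper names are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch of dynamic few-shot selection via embedding k-NN.
# The exemplar list and helper names are hypothetical; only the embedding
# model name (text-embedding-ada-002) comes from the Medprompt description.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

def select_dynamic_few_shots(test_question: str, exemplars: list[dict], k: int = 5) -> list[dict]:
    """Return the k exemplars closest to the test question in embedding space."""
    exemplar_vecs = embed([ex["question"] for ex in exemplars])
    query_vec = embed([test_question])[0]
    # Cosine similarity between the test question and every candidate exemplar.
    sims = exemplar_vecs @ query_vec / (
        np.linalg.norm(exemplar_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top_k = np.argsort(sims)[::-1][:k]
    return [exemplars[i] for i in top_k]
```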
The combination of these three techniques led to breakthrough performance in Medprompt for medical challenge questions. Implementation details of these techniques can be found here: https://github.com/microsoft/promptbase/tree/main/src/promptbase/mmlu
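For the ensembling step, the sketch below isolates the choice-shuffle idea: the answer options are shuffled before each sampled reasoning path, the model's letter choice is mapped back to the underlying option, and the most frequent option wins. The answer_question callable is a hypothetical stand-in for whatever GPT-4 call produces a reasoning path and final letter; it is not part of the repository.

```python
# Illustrative sketch of choice-shuffle ensembling with a majority vote.
# `answer_question` is a hypothetical stand-in for a GPT-4 call that returns
# the letter (e.g. "B") chosen at the end of a sampled reasoning path.
import random
from collections import Counter
from typing import Callable

def choice_shuffle_ensemble(
    question: str,
    choices: list[str],
    answer_question: Callable[[str, list[str]], str],
    n_paths: int = 5,
) -> str:
    votes: Counter[str] = Counter()
    for _ in range(n_paths):
        shuffled = random.sample(choices, k=len(choices))  # new order each path
        letter = answer_question(question, shuffled)        # e.g. "A", "B", ...
        picked = shuffled[ord(letter.strip().upper()) - ord("A")]
        votes[picked] += 1  # vote for the underlying option, not the letter
    # The most consistent answer across shuffles is the ensemble's prediction.
    return votes.most_common(1)[0][0]
```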
Medprompt+ | Extending the power of prompting
Here we provide some intuitive details on how we extended the Medprompt prompting framework to elicit even stronger out-of-domain performance on the MMLU (Measuring Massive Multitask Language Understanding) benchmark. MMLU was established as a test of the general knowledge and reasoning powers of large language models. The complete MMLU benchmark contains tens of thousands of challenge problems of different forms across 57 areas, from basic mathematics to United States history, law, computer science, engineering, medicine, and more.
We found that applying Medprompt without modification to the whole of MMLU achieved a score of 89.1%. Not bad for a single policy working across a great diversity of problems! But could we push Medprompt to do better? Simply scaling up Medprompt can yield further benefits. As a first step, we increased the number of ensembled calls from five to 20. This boosted performance to 89.56%.
On working to push further with refinement of Medprompt, we noticed that performance was relatively poor for specific topics of the MMLU. MMLU contains a great diversity of types of questions, depending on the discipline and specific benchmark at hand. How might we push GPT-4 to perform even better on MMLU given the diversity of problems?
We focused on extension to a portfolio approach based on the observation that some topical areas tend to ask questions that require multiple steps of reasoning and perhaps a scratch pad to keep track of multiple parts of a solution. Other areas seek factual answers that follow more directly from the questions. Medprompt employs chain-of-thought (CoT) reasoning, which resonates with multi-step solving. We wondered whether the sophisticated Medprompt-classic approach might do less well on very simple questions, and whether the system might do better if a simpler method were used for the factual queries.
Following this argument, we found that we could boost performance on MMLU by extending Medprompt with a simple two-method prompt portfolio. We add to the classic Medprompt a set of 10 simple, direct few-shot prompts that solicit an answer directly, without chain of thought. We then ask GPT-4 for help in deciding on the best strategy for each topic area and question. As a screening call, for each question we first ask GPT-4:
# Question
{{ question }}
# Task
Does answering the question above require a scratch-pad?
A. Yes
B. No
If GPT-4 thinks the question does require a scratch-pad, then the contribution of the Chain-of-Thought component of the ensemble is doubled. If it doesn't, we halve that contribution (and let the ensemble instead depend more on the direct few-shot prompts). Dynamically leveraging the appropriate prompting technique in the ensemble led to a further +0.5% performance improvement across the MMLU.
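The routing logic just described can be summarized in a few lines. In the sketch below, ask_needs_scratchpad stands in for the screening call shown above, and the CoT contribution is doubled or halved before the majority vote; the function names and weight bookkeeping are illustrative assumptions, not the repository's exact code.

```python
# Illustrative sketch of the Medprompt+ screening/routing step.
# `ask_needs_scratchpad`, `cot_votes`, and `direct_votes` are hypothetical
# stand-ins for the GPT-4 screening call and the two ensemble components.
from collections import Counter

def route_and_combine(
    question: str,
    cot_votes: Counter,      # answer counts from the chain-of-thought prompts
    direct_votes: Counter,   # answer counts from the direct few-shot prompts
    ask_needs_scratchpad,    # callable returning True if GPT-4 answers "A. Yes"
) -> str:
    # Double the CoT contribution when a scratch-pad is deemed necessary,
    # otherwise halve it and lean on the direct few-shot prompts.
    cot_weight = 2.0 if ask_needs_scratchpad(question) else 0.5
    combined: Counter = Counter()
    for answer, count in cot_votes.items():
        combined[answer] += cot_weight * count
    for answer, count in direct_votes.items():
        combined[answer] += count
    return combined.most_common(1)[0][0]
```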
We note that Medprompt+ relies on accessing confidence scores (logprobs) from GPT-4. These are not publicly available via the current API but will be enabled for all in the near future.
Running Scripts
Note: Some scripts hosted here are published for reference on methodology, but may not be immediately executable against public APIs. We're working hard on making the pipelines easier to run "out of the box" over the next few days, and appreciate your patience in the interim!
First, clone the repo and install the promptbase package:
git clone https://github.com/microsoft/promptbase.git
cd promptbase/src
pip install -e .
Next, decide which tests you'd like to run. You can choose from:
- bigbench
- drop
- gsm8k
- humaneval
- math
- mmlu
Before running the tests, you will need to download the datasets from the original sources (see below) and place them in the src/promptbase/datasets directory.
After downloading datasets and installing the promptbase package, you can run a test with:
python -m promptbase dataset_name
For example:
python -m promptbase gsm8k
Dataset Links
To run evaluations, download these datasets and add them to /src/promptbase/datasets/:
- MMLU: https://github.com/hendrycks/test
  - Download the data.tar file from the above page and extract the contents
  - Run mkdir src/promptbase/datasets/mmlu
  - Run python ./src/promptbase/format/format_mmlu.py --mmlu_csv_dir /path/to/extracted/csv/files --output_path ./src/promptbase/datasets/mmlu
  - You will also need to set the following environment variables: AZURE_OPENAI_API_KEY, AZURE_OPENAI_CHAT_API_KEY, AZURE_OPENAI_CHAT_ENDPOINT_URL, AZURE_OPENAI_EMBEDDINGS_URL
  - Run with python -m promptbase mmlu --subject <SUBJECT>, where <SUBJECT> is one of the MMLU subjects (such as 'abstract_algebra')
  - In addition to the individual subjects, the format_mmlu.py script prepares files which enable all to be passed as a subject, which will run on the entire dataset
- HumanEval: https://huggingface.co/datasets/openai_humaneval
- DROP: https://allenai.org/data/drop
- GSM8K: https://github.com/openai/grade-school-math
- MATH: https://huggingface.co/datasets/hendrycks/competition_math
- Big-Bench-Hard: https://github.com/suzgunmirac/BIG-Bench-Hard
  - The contents of this repo need to be put into a directory called BigBench in the datasets directory
Other Resources:
- Medprompt Blog: https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/
- Medprompt Research Paper: https://arxiv.org/abs/2311.16452
- Medprompt+: https://www.microsoft.com/en-us/research/blog/steering-at-the-frontier-extending-the-power-of-prompting/
- Microsoft Introduction to Prompt Engineering: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/prompt-engineering
- Microsoft Advanced Prompt Engineering Guide: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering?pivots=programming-language-chat-completions