Top Related Projects
Examples and guides for using the OpenAI API
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
This repo includes ChatGPT prompt curation to use ChatGPT better.
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
🦜🔗 Build context-aware reasoning applications
Quick Overview
PromptBase is a repository created by Microsoft that serves as a collection of high-quality prompts for large language models (LLMs) like GPT-3. The prompts cover a wide range of use cases, from creative writing to task-oriented applications, and are designed to help users get the most out of their LLM-powered projects.
Pros
- Diverse Prompt Collection: PromptBase offers a vast and diverse collection of prompts, catering to various use cases and domains.
- High-Quality Prompts: The prompts are carefully crafted by experts to ensure they are effective and produce high-quality outputs.
- Ongoing Maintenance: The repository is actively maintained, with new prompts being added and existing ones being updated regularly.
- Community Contribution: The project encourages community involvement, allowing users to contribute their own prompts and provide feedback.
Cons
- Limited Customization: While the prompts are designed to be flexible, users may still need to do some additional customization to fit their specific use cases.
- Dependency on LLMs: The effectiveness of the prompts is heavily dependent on the capabilities of the underlying LLM, which may vary across different models and versions.
- Potential Bias: As with any AI-powered system, the prompts may reflect biases present in the training data or the model itself.
- Intellectual Property Concerns: Users should be mindful of any potential intellectual property or licensing issues when using the prompts in their projects.
Getting Started
To get started with PromptBase, you can follow these steps:
- Clone the repository to your local machine:
git clone https://github.com/microsoft/promptbase.git
- Navigate to the cloned repository:
cd promptbase
- Explore the available prompts in the prompts directory. Each prompt is stored in a separate file, with a brief description and instructions on how to use it.
- Choose a prompt that fits your use case and copy the prompt text into your application or project.
- Customize the prompt as needed, adjusting the parameters or adding additional context to suit your specific requirements.
- Use the prompt with your LLM of choice to generate the desired output (see the sketch after this list).
- Provide feedback or contribute your own prompts to the repository by following the guidelines in the CONTRIBUTING.md file.
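A minimal sketch of the choose/customize/run workflow, assuming the OpenAI Python SDK; the prompt file name, placeholder, and model are hypothetical stand-ins for whatever prompt you actually pick:

```python
# Minimal sketch: load a prompt from the cloned repo, customize it, and run it
# against an LLM. The file name and placeholder below are hypothetical.
from pathlib import Path

from openai import OpenAI  # pip install openai

# Read a prompt template from the repository's prompts directory (hypothetical file).
template = Path("prompts/summarize_article.txt").read_text(encoding="utf-8")

# Customize the prompt by filling in your own context.
prompt = template.replace("{article}", "Long article content here...")

# Use the prompt with your LLM of choice.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
    temperature=0.3,
)
print(response.choices[0].message.content)
```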
Competitor Comparisons
Examples and guides for using the OpenAI API
Pros of openai-cookbook
- More comprehensive and diverse set of examples and use cases
- Regularly updated with new features and API changes
- Includes examples for various OpenAI models beyond just GPT
Cons of openai-cookbook
- Focuses solely on OpenAI's offerings, limiting its applicability to other AI platforms
- Less structured approach to organizing prompts and techniques
Code Comparison
openai-cookbook:
import openai  # legacy Completions API (openai<1.0)

text = "Hello, how are you?"
response = openai.Completion.create(
    model="text-davinci-002",
    prompt=f"Translate the following English text to French: '{text}'",
    temperature=0.3,
    max_tokens=60,
)
print(response["choices"][0]["text"].strip())
promptbase:
from promptbase import Prompt
translator = Prompt("translate_english_to_french")
french_text = translator.run(text="Hello, how are you?")
The openai-cookbook example directly uses the OpenAI API, while promptbase provides a higher-level abstraction for managing and executing prompts. promptbase offers a more structured approach to organizing and reusing prompts, which can be beneficial for larger projects or teams working with multiple prompts.
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
Pros of Prompt-Engineering-Guide
- More comprehensive and structured content, covering a wide range of prompt engineering techniques and best practices
- Regularly updated with new information and examples from the rapidly evolving field of AI and language models
- Includes interactive notebooks and practical exercises for hands-on learning
Cons of Prompt-Engineering-Guide
- Less focused on specific use cases or industries compared to Promptbase
- May be overwhelming for beginners due to the extensive amount of information provided
- Lacks a standardized format for prompt templates, which Promptbase offers
Code Comparison
Prompt-Engineering-Guide example:
text = "Hello, how are you?"
prompt = f"""
Translate the following English text to French:
'{text}'
"""
response = get_completion(prompt)  # helper that wraps a chat-completion call
print(response)
Promptbase example:
from promptbase import Prompt
translator = Prompt("translation")
result = translator.run(text="Hello, world!", target_language="French")
print(result)
Both repositories provide valuable resources for prompt engineering, but they cater to different needs. Prompt-Engineering-Guide offers a more comprehensive and educational approach, while Promptbase focuses on practical implementation with standardized templates for specific use cases.
This repo includes ChatGPT prompt curation to use ChatGPT better.
Pros of awesome-chatgpt-prompts
- Larger collection of prompts, covering a wide range of topics and use cases
- Community-driven, with frequent updates and contributions from users
- Includes prompts in multiple languages
Cons of awesome-chatgpt-prompts
- Less structured organization compared to promptbase
- May include lower quality or less vetted prompts due to open contributions
- Lacks a standardized format for prompt descriptions
Code Comparison
awesome-chatgpt-prompts:
# Act as a Linux Terminal
I want you to act as a linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply with the terminal output inside one unique code block, and nothing else. do not write explanations. do not type commands unless I instruct you to do so. when i need to tell you something in english, i will do so by putting text inside curly brackets {like this}. my first command is pwd
promptbase:
def get_prompt(input_text):
    return f"""You are an AI assistant named Claude. You are helpful, harmless, and honest.

{input_text}"""
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
Pros of Awesome-Prompt-Engineering
- More comprehensive collection of prompt engineering resources
- Regularly updated with community contributions
- Includes practical examples and use cases
Cons of Awesome-Prompt-Engineering
- Less structured organization compared to Promptbase
- Lacks specific tools or frameworks for prompt development
- May be overwhelming for beginners due to the large amount of information
Code Comparison
Promptbase example:
from promptbase import Prompt
prompt = Prompt("Translate the following English text to French: {text}")
result = prompt.run(text="Hello, world!")
print(result)
Awesome-Prompt-Engineering doesn't provide specific code examples, but rather focuses on curating resources and techniques for prompt engineering.
Summary
Awesome-Prompt-Engineering is a community-driven repository that offers a wide range of resources for prompt engineering, including articles, tutorials, and best practices. It's regularly updated but may lack the structure and specific tools provided by Promptbase. Promptbase, on the other hand, offers a more focused approach with a dedicated framework for prompt development, which may be more suitable for developers looking for a ready-to-use solution.
🦜🔗 Build context-aware reasoning applications
Pros of LangChain
- More comprehensive framework for building LLM applications
- Larger community and ecosystem with extensive documentation
- Supports multiple LLM providers and integrations
Cons of LangChain
- Steeper learning curve due to its extensive features
- Can be overkill for simple prompt engineering tasks
- Requires more setup and configuration
Code Comparison
PromptBase example:
from promptbase import PromptTemplate
template = PromptTemplate("Summarize the following text: {text}")
prompt = template.format(text="Long article content here...")
LangChain example:
from langchain import PromptTemplate
template = PromptTemplate(
input_variables=["text"],
template="Summarize the following text: {text}"
)
prompt = template.format(text="Long article content here...")
Both repositories aim to simplify working with language models, but LangChain offers a more comprehensive toolkit for building LLM-powered applications. PromptBase focuses primarily on prompt engineering and management, making it potentially easier to use for specific prompt-related tasks. LangChain's broader scope includes features like chains, agents, and memory, which can be advantageous for complex projects but may introduce unnecessary complexity for simpler use cases.
README
promptbase

promptbase is an evolving collection of resources, best practices, and example scripts for eliciting the best performance from foundation models like GPT-4. We currently host scripts demonstrating the Medprompt methodology, including examples of how we further extended this collection of prompting techniques ("Medprompt+") into non-medical domains:
| Benchmark | GPT-4 Prompt | GPT-4 Results | Gemini Ultra Results |
|---|---|---|---|
| MMLU | Medprompt+ | 90.10% | 90.04% |
| GSM8K | Zero-shot | 95.3% | 94.4% |
| MATH | Zero-shot | 68.4% | 53.2% |
| HumanEval | Zero-shot | 87.8% | 74.4% |
| BIG-Bench-Hard | Few-shot + CoT | 89.0% | 83.6% |
| DROP | Zero-shot + CoT | 83.7% | 82.4% |
| HellaSwag | 10-shot | 95.3% | 87.8% |
In the near future, promptbase will also offer further case studies and structured interviews around the scientific process we follow for prompt engineering. We'll also offer deep dives into the specialized tooling that accentuates the prompt engineering process. Stay tuned!
Medprompt and The Power of Prompting

"Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine" (H. Nori, Y. T. Lee, S. Zhang, D. Carignan, R. Edgar, N. Fusi, N. King, J. Larson, Y. Li, W. Liu, R. Luo, S. M. McKinney, R. O. Ness, H. Poon, T. Qin, N. Usuyama, C. White, E. Horvitz, 2023)

Paper link: https://arxiv.org/abs/2311.16452

@article{nori2023can,
  title={Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine},
  author={Nori, Harsha and Lee, Yin Tat and Zhang, Sheng and Carignan, Dean and Edgar, Richard and Fusi, Nicolo and King, Nicholas and Larson, Jonathan and Li, Yuanzhi and Liu, Weishung and others},
  journal={arXiv preprint arXiv:2311.16452},
  year={2023}
}
In a recent study, we showed how the composition of several prompting strategies into a method that we refer to as Medprompt can efficiently steer generalist models like GPT-4 to achieve top performance, even when compared against models specifically finetuned for medicine. Medprompt composes three distinct strategies -- dynamic few-shot selection, self-generated chain of thought, and choice-shuffle ensembling -- to elicit specialist-level performance from GPT-4. We briefly describe these strategies here:
- Dynamic Few Shots: Few-shot learning -- providing several examples of the task and response to a foundation model -- enables models to quickly adapt to a specific domain and learn to follow the task format. For simplicity and efficiency, the few-shot examples applied in prompting for a particular task are typically fixed; they are unchanged across test examples. This requires that the selected few-shot examples be broadly representative of, and relevant to, a wide distribution of test examples. One approach to meeting these requirements is to have domain experts carefully hand-craft exemplars. Even so, this approach cannot guarantee that the curated, fixed few-shot examples will be appropriately representative of every test example. With enough available data, however, we can select different few-shot examples for different task inputs. We refer to this approach as employing dynamic few-shot examples. The method uses a mechanism to identify examples based on their similarity to the case at hand. For Medprompt, we did the following to identify representative few-shot examples: given a test example, we choose the k training examples that are most semantically similar using k-NN in the embedding space. Specifically, we first use OpenAI's text-embedding-ada-002 model to embed candidate exemplars for few-shot learning. Then, for each test question x, we retrieve its nearest k neighbors x1, x2, ..., xk from the training set (according to distance in the embedding space of text-embedding-ada-002). These examples -- the ones most similar in embedding space to the test question -- are ultimately included in the prompt (a sketch of this retrieval step follows this list).
- Self-Generated Chain of Thought (CoT): Chain-of-thought (CoT) prompting uses natural language statements, such as "Let's think step by step," to explicitly encourage the model to generate a series of intermediate reasoning steps. The approach has been found to significantly improve the ability of foundation models to perform complex reasoning. Most approaches to chain-of-thought center on the use of experts to manually compose few-shot examples with chains of thought for prompting. Rather than rely on human experts, we pursued a mechanism to automate the creation of chain-of-thought examples. We found that we could simply ask GPT-4 to generate a chain of thought for each training example, with appropriate guardrails to reduce the risk of hallucination via incorrect reasoning chains.
- Majority Vote Ensembling: Ensembling refers to combining the outputs of several algorithms to yield better predictive performance than any individual algorithm. Frontier models like GPT-4 benefit from ensembling of their own outputs. A simple technique is to use a variety of prompts, or a single prompt with varied temperature, and report the most frequent answer among the ensemble constituents. For multiple-choice questions, we employ a further trick that increases the diversity of the ensemble, called choice-shuffling, where we shuffle the relative order of the answer choices before generating each reasoning path. We then select the most consistent answer, i.e., the one that is least sensitive to choice shuffling, which increases the robustness of the final answer (see the choice-shuffle sketch after the next paragraph).
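The dynamic few-shot retrieval step can be pictured with a short sketch. The snippet below embeds candidate exemplars with text-embedding-ada-002 and picks the k nearest neighbors of a test question by cosine similarity; the exemplar data structure and helper names are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch of dynamic few-shot selection via embedding k-NN.
# The exemplar list and helper names are hypothetical; only the embedding
# model name (text-embedding-ada-002) comes from the Medprompt description.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

def select_dynamic_few_shots(test_question: str, exemplars: list[dict], k: int = 5) -> list[dict]:
    """Return the k exemplars closest to the test question in embedding space."""
    exemplar_vecs = embed([ex["question"] for ex in exemplars])
    query_vec = embed([test_question])[0]
    # Cosine similarity between the test question and every candidate exemplar.
    sims = exemplar_vecs @ query_vec / (
        np.linalg.norm(exemplar_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top_k = np.argsort(sims)[::-1][:k]
    return [exemplars[i] for i in top_k]
```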
The combination of these three techniques led to breakthrough performance in Medprompt for medical challenge questions. Implementation details of these techniques can be found here: https://github.com/microsoft/promptbase/tree/main/src/promptbase/mmlu
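For the ensembling step, the sketch below isolates the choice-shuffle idea: the answer options are shuffled before each sampled reasoning path, the model's letter choice is mapped back to the underlying option, and the most frequent option wins. The answer_question callable is a hypothetical stand-in for whatever GPT-4 call produces a reasoning path and final letter; it is not part of the repository.

```python
# Illustrative sketch of choice-shuffle ensembling with a majority vote.
# `answer_question` is a hypothetical stand-in for a GPT-4 call that returns
# the letter (e.g. "B") chosen at the end of a sampled reasoning path.
import random
from collections import Counter
from typing import Callable

def choice_shuffle_ensemble(
    question: str,
    choices: list[str],
    answer_question: Callable[[str, list[str]], str],
    n_paths: int = 5,
) -> str:
    votes: Counter[str] = Counter()
    for _ in range(n_paths):
        shuffled = random.sample(choices, k=len(choices))  # new order each path
        letter = answer_question(question, shuffled)        # e.g. "A", "B", ...
        picked = shuffled[ord(letter.strip().upper()) - ord("A")]
        votes[picked] += 1  # vote for the underlying option, not the letter
    # The most consistent answer across shuffles is the ensemble's prediction.
    return votes.most_common(1)[0][0]
```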
Medprompt+ | Extending the power of prompting
Here we provide some intuitive details on how we extended the Medprompt prompting framework to elicit even stronger out-of-domain performance on the MMLU (Measuring Massive Multitask Language Understanding) benchmark. MMLU was established as a test of the general knowledge and reasoning powers of large language models. The complete MMLU benchmark contains tens of thousands of challenge problems of different forms across 57 areas, from basic mathematics to United States history, law, computer science, engineering, medicine, and more.
We found that applying Medprompt without modification to the whole of MMLU achieved a score of 89.1%. Not bad for a single policy working across a great diversity of problems! But could we push Medprompt to do better? Simply scaling up Medprompt can yield further benefits. As a first step, we increased the number of ensembled calls from five to 20. This boosted performance to 89.56%.
On working to push further with refinement of Medprompt, we noticed that performance was relatively poor for specific topics of the MMLU. MMLU contains a great diversity of types of questions, depending on the discipline and specific benchmark at hand. How might we push GPT-4 to perform even better on MMLU given the diversity of problems?
We focused on extension to a portfolio approach based on the observation that some topical areas tend to ask questions that require multiple steps of reasoning and perhaps a scratch pad to keep track of multiple parts of a solution. Other areas seek factual answers that follow more directly from the questions. Medprompt employs chain-of-thought (CoT) reasoning, which resonates with multi-step solving. We wondered whether the sophisticated Medprompt-classic approach might do less well on very simple questions, and whether the system might do better if a simpler method were used for the factual queries.
Following this argument, we found that we could boost performance on MMLU by extending Medprompt with a simple two-method prompt portfolio. We add to the classic Medprompt a set of 10 simple, direct few-shot prompts that solicit an answer directly, without chain of thought. We then ask GPT-4 for help in deciding on the best strategy for each topic area and question. As a screening call, for each question we first ask GPT-4:
# Question
{{ question }}
# Task
Does answering the question above require a scratch-pad?
A. Yes
B. No
If GPT-4 thinks the question does require a scratch-pad, then the contribution of the Chain-of-Thought component of the ensemble is doubled. If it doesn't, we halve that contribution (and let the ensemble instead depend more on the direct few-shot prompts). Dynamically leveraging the appropriate prompting technique in the ensemble led to a further +0.5% performance improvement across the MMLU.
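The routing logic just described can be summarized in a few lines. In the sketch below, ask_needs_scratchpad stands in for the screening call shown above, and the CoT contribution is doubled or halved before the majority vote; the function names and weight bookkeeping are illustrative assumptions, not the repository's exact code.

```python
# Illustrative sketch of the Medprompt+ screening/routing step.
# `ask_needs_scratchpad`, `cot_votes`, and `direct_votes` are hypothetical
# stand-ins for the GPT-4 screening call and the two ensemble components.
from collections import Counter

def route_and_combine(
    question: str,
    cot_votes: Counter,      # answer counts from the chain-of-thought prompts
    direct_votes: Counter,   # answer counts from the direct few-shot prompts
    ask_needs_scratchpad,    # callable returning True if GPT-4 answers "A. Yes"
) -> str:
    # Double the CoT contribution when a scratch-pad is deemed necessary,
    # otherwise halve it and lean on the direct few-shot prompts.
    cot_weight = 2.0 if ask_needs_scratchpad(question) else 0.5
    combined: Counter = Counter()
    for answer, count in cot_votes.items():
        combined[answer] += cot_weight * count
    for answer, count in direct_votes.items():
        combined[answer] += count
    return combined.most_common(1)[0][0]
```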
We note that Medprompt+ relies on accessing confidence scores (logprobs) from GPT-4. These are not publicly available via the current API but will be enabled for all in the near future.
Running Scripts
Note: Some scripts hosted here are published for reference on methodology, but may not be immediately executable against public APIs. We're working hard on making the pipelines easier to run "out of the box" over the next few days, and appreciate your patience in the interim!
First, clone the repo and install the promptbase package:
git clone https://github.com/microsoft/promptbase.git
cd promptbase/src
pip install -e .
Next, decide which tests you'd like to run. You can choose from:
- bigbench
- drop
- gsm8k
- humaneval
- math
- mmlu
Before running the tests, you will need to download the datasets from the original sources (see below) and place them in the src/promptbase/datasets directory.
After downloading datasets and installing the promptbase package, you can run a test with:
python -m promptbase dataset_name
For example:
python -m promptbase gsm8k
Dataset Links
To run evaluations, download these datasets and add them to /src/promptbase/datasets/:
- MMLU: https://github.com/hendrycks/test
  - Download the data.tar file from the above page and extract the contents
  - Run mkdir src/promptbase/datasets/mmlu
  - Run python ./src/promptbase/format/format_mmlu.py --mmlu_csv_dir /path/to/extracted/csv/files --output_path ./src/promptbase/datasets/mmlu
  - You will also need to set the following environment variables: AZURE_OPENAI_API_KEY, AZURE_OPENAI_CHAT_API_KEY, AZURE_OPENAI_CHAT_ENDPOINT_URL, AZURE_OPENAI_EMBEDDINGS_URL
  - Run with python -m promptbase mmlu --subject <SUBJECT>, where <SUBJECT> is one of the MMLU subjects (such as 'abstract_algebra')
  - In addition to the individual subjects, the format_mmlu.py script prepares files which enable all to be passed as a subject, which will run on the entire dataset
- HumanEval: https://huggingface.co/datasets/openai_humaneval
- DROP: https://allenai.org/data/drop
- GSM8K: https://github.com/openai/grade-school-math
- MATH: https://huggingface.co/datasets/hendrycks/competition_math
- Big-Bench-Hard: https://github.com/suzgunmirac/BIG-Bench-Hard
  - The contents of this repo need to be put into a directory called BigBench in the datasets directory
Other Resources:
- Medprompt Blog: https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/
- Medprompt Research Paper: https://arxiv.org/abs/2311.16452
- Medprompt+: https://www.microsoft.com/en-us/research/blog/steering-at-the-frontier-extending-the-power-of-prompting/
- Microsoft Introduction to Prompt Engineering: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/prompt-engineering
- Microsoft Advanced Prompt Engineering Guide: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering?pivots=programming-language-chat-completions