Top Related Projects
A guidance language for controlling large language models.
The official Python library for the OpenAI API
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
🦜🔗 Build context-aware reasoning applications
Integrate cutting-edge LLM technology quickly and easily into your apps
Quick Overview
The guidance-ai/guidance repository is a Python library for steering large language models (LLMs). Rather than only sending a prompt and reading back free text, it lets you constrain generation (for example with selects, regular expressions, and context-free grammars) and interleave control flow such as conditionals, loops, and tool use with generation, across several model backends.
Pros
- Flexible and Extensible: The library is designed to be highly modular and extensible, allowing developers to easily integrate it into their own projects and customize it to their specific needs.
- Supports Multiple LLM Backends: The library supports a variety of LLM backends, including Hugging Face Transformers, llama.cpp, OpenAI, Azure AI, and Vertex AI, making it easy to experiment with different models.
- Comprehensive Documentation: The project has detailed documentation that covers a wide range of topics, from installation and setup to advanced usage and deployment.
- Active Development and Community: The project is actively maintained and has a growing community of contributors, ensuring that it continues to evolve and improve over time.
Cons
- Steep Learning Curve: The library's flexibility and feature-richness can make it challenging for beginners to get started, especially if they're new to working with LLMs.
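- Dependency on External LLM Providers: When used with hosted backends such as OpenAI, the library relies on an external provider, which can introduce additional costs and potential availability issues. Local backends such as Transformers and llama.cpp avoid this.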
- Limited Support for Specialized Hardware: While the library supports a variety of LLM backends, it may not provide optimal performance on specialized hardware, such as GPUs or TPUs.
- Potential Performance Overhead: The abstraction layer provided by the library may introduce some performance overhead compared to directly using the underlying LLM APIs.
Code Examples
Here are a few examples of how to use the guidance-ai/guidance library:
- Text Generation:
from guidance import models, gen
# Load a model; `path` is a placeholder for a local model file
lm = models.LlamaCpp(path)
# Append a prompt and let the model continue it, capturing the result as 'text'
lm += "The quick brown fox " + gen("text", max_tokens=20)
print(lm["text"])
- Question Answering:
from guidance import models, gen
lm = models.LlamaCpp(path)
# Answer a question based on a given context
context = "The Eiffel Tower is a wrought-iron lattice tower built in 1889 in Paris, France."
question = "What is the Eiffel Tower?"
lm += f"Context: {context}\nQuestion: {question}\nAnswer: " + gen("answer", stop="\n")
print(lm["answer"])
- Summarization:
from guidance import models, gen
lm = models.LlamaCpp(path)
# Summarize a given text
text = "This is a long and detailed text that needs to be summarized. It covers a wide range of topics, including history, science, and current events. The goal is to extract the key points and present them in a concise and easy-to-understand format."
lm += f"Text: {text}\nOne-sentence summary: " + gen("summary", stop="\n")
print(lm["summary"])
- Sentiment Analysis:
from guidance import models, select
lm = models.LlamaCpp(path)
# Constrain the answer to one of three sentiment labels
text = "I really enjoyed the movie. It was well-written and the acting was superb."
lm += f"Review: {text}\nSentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
print(lm["sentiment"])
Getting Started
To get started with the guidance-ai/guidance library, follow these steps:
- Install the library using pip:
pip install guidance
- Import models and gen, then load a model (path is a placeholder for a local model file):
from guidance import models, gen
lm = models.LlamaCpp(path)
- Append prompts and generation calls to the model object to interact with the LLM backend of your choice. For example, to generate text:
lm += "The quick brown fox " + gen("text", max_tokens=20)
print(lm["text"])
- Refer to the project's documentation for more detailed information on the available features, configuration options, and advanced usage.
Competitor Comparisons
A guidance language for controlling large language models.
Pros of guidance
- More actively maintained with recent updates
- Larger community with more stars and contributors
- Extensive documentation and examples
Cons of guidance
- Potentially more complex API due to additional features
- May have a steeper learning curve for beginners
- Larger codebase could lead to longer load times
Code Comparison
guidance:
import guidance
prompt = guidance('''
Human: What is the capital of France?
AI: The capital of France is Paris.
Human: What is the population of Paris?
AI: ''')
result = prompt()
print(result)
guidance>:
from guidance import guidance
@guidance
def conversation(human_input):
ai_response = yield from guidance.complete(human_input)
return ai_response
result = conversation("What is the capital of France?")
print(result)
Both repositories provide similar functionality for generating AI responses, but guidance offers a more flexible and feature-rich approach. The guidance> repository appears to be a simplified or earlier version of the guidance project, with less active development and fewer features. Users looking for a more comprehensive and up-to-date solution may prefer guidance, while those seeking a simpler implementation might find guidance> sufficient for basic needs.
The official Python library for the OpenAI API
Pros of openai-python
- Official OpenAI library, ensuring direct compatibility and up-to-date features
- Comprehensive support for all OpenAI API endpoints and models
- Well-documented with extensive examples and community support
Cons of openai-python
- Limited to OpenAI's services, lacking flexibility for other AI providers
- Requires more boilerplate code for complex prompts and chained operations
- Less focus on prompt engineering and advanced text generation techniques
Code Comparison
openai-python:
import openai
openai.api_key = "your-api-key"
# Legacy interface from openai-python releases before 1.0
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Translate the following English text to French: 'Hello, world!'",
    max_tokens=60
)
guidance:
import guidance
prompt = guidance('''
Human: Translate the following English text to French: 'Hello, world!'
AI: Here's the translation:
{{gen 'translation' max_tokens=20}}
''')
executed = prompt()
print(executed['translation'])
The guidance library offers a more intuitive and flexible approach to prompt engineering, allowing for easier creation of complex, multi-step prompts. However, openai-python provides direct access to OpenAI's models and services, making it the go-to choice for straightforward OpenAI API interactions.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive model support: Covers a wide range of transformer-based models
- Rich documentation and community support
- Seamless integration with PyTorch and TensorFlow
Cons of transformers
- Steeper learning curve for beginners
- Can be resource-intensive for large models
- Less focused on prompt engineering and controlled generation
Code Comparison
transformers:
from transformers import pipeline
generator = pipeline('text-generation', model='gpt2')
result = generator("Hello, I'm a language model,", max_length=30)
print(result[0]['generated_text'])
guidance:
import guidance
prompt = guidance('''
Human: Hello, I'm a language model,
AI: {{gen 'response' max_tokens=20}}
''')
result = prompt()
print(result['response'])
The transformers library provides a more traditional approach to model usage, while guidance focuses on prompt engineering and controlled text generation. guidance offers a more intuitive interface for prompt design and fine-grained control over the generation process, making it easier to create complex prompts and manage model outputs.
🦜🔗 Build context-aware reasoning applications
Pros of LangChain
- More extensive ecosystem with a wider range of integrations and tools
- Stronger community support and more frequent updates
- Better documentation and learning resources
Cons of LangChain
- Can be more complex and overwhelming for beginners
- Potentially slower execution due to its comprehensive nature
Code Comparison
LangChain:
from langchain import OpenAI, LLMChain, PromptTemplate
template = "What is a good name for a company that makes {product}?"
prompt = PromptTemplate(template=template, input_variables=["product"])
llm_chain = LLMChain(prompt=prompt, llm=OpenAI(temperature=0.9))
print(llm_chain.run("colorful socks"))
Guidance:
import guidance
prompt = guidance('''
Human: What is a good name for a company that makes {{product}}?
AI: Here's a suggestion for a company name that makes {{product}}:''')
print(prompt(product="colorful socks"))
Both repositories aim to simplify working with language models, but they have different approaches. LangChain offers a more comprehensive toolkit with various components and integrations, while Guidance focuses on a simpler, more direct approach to prompt engineering. The choice between them depends on the specific needs of your project and your familiarity with language model development.
Integrate cutting-edge LLM technology quickly and easily into your apps
Pros of Semantic Kernel
- More comprehensive framework with built-in memory, planning, and skills management
- Better integration with Azure services and Microsoft ecosystem
- Stronger community support and regular updates
Cons of Semantic Kernel
- Steeper learning curve due to more complex architecture
- Primarily focused on C# and .NET, limiting language options
- Heavier dependency on Microsoft technologies
Code Comparison
Guidance:
import guidance
prompt = guidance('''
Human: What is the capital of France?
AI: The capital of France is Paris.
Human: What is the population of Paris?
AI: As of 2023, the estimated population of Paris is approximately 2.2 million people in the city proper.
Human: What is a famous landmark in Paris?
AI: One of the most famous landmarks in Paris is the Eiffel Tower.
''')
result = prompt()
print(result)
Semantic Kernel:
using Microsoft.SemanticKernel;
var kernel = Kernel.Builder.Build();
var promptTemplate = "What is the capital of {{$country}}?";
var prompt = kernel.CreateSemanticFunction(promptTemplate);
var result = await prompt.InvokeAsync(new ContextVariables { ["country"] = "France" });
Console.WriteLine(result);
README
Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case, while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.
- Install
- Features
- Example notebooks
- Basic generation
- Constrained generation
- Stateful control + generation
Install
Guidance is available through PyPI and supports a variety of backends (Transformers, llama.cpp, OpenAI, etc.). To use a specific model see loading models.
pip install guidance
Note: To use Guidance on Phi models in Azure AI, or to use the new accelerated Rust-based parser, please install the release-candidate v0.2.0 guidance package:
pip install guidance --pre
For a detailed walkthrough of using Guidance on hosted Phi models, check the Azure AI specific loading instructions and the Phi-3 + Guidance cookbook.
Features
Write pure Python, with additional LM functionality.
from guidance import models, gen
# load a model (could be Transformers, LlamaCpp, VertexAI, OpenAI...)
llama2 = models.LlamaCpp(path)
# append text or generations to the model
llama2 + f'Do you want a joke or a poem? ' + gen(stop='.')
Constrain generation with selects (i.e., sets of options), regular expressions, and context-free grammars, as well as with pre-built components (e.g., substring, json).
from guidance import select
# a simple select between two options
llama2 + f'Do you want a joke or a poem? A ' + select(['joke', 'poem'])
Call and deploy tools easily with automatic interleaving of control and generation.
Easy tool use, where the model stops generation when a tool is called, calls the tool, then resumes generation. For example, here is a simple version of a calculator, via four separate 'tools':
@guidance
def add(lm, input1, input2):
    lm += f' = {int(input1) + int(input2)}'
    return lm
@guidance
def subtract(lm, input1, input2):
    lm += f' = {int(input1) - int(input2)}'
    return lm
@guidance
def multiply(lm, input1, input2):
    lm += f' = {float(input1) * float(input2)}'
    return lm
@guidance
def divide(lm, input1, input2):
    lm += f' = {float(input1) / float(input2)}'
    return lm
Now we call gen with these tools as options. Notice how generation is stopped and restarted automatically:
lm = llama2 + '''\
1 + 1 = add(1, 1) = 2
2 - 3 = subtract(2, 3) = -1
'''
lm + gen(max_tokens=15, tools=[add, subtract, multiply, divide])
Get high compatibility: execute a single Guidance program on many backends
Works with Transformers, llama.cpp, AzureAI, VertexAI, OpenAI and others. Users can write one guidance program and execute it on many backends. (Note that the most powerful control features require endpoint integration, and for now work best with Transformers and llama.cpp.)
from guidance import models, gen, user, assistant
gpt = models.OpenAI("gpt-3.5-turbo")
with user():
    lm = gpt + "What is the capital of France?"
with assistant():
    lm += gen("capital")
with user():
    lm += "What is one short surprising fact about it?"
with assistant():
    lm += gen("fact")
Gain speed with stateful control + generation functions: no need for intermediate parsers.
In contrast to chaining, Guidance programs are the equivalent of a single LLM call. Moreover, any non-generated text that gets appended is batched, so Guidance programs are faster than having the LM generate intermediate text when you have a set structure.
Token healing
Users deal with text (or bytes) rather than tokens, and thus don't have to worry about perverse token boundary issues such as 'prompt ending in whitespace'.
Rich templates with f-strings.
llama2 + f'''\
Do you want a joke or a poem? A {select(['joke', 'poem'])}.
Okay, here is a one-liner: "{gen(stop='"')}"
'''
Abstract chat interface that uses correct special tokens for any chat model.
# capture our selection under the name 'answer'
lm = llama2 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='answer')}.\n"
# make a choice based on the model's previous selection
if lm["answer"] == "joke":
    lm += f"Here is a one-line joke about cats: " + gen('output', stop='\n')
else:
    lm += f"Here is a one-line poem about dogs: " + gen('output', stop='\n')
Easy-to-write reusable components.
import guidance
@guidance
def one_line_thing(lm, thing, topic):
    lm += f'Here is a one-line {thing} about {topic}: ' + gen(stop='\n')
    return lm # return our updated model
# pick either a joke or a poem
lm = llama2 + f"Do you want a joke or a poem? A {select(['joke', 'poem'], name='thing')}.\n"
# call our guidance function
lm += one_line_thing(lm['thing'], 'cats')
A library of pre-built components
Common syntax elements are available out of the box. Below is an example of substring; for others (like json), check out the docs.
from guidance import substring
# define a set of possible statements
text = 'guidance is awesome. guidance is so great. guidance is the best thing since sliced bread.'
# force the model to make an exact quote
llama2 + f'Here is a true statement about the guidance library: "{substring(text)}"'
Streaming support, also integrated with Jupyter notebooks.
lm = llama2 + 'Here is a cute 5-line poem about cats and dogs:\n'
for i in range(5):
    lm += f"LINE {i+1}: " + gen(temperature=0.8, suffix="\n")
For environments that don't support guidance's rich IPython/Jupyter/HTML based visualizations (e.g. console applications), all visualizations and console outputs can be suppressed by setting echo=False in the constructor of any guidance.models object:
llama2 = models.LlamaCpp(path, echo=False)
Multi-modal support.
from guidance import image
gemini = models.VertexAI("gemini-pro-vision")
with user():
    lm = gemini + "What is this a picture of?" + image("longs_peak.jpg")
with assistant():
    lm += gen("answer")
Example notebooks
We are working on updating our example notebooks. The following ones have been updated:
More coming soon
Basic generation
An lm object is immutable, so you change it by creating new copies of it. By default, when you append things to lm, it creates a copy, e.g.:
from guidance import models, gen, select
llama2 = models.LlamaCpp(model)
# llama2 is not modified, `lm` is a copy of `llama2` with 'This is a prompt' appended to its state
lm = llama2 + 'This is a prompt'
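The copy-on-append behaviour can be illustrated without loading a model. The sketch below uses a hypothetical ToyLM class (an assumption for illustration, not part of Guidance) whose + operator returns a new object, so the original is left untouched:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLM:
    """Hypothetical stand-in for a model object; not part of Guidance."""
    text: str = ""

    def __add__(self, other: str) -> "ToyLM":
        # Build and return a new object instead of mutating self
        return ToyLM(self.text + other)


base = ToyLM()
lm = base + "This is a prompt"   # base is unchanged, lm is a new object
extended = lm
extended += " with more text"    # += rebinds the name to yet another copy

print(repr(base.text))
print(repr(lm.text))
print(repr(extended.text))
```

Because every append yields a fresh object, earlier states such as base and lm stay available for branching into alternative continuations.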
You can append generation calls to model objects, e.g.
lm = llama2 + 'This is a prompt' + gen(max_tokens=10)
You can also interleave generation calls with plain text, or control flows:
# Note how we set stop tokens
lm = llama2 + 'I like to play with my ' + gen(stop=' ') + ' in' + gen(stop=['\n', '.', '!'])
Constrained generation
Select (basic)
select constrains generation to a set of options:
lm = llama2 + 'I like the color ' + select(['red', 'blue', 'green'])
Regular expressions
gen has optional arguments regex and stop_regex, which allow generation (and stopping, respectively) to be controlled by a regex.
Regex to constrain generation
Unconstrained:
lm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\n'
lm += 'How many balls does he have left?\n'
lm += 'Answer: ' + gen(stop='\n')
Constrained by regex:
lm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\n'
lm += 'How many balls does he have left?\n'
lm += 'Answer: ' + gen(regex=r'\d+')
Regex as stopping criterion
Unconstrained:
lm = llama2 + '19, 18,' + gen(max_tokens=50)
Stop with traditional stop text, whenever the model generates the number 7:
lm = llama2 + '19, 18,' + gen(max_tokens=50, stop='7')
Stop whenever the model generates the character 7 without any numbers around it:
lm = llama2 + '19, 18,' + gen(max_tokens=50, stop_regex=r'[^\d]7[^\d]')
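The difference between the two stopping criteria can be checked with the standard re module alone. This is a minimal sketch that assumes a hypothetical continuation string and truncates it after the fact; the helper names are illustrative, and it also assumes the matched stop text is dropped from the result:

```python
import re

# Hypothetical text a model might produce when continuing '19, 18,'
generated = " 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5"


def cut_at_stop_text(text, stop):
    """Truncate at the first literal occurrence of `stop`."""
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]


def cut_at_stop_regex(text, pattern):
    """Truncate at the first position where `pattern` matches."""
    match = re.search(pattern, text)
    return text if match is None else text[:match.start()]


# The literal stop fires on the '7' inside '17'
print(repr(cut_at_stop_text(generated, "7")))
# The regex stop only fires on a '7' with no digits on either side
print(repr(cut_at_stop_regex(generated, r"[^\d]7[^\d]")))
```

The literal stop cuts the sequence short inside "17", while the regex lets the countdown continue until the standalone 7.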
Context-free grammars
We expose a variety of operators that make it easy to define CFGs, which in turn can be used to constrain generation.
For example, we can use the select operator (it accepts CFGs as options), zero_or_more and one_or_more to define a grammar for mathematical expressions:
import guidance
from guidance import one_or_more, select, zero_or_more
# stateless=True indicates this function does not depend on LLM generations
@guidance(stateless=True)
def number(lm):
    n = one_or_more(select(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']))
    # Allow for negative or positive numbers
    return lm + select(['-' + n, n])
@guidance(stateless=True)
def operator(lm):
    return lm + select(['+' , '*', '**', '/', '-'])
@guidance(stateless=True)
def expression(lm):
    # Either
    # 1. A number (terminal)
    # 2. Two expressions with an operator and optional whitespace
    # 3. An expression with parentheses around it
    return lm + select([
        number(),
        expression() + zero_or_more(' ') + operator() + zero_or_more(' ') + expression(),
        '(' + expression() + ')'
    ])
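To see which strings this grammar admits, it can help to test candidates outside of any model. The following is a hand-written recognizer for the same language, a sketch only: it is not how Guidance parses grammars, and all names in it are illustrative. It rewrites the left-recursive rule as "term, then zero or more (operator, term) pairs":

```python
import re

NUMBER = re.compile(r"-?[0-9]+")
OPERATORS = ("**", "+", "*", "/", "-")  # '**' is tried before '*'


def _skip_spaces(s, i):
    while i < len(s) and s[i] == " ":
        i += 1
    return i


def _parse_term(s, i):
    """Match a number or a parenthesised expression; return end index or None."""
    if s.startswith("(", i):
        j = _parse_expression(s, i + 1)
        if j is not None and s.startswith(")", j):
            return j + 1
        return None
    match = NUMBER.match(s, i)
    return match.end() if match else None


def _parse_expression(s, i):
    """Match term (spaces operator spaces term)*; return end index or None."""
    j = _parse_term(s, i)
    if j is None:
        return None
    while True:
        k = _skip_spaces(s, j)
        op = next((o for o in OPERATORS if s.startswith(o, k)), None)
        if op is None:
            return j
        end = _parse_term(s, _skip_spaces(s, k + len(op)))
        if end is None:
            return j  # dangling operator: stop before it
        j = end


def is_expression(s):
    """True if the whole string is an expression in the grammar above."""
    return _parse_expression(s, 0) == len(s)


for candidate in ["106 - 36", "(2+3)*4", "2 + 2 = 4", "2 +"]:
    print(repr(candidate), is_expression(candidate))
```

A string such as "2 + 2 = 4" is rejected because "=" is not an operator in the grammar, which is the property that keeps a constrained model from "solving" the expression.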
The @guidance(stateless=True) decorator makes it such that a function (e.g. expression) lives as a stateless grammar that does not get 'executed' until we call lm + expression() or lm += expression(). For example, here is unconstrained generation:
# Without constraints
lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
lm += 'Equivalent arithmetic expression: ' + gen(stop='\n') + '\n'
Notice how the model wrote the right equation but solved it (incorrectly). If we wanted to constrain the model such that it only writes valid expressions (without trying to solve them), we can just append our grammar to it:
grammar = expression()
lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
lm += 'Equivalent arithmetic expression: ' + grammar + '\n'
Grammars are very easy to compose. For example, let's say we want a grammar that generates either a mathematical expression or an expression followed by a solution followed by another expression. Creating this grammar is easy:
from guidance import regex
grammar = select([expression(), expression() + regex(r' = \d+; ') + expression()])
We can generate according to it:
llama2 + 'Here is a math expression for two plus two: ' + grammar
llama2 + '2 + 2 = 4; 3+3\n' + grammar
Even if you don't like thinking in terms of recursive grammars, this formalism makes it easy to constrain generation. For example, let's say we have the following one-shot prompt:
@guidance(stateless=True)
def ner_instruction(lm, input):
    lm += f'''\
Please tag each word in the input with PER, ORG, LOC, or nothing
---
Input: John worked at Apple.
Output:
John: PER
worked:
at:
Apple: ORG
.:
---
Input: {input}
Output:
'''
    return lm
input = 'Julia never went to Morocco in her life!!'
llama2 + ner_instruction(input) + gen(stop='---')
Notice that the model did not spell the word 'Morocco' correctly. Sometimes the model might also hallucinate a tag that doesn't exist. We can improve this by adding more few-shot examples, etc, but we can also constrain generation to the exact format we want:
import re
@guidance(stateless=True)
def constrained_ner(lm, input):
    # Split into words
    words = [x for x in re.split(r'([^a-zA-Z0-9])', input) if x and not re.match(r'\s', x)]
    ret = ''
    for x in words:
        ret += x + ': ' + select(['PER', 'ORG', 'LOC', '']) + '\n'
    return lm + ret
llama2 + ner_instruction(input) + constrained_ner(input)
While constrained_ner(input) is a grammar that constrains the model generation, it feels like you're just writing normal imperative Python code with += and selects.
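The word-splitting step in constrained_ner is plain Python and can be inspected on its own. The sketch below reuses the same regular expressions and adds a hypothetical is_valid_output helper (an assumption, not part of Guidance) that checks whether a piece of text has the "word: TAG" shape the grammar enforces:

```python
import re

ALLOWED_TAGS = {"PER", "ORG", "LOC", ""}


def split_words(text):
    """Split on every non-alphanumeric character, keeping punctuation, dropping whitespace."""
    parts = re.split(r"([^a-zA-Z0-9])", text)
    return [p for p in parts if p and not re.match(r"\s", p)]


def is_valid_output(output, words):
    """Check that each line is '<word>: <tag>' with an allowed (possibly empty) tag."""
    lines = output.split("\n")
    if lines and lines[-1] == "":
        lines = lines[:-1]  # ignore the trailing newline
    if len(lines) != len(words):
        return False
    for line, word in zip(lines, words):
        prefix = word + ": "
        if not line.startswith(prefix) or line[len(prefix):] not in ALLOWED_TAGS:
            return False
    return True


words = split_words("Julia never went to Morocco in her life!!")
print(words)
```

Each punctuation mark becomes its own entry, so the two exclamation marks at the end of the input are tagged separately.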
Capture a generation
The string generated by a stateless function can be saved to the lm object by using the capture function. capture takes two arguments: the stateless function and the name under which to store the captured variable.
from guidance import capture, one_or_more
ans = lm + "To close the open bracket sequence [[ the corresponding closing brackets are " + capture(one_or_more("]"), "brackets")
ans["brackets"]
Stateful control + generation
State in immutable objects
Whenever you do lm + grammar, lm + gen, lm + select, etc., you return a new lm object with additional state. For example:
lm = llama2 + 'This is a prompt' + gen(name='test', max_tokens=10)
lm += select(['this', 'that'], name='test2')
lm['test'], lm['test2']
Stateful guidance functions
The guidance decorator is @guidance(stateless=False) by default, meaning that a function with this decorator depends on the lm state to execute (either prior state or state generated within the function). For example:
@guidance(stateless=False)
def test(lm):
    lm += 'Should I say "Scott"?\n' + select(['yes', 'no'], name='answer') + '\n'
    if lm['answer'] == 'yes':
        lm += 'Scott'
    else:
        lm += 'Not Scott'
    return lm
llama2 + test()
Example: ReAct
A big advantage of stateful control is that you don't have to write any intermediate parsers, and adding follow-up 'prompting' is easy, even if the follow up depends on what the model generates. For example, let's say we want to implement the first example of ReAct prompt in this, and let's say the valid acts are only 'Search' or 'Finish'. We might write it like this:
@guidance
def react_prompt_example(lm, question, max_rounds=10):
    lm += f'Question: {question}\n'
    i = 1
    while True:
        lm += f'Thought {i}: ' + gen(suffix='\n')
        lm += f'Act {i}: ' + select(['Search', 'Finish'], name='act')
        lm += '[' + gen(name='arg', suffix=']') + '\n'
        if lm['act'] == 'Finish' or i == max_rounds:
            break
        else:
            lm += f'Observation {i}: ' + search(lm['arg']) + '\n'
        i += 1
    return lm
Notice how we don't have to write a parser for Act and argument and hope that the model generates something valid: we enforce it. Notice also that the loop only stops once the model chooses to act with 'Finish' (or once we hit a maximum number of rounds).
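The ReAct example calls a search function that is not defined in this README. To try the loop end to end, a stand-in is enough. The sketch below is an assumption for illustration only: it uses a hypothetical in-memory lookup table rather than a real search backend, and returns a plain string so it can be appended to the prompt as an observation:

```python
# Hypothetical lookup table standing in for a real search backend
_FAKE_INDEX = {
    "Eiffel Tower": "The Eiffel Tower is a wrought-iron lattice tower built in 1889 in Paris, France.",
}


def search(query: str) -> str:
    """Return a canned observation for `query`, or a fallback string."""
    return _FAKE_INDEX.get(query.strip(), "No results found.")


print(search("Eiffel Tower"))
print(search("Longs Peak"))
```

Swapping this stub for a real retrieval call only requires keeping the same signature: one string in, one string out.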
Example: Changing intermediate step of a Chat session
We can also hide or change some of what the model generates. For example, below we get a Chat model (notice we use special role blocks) to name some experts to answer a question, but we always remove 'Ferriss' from the list if he is mentioned:
from guidance import user, system, assistant
lm = llama2
query = 'How can I be more productive?'
with system():
    lm += 'You are a helpful and terse assistant.'
with user():
    lm += f'I want a response to the following question:\n{query}\n'
    lm += 'Name 3 world-class experts (past or present) who would be great at answering this.'
with assistant():
    temp_lm = lm
    for i in range(1, 4):
        # This regex only allows strings that look like names (where every word is capitalized)
        # list_append appends the result to a list
        temp_lm += f'{i}. ' + gen(regex=r'([A-Z][a-z]*\s*)+', suffix='\n',
                                  name='experts', list_append=True)
    experts = [x for x in temp_lm['experts'] if 'Ferriss' not in x]
    # Notice that even if the model generates 'Ferriss' above,
    # it doesn't get added to `lm`, only to `temp_lm`
    lm += ', '.join(experts)
with user():
    lm += 'Please answer the question as if these experts had collaborated in writing an anonymous answer.'
with assistant():
    lm += gen(max_tokens=100)
Automatic interleaving of control and generation: tool use
Tool use is a common case of stateful control. To make it easy to do so, gen calls take tools as an optional argument, where each tool is defined by (1) a grammar that triggers its call and captures the arguments (if any), and (2) the actual tool call. Then, as generation unrolls, whenever the model generates something that matches the grammar of a tool call, it (1) stops generation, (2) calls the tool (which can append whatever it wants to the LM session), and (3) continues generation.
For example, here is how we might implement a calculator tool, leveraging our expression grammar above:
from guidance import capture, Tool
@guidance(stateless=True)
def calculator_call(lm):
    # capture just 'names' the expression, to be saved in the LM state
    return lm + 'calculator(' + capture(expression(), 'tool_args') + ')'
@guidance
def calculator(lm):
    expression = lm['tool_args']
    # You typically don't want to run eval directly for safety reasons
    # Here we are guaranteed to only have mathematical expressions
    lm += f' = {eval(expression)}'
    return lm
calculator_tool = Tool(calculator_call(), calculator)
lm = llama2 + 'Here are five expressions:\ncalculator(3 *3) = 9\ncalculator(2 + 1 * 3) = 5\n'
lm += gen(max_tokens=30, tools=[calculator_tool], stop='\n\n')
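The stop, call, resume cycle described above can be simulated without a model. The sketch below is a toy under stated assumptions: the "model" is a scripted list of text chunks, the call grammar is approximated by a regular expression, and safe_eval is a hypothetical helper that walks the Python AST instead of calling eval. None of these names come from Guidance.

```python
import ast
import operator
import re

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def safe_eval(expr):
    """Evaluate +, -, *, /, ** on numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr.strip(), mode="eval"))


# A tool call is recognised only when it is the last thing in the text so far
CALL = re.compile(r"calculator\(([^()]*)\)$")


def run_with_tool(chunks):
    """Append scripted chunks; after each finished tool call, append its result."""
    text = ""
    for chunk in chunks:
        text += chunk                  # "generation" step
        match = CALL.search(text)
        if match:                      # pause, call the tool, then resume
            text += f" = {safe_eval(match.group(1))}"
    return text


scripted = [
    "She has calculator(16 - 3 - 4)",
    " eggs left. She makes calculator(9 * 2)",
    " dollars a day.",
]
print(run_with_tool(scripted))
```

The scripted chunks never contain a result; each "= ..." fragment is injected by the tool between generation steps, which mirrors how the calculator output appears in the session.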
Gsm8k example
Notice that the calculator is just called seamlessly during generation. Here is a more realistic example of the model solving a gsm8k question:
@guidance
def math_with_calc(lm, question):
    # Two-shot example
    lm += '''\
Question: John starts with 2 balls. He then quintupled his number of balls. Then he lost half of them. He then gave 3 to his brother. How many does he have left?
Reasoning:
1. He quintupled his balls. So he has calculator(2 * 5) = 10 balls.
2. He lost half. So he has calculator(10 / 2) = 5 balls.
3. He gave 3 to his brother. So he has calculator(5 - 3) = 2 balls.
Answer: 2
Question: Jill gets 7 dollars a day in allowance. She uses 1 each day to buy a bus pass, then gives half away. How much does she have left each day?
Reasoning:
1. She gets 7 dollars a day.
2. She spends 1 on a bus pass. So she has calculator(7 - 1) = 6.
3. She gives half away. So that makes calculator(6 / 2) = 3.
Answer: 3
'''
    lm += f'Question: {question}\n'
    lm += 'Reasoning:\n' + gen(max_tokens=200, tools=[calculator_tool], stop='Answer')
    # Only numbers or commas
    lm += 'Answer: ' + gen(regex=r'[-\d,]+')
    return lm
question = '''Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?'''
llama2 + math_with_calc(question)
Automatic call grammar for @guidance functions
You can also initialize a Tool with any @guidance-decorated function, and the default call grammar will be like a Python call. Here is an example of using multiple such tools in the same gen call:
@guidance
def say_scott(lm, n):
    lm += '\n'
    for _ in range(int(n)):
        lm += 'Scott\n'
    return lm
@guidance
def say_marco(lm, n):
    lm += '\n'
    for _ in range(int(n)):
        lm += 'marco\n'
    return lm
tools = [Tool(callable=say_scott), Tool(callable=say_marco)]
llama2 + '''\
I am going to call say_scott and say_marco a few times:
say_scott(1)
Scott
''' + gen(max_tokens=20, tools=tools)
Text, not tokens
The standard greedy tokenizations used by most language models introduce a variety of subtle and powerful biases that can have all kinds of unintended consequences for your prompts. For example, take the following prompt, given to gpt-2 (standard greedy tokenization):
from transformers import pipeline
pipe = pipeline("text-generation", model="gpt2")
def hf_gen(prompt, max_tokens=100):
    # max_new_tokens counts only generated tokens, excluding the prompt
    return pipe(prompt, do_sample=False, max_new_tokens=max_tokens, return_full_text=False)[0]['generated_text']
prompt = 'http:'
hf_gen(prompt, max_tokens=10)
Notice how the output generated by the LLM does not complete the URL with the obvious next characters (two forward slashes). It instead creates an invalid URL string with a space in the middle. Why? Because the string :// is its own token, and so once the model sees a colon by itself, it assumes that the next characters cannot be //; otherwise, the tokenizer would not have used :, and instead would have used ://. This is why there are warnings about ending prompts in whitespace, but the problem is way more pervasive than that: any boundary that may span multiple tokens will cause problems, e.g. notice how a partial word causes incorrect completion:
prompt = 'John is a'
hf_gen(prompt, max_tokens=5)
prompt = 'John is a fo'
hf_gen(prompt, max_tokens=5)
While problematic enough for normal prompts, these problems would be a disaster in the kinds of prompts we wrote in this readme, where there is interleaving of prompting and generation happening multiple times (and thus multiple opportunities for problems). This is why guidance implements token healing, a feature that deals with prompt boundaries automatically, allowing users to just think in terms of text rather than tokens. For example:
from guidance import models, gen
gpt = models.Transformers('gpt2')
prompt = 'http:'
gpt + prompt + gen(max_tokens=10)
prompt = 'John is a fo'
gpt + prompt + gen(max_tokens=2)
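The boundary problem and the idea behind healing it can be reproduced with a toy tokenizer. The sketch below is a simplification under stated assumptions: the vocabulary is a hypothetical seven-token list, tokenization is greedy longest-match, and "healing" is modelled as backing up one token and only allowing next tokens that start with the removed text. It is not Guidance's implementation.

```python
# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries
VOCAB = ["http", "://", ":", "/", "www", ".", "com"]


def greedy_tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization, falling back to single characters."""
    by_length = sorted(vocab, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for tok in by_length:
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def healing_candidates(prompt, vocab=VOCAB):
    """Back up one token and list the vocabulary tokens that extend the removed text."""
    tokens = greedy_tokenize(prompt, vocab)
    if not tokens:
        return [], list(vocab)
    kept, last = tokens[:-1], tokens[-1]
    allowed = [tok for tok in vocab if tok.startswith(last)]
    return kept, allowed


print(greedy_tokenize("http:"))     # the prompt ends in a lone ':' token
print(greedy_tokenize("http://"))   # the natural text uses the '://' token instead
print(healing_candidates("http:"))  # after backing up, '://' is allowed again
```

In the unhealed prompt the model is conditioned on a lone ":" token that the tokenizer would never have produced before "//". After backing up, the "://" token is a valid next step again.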
Fast
Integrated stateful control is faster
We have full control of the decoding loop in our integration with transformers and llamacpp, allowing us to add control and additional prompt without any extra cost.
If instead we're calling a server, we pay the extra cost of making additional requests, which might be ok if the server has caching, but quickly becomes impractical if the server does not have fine-grained caching. For example, note again the output from the gsm8k example with calculator above:
Every time we call calculator, we have to stop generation, append the result to the prompt, and resume generation. To avoid slowing down after the first call, a server would need to keep the KV cache up to '3 for breakfast. So she has calculator(16 - 3)', then roll forward generation from that point on. Even servers that do have caching often don't have a way to guarantee state is preserved at each stop and start, and so users pay a significant overhead at each interruption. The normal approach of considering everything as a new prompt would cause significant slowdowns every time calculator is called.
Guidance acceleration
In addition to the benefit above, guidance calls are often faster than running equivalent prompts the traditional way, because we can batch any additional text that is added by the user as execution unrolls (rather than generating it). Take the example below, where we generate JSON with a GGUF-compressed llama2 7B executed using llama.cpp:
import time
from guidance import guidance, gen, select

@guidance
def character_maker(lm, id, description, valid_weapons):
    # Only the gen(...) and select(...) fields are produced by the model;
    # the rest of the template is forced text.
    lm += f"""\
The following is a character profile for an RPG game in JSON format.
```json
{{
"id": "{id}",
"description": "{description}",
"name": "{gen('name', stop='"')}",
"age": {gen('age', regex='[0-9]+', stop=',')},
"armor": "{select(options=['leather', 'chainmail', 'plate'], name='armor')}",
"weapon": "{select(options=valid_weapons, name='weapon')}",
"class": "{gen('class', stop='"')}",
"mantra": "{gen('mantra', stop='"')}",
"strength": {gen('strength', regex='[0-9]+', stop=',')},
"items": ["{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}"]
}}```"""
    return lm

# llama2 is assumed to be a llama.cpp model that was loaded earlier
a = time.time()
lm = llama2 + character_maker(1, 'A nimble fighter', ['axe', 'sword', 'bow'])
time.time() - a
Everything that is not green in the output is not actually generated by the model, and is thus batched (much faster). This prompt takes about 1.2 seconds on an A100 GPU. Now, if we let the model generate everything (as in the roughly equivalent prompt below), it takes roughly 2.6 seconds (not only is it slower, we also have less control over generation).
import time
from guidance import guidance, gen

@guidance
def character_maker2(lm, id, description):
    # The model generates the entire JSON body here, so nothing can be batched
    lm += f"""\
The following is a character profile for an RPG game in JSON format. It has fields 'id', 'description', 'name', 'age', 'armor', 'weapon', 'class', 'mantra', 'strength', and 'items (just the names of 3 items)'
please set description to '{description}'
```json""" + gen(stop='```')
    return lm

# llama2 is assumed to be a llama.cpp model that was loaded earlier
a = time.time()
lm = llama2 + character_maker2(1, 'A nimble fighter')
time.time() - a
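As a rough way to reason about the two timings quoted above, the sketch below computes the speedup and compares decoding step counts under a deliberately simplified cost model. The assumption (not a measurement) is that a contiguous chunk of forced text costs one batched pass, while each generated token costs one pass. The `segments` token counts and the function name `decoding_steps` are hypothetical.

```python
# Timings quoted in the text (seconds on an A100 GPU).
constrained_s = 1.2   # character_maker: template text is forced, only fields are generated
free_form_s = 2.6     # character_maker2: the model generates the entire JSON

speedup = free_form_s / constrained_s
print(f"speedup: {speedup:.2f}x")

def decoding_steps(segments, batch_forced):
    """Count model passes for a list of (kind, n_tokens) segments.

    kind is 'forced' (text supplied by the program) or 'gen' (text the model
    must produce). With batch_forced=True a forced chunk costs a single pass.
    """
    steps = 0
    for kind, n in segments:
        if kind == "forced" and batch_forced:
            steps += 1      # whole chunk processed in one batched pass
        else:
            steps += n      # one pass per token
    return steps

# Hypothetical token counts for a template that alternates forced and generated text.
segments = [("forced", 40), ("gen", 3), ("forced", 6), ("gen", 1), ("forced", 8), ("gen", 5)]
print(decoding_steps(segments, batch_forced=True), decoding_steps(segments, batch_forced=False))
```

A batched pass over many tokens is not free in practice, so real speedups are smaller than the step counts suggest, but the direction of the effect is the same as in the measured timings.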
Loading models
llama.cpp
Install the python bindings:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
Loading the model:
from guidance import models
lm = models.LlamaCpp(path_to_model, n_gpu_layers=-1)
Transformers
Install transformers:
pip install transformers
Loading the model:
from guidance import models
lm = models.Transformers(model_name_or_path)
Azure AI
Azure AI is experimenting with a server-side Guidance integration, first available on the Phi-3.5-mini model. To use Guidance with AzureAI, you need to run the pre-release candidate of the guidance library (v0.2.0rc1).
pip install guidance --pre
For a detailed getting-started guide and more code examples, see the Phi-3 + Guidance Cookbook.
First, deploy a Phi-3.5-mini model using AzureAI Models-as-a-service (https://ai.azure.com/explore/models/Phi-3.5-mini-instruct/version/2/registry/azureml). Then, in your Guidance code, instantiate the AzureGuidance class:
from guidance.models import AzureGuidance
import os
phi3_url = os.getenv("AZURE_PHI3_URL") # Get the URL and API KEY from your AzureAI deployment dashboard
phi3_api_key = os.getenv("AZURE_PHI3_KEY")
lm = AzureGuidance(f"{phi3_url}/guidance#auth={phi3_api_key}") # note the URL structure using the new /guidance endpoint
Pull the deployment URL and key from the Azure deployment to instantiate the class. You can now attach any stateless guidance function to the AzureGuidance lm, and have it execute in a single API call. Stateless guidance functions executing in the cloud benefit from many key guidance features the same way local models do, including token healing, guidance acceleration, and fine-grained model control. Considerable effort and resources went into preparing this experimental pre-release, so please let us know if you encounter any bugs or have helpful feedback!
from guidance import guidance, gen, select

@guidance(stateless=True) # Note the stateless=True flag in the decorator -- this enables maximal efficiency on the guidance program execution
def character_maker(lm, id, description, valid_weapons):
    lm += f"""\
The following is a character profile for an RPG game in JSON format.
```json
{{
"id": "{id}",
"description": "{description}",
"name": "{gen('name', stop='"')}",
"age": {gen('age', regex='[0-9]+', stop=',')},
"armor": "{select(options=['leather', 'chainmail', 'plate'], name='armor')}",
"weapon": "{select(options=valid_weapons, name='weapon')}",
"class": "{gen('class', stop='"')}",
"mantra": "{gen('mantra', stop='"')}",
"strength": {gen('strength', regex='[0-9]+', stop=',')},
"items": ["{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}"]
}}```"""
    return lm

character_lm = lm + character_maker(1, 'A nimble fighter', ['axe', 'sword', 'bow']) # Runs on Azure and streams results back
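One practical detail of the AzureGuidance setup above: os.getenv returns None when a variable is unset, so the f-string would silently produce a string like "None/guidance#auth=None". A small helper can fail early instead. This is only a sketch; the helper name `azure_guidance_url` and its parameters are hypothetical and not part of the guidance library, and only the "<url>/guidance#auth=<key>" pattern is taken from the snippet above.

```python
import os

def azure_guidance_url(url_var="AZURE_PHI3_URL", key_var="AZURE_PHI3_KEY", env=None):
    """Build the '<url>/guidance#auth=<key>' string, failing if a variable is unset.

    env defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    url = env.get(url_var)
    key = env.get(key_var)
    missing = [name for name, value in ((url_var, url), (key_var, key)) if not value]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Strip a trailing slash so the result does not contain '//guidance'.
    return f"{url.rstrip('/')}/guidance#auth={key}"

# Usage with placeholder values (not a real deployment):
print(azure_guidance_url(env={"AZURE_PHI3_URL": "https://example.invalid/", "AZURE_PHI3_KEY": "secret"}))
```

The resulting string can then be passed to AzureGuidance in place of the inline f-string.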
Vertex AI
Remote endpoints that don't have explicit guidance integration are run "optimistically". This means that all the text that can be forced is given to the model as a prompt (or chat context) and then the model is run in streaming mode without hard constraints (since the remote API doesn't support them). If the model ever violates the constraints then the model stream is stopped and we optionally try it again at that point. This means that all the API-supported controls work as expected, and more complex controls/parsing that are not supported by the API work if the model stays consistent with the program.
from guidance import models, gen, instruction
palm2 = models.VertexAI("text-bison@001")
with instruction():
    lm = palm2 + "What is one funny fact about Seattle?"
lm + gen("fact", max_tokens=100)
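The "optimistic" strategy described above can be sketched as a small loop: stream text without constraints, check each chunk against the program, stop on the first violation, and optionally retry from the last valid point. The code below is a toy model of that idea, not the library's implementation. The constraint is a simple choice among options, and `run_optimistically`, `fake_stream`, and the scripted responses are all hypothetical stand-ins for a remote streaming API.

```python
def run_optimistically(stream_factory, options, max_retries=1):
    """Stream chunks and stop when the text is no longer a prefix of any option.

    stream_factory(prefix) returns an iterator of text chunks continuing `prefix`.
    Returns the matched option, or None if every attempt violated the constraint.
    """
    text = ""
    for _attempt in range(max_retries + 1):
        for chunk in stream_factory(text):
            candidate = text + chunk
            if not any(opt.startswith(candidate) for opt in options):
                break            # violation: stop the stream, keep the valid prefix
            text = candidate
            if text in options:
                return text      # constraint fully satisfied
    return None

# Scripted stand-in for a model: the first stream drifts off, the retry stays on track.
scripted = iter([["ch", "eese"], ["ain", "mail"]])

def fake_stream(prefix):
    return iter(next(scripted))

print(run_optimistically(fake_stream, ["leather", "chainmail", "plate"]))
```

In this run the first stream is cut off after "ch" because "cheese" matches no option, and the retry continues from "ch" to complete "chainmail". With a remote API each retry is a new request, which is why this approach works best when the model mostly stays consistent with the program.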
OpenAI
OpenAI endpoints don't have direct support for guidance grammars, but through optimistic running we can still control them in ways that match the model type:
Legacy completion models:
from guidance import models, gen
curie = models.OpenAI("text-curie-001")
curie + "The smallest cats are" + gen(stop=".")
Instruct tuned models:
from guidance import models, gen, instruction
gpt_instruct = models.OpenAI("gpt-3.5-turbo-instruct")
with instruction():
    lm = gpt_instruct + "What are the smallest cats?"
lm += gen(stop=".")
Chat models:
from guidance import models, gen, system, user, assistant
gpt = models.OpenAI("gpt-3.5-turbo")
with system():
    lm = gpt + "You are a cat expert."
with user():
    lm += "What are the smallest cats?"
with assistant():
    lm += gen("answer", stop=".")