This project uses BAML, our prompt configuration language that extends Jinja to help you get structured outputs from LLMs. BAML can autogenerate a fully typed TypeScript client that can call our prompt templates, handle all deserialization, and stream the results. More on this below!
1. Create a NextJS app with the AI package
npx create-next-app@latest
npm install ai
2. Initialize BAML
We will use the BAML prompt templating configuration to build our RAG prompt.
npm install @boundaryml/baml
npx baml-cli init - this will create a directory to place .baml files in.
Get the VSCode extension for BAML syntax highlighting and playground capabilities.
You will also need to adjust your next.config.js to support native node modules. See this next.config.js example.
3. Build a prompt
In BAML you build prompts using a schema-first approach. What does this mean? Prompt templates are basically functions that take in input variables and return structured outputs. In BAML you define the function signature first and then write the prompt against that signature, which cuts down on the prompt engineering you have to do by hand.
In this RAG example, we need a function that takes in a question and a list of documents and outputs an Answer object with citations, so this is the function we will build:
function AnswerQuestion(question: string, context: Context) -> Answer
Define your input variables
Let's declare the Context input type in a .baml file in the project. It contains the documents the AI will use to answer the question:
Define the Answer schema
The output schema is an Answer object that contains a list of citations.
When humans write a paper, they usually cite other papers and then write out their answer using those citations. We will make the LLM do the same:
First give us the citations from the documents that answer the question
Then give us the answer
This will help reduce hallucinations since the LLM will be more likely to use the generated citations in the answer. You can change this order if you want to experiment with different approaches.
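To make the citations-first ordering concrete, here is a rough TypeScript picture of the shapes involved, plus a small check you could run on the result. This is a sketch, not the generated client: the field names (documents, title, text, documentTitle, relevantText) and the findUnsupportedCitations helper are assumptions made up for illustration.

```typescript
// Hypothetical shapes for illustration; the field names are assumptions,
// not the types generated by BAML.
interface SourceDocument {
  title: string;
  text: string;
}

interface Context {
  documents: SourceDocument[];
}

interface Citation {
  documentTitle: string;
  // Sentence copied from the document that supports the answer.
  relevantText: string;
}

// Citations are declared before the answer to mirror the order in which
// we want the model to write them: evidence first, answer second.
interface Answer {
  citations: Citation[];
  answer: string;
}

// Returns the citations whose quoted text does not appear in the document
// they name, or that name a document missing from the context.
function findUnsupportedCitations(answer: Answer, context: Context): Citation[] {
  return answer.citations.filter((citation) => {
    const doc = context.documents.find((d) => d.title === citation.documentTitle);
    return doc === undefined || !doc.text.includes(citation.relevantText);
  });
}
```

An empty array back from findUnsupportedCitations means every quote was found verbatim in its source document. It says nothing about whether the answer itself is right, only that the citations are grounded.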
Here's the answer schema, which you can add to any .baml file in your project:
Now we have the full function signature types! Let's build the prompt.
Build the LLM function
Here's the full function syntax in BAML -- which will get translated into a Python or TypeScript function.
BAML prompts use the Jinja templating language to help you write structured prompts. You can use Jinja to loop over lists, conditionally render parts of the prompt, and more. We added a couple of helper functions to Jinja, explained below.
There are 2 predefined BAML macros -- ctx.output_format and _.role("user") -- to help us write the output schema instructions into the prompt, and mark a specific part of the prompt as a "user" message.
The {{ ctx.output_format }} line will be replaced with the output structure instructions at runtime, which uses your @description annotations to make things clearer to the LLM.
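If it helps to picture what the template does, here is the same idea written as plain TypeScript: loop over the documents, append the output instructions, and put the question in a separate user message. This is only an illustration and not how BAML renders prompts. The renderPrompt function, the OUTPUT_FORMAT text, and the field names are all assumptions, and the instructions BAML generates from your schema will be worded differently.

```typescript
interface SourceDocument {
  title: string;
  text: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical stand-in for the text that {{ ctx.output_format }} expands to.
const OUTPUT_FORMAT = [
  "Answer in JSON using this schema:",
  "{",
  "  citations: [{ documentTitle: string, relevantText: string }],",
  "  answer: string,",
  "}",
].join("\n");

// Builds the messages: a loop over the documents (the job of a Jinja `for`
// loop), then the output instructions, then the question as the user turn
// (the job of _.role("user")).
function renderPrompt(question: string, documents: SourceDocument[]): ChatMessage[] {
  const renderedDocs = documents
    .map((doc, index) => `DOCUMENT ${index + 1}: ${doc.title}\n${doc.text}`)
    .join("\n---\n");
  const instructions = "Answer the question using only the documents below.";
  return [
    { role: "system", content: [instructions, renderedDocs, OUTPUT_FORMAT].join("\n\n") },
    { role: "user", content: `QUESTION: ${question}` },
  ];
}
```

The point of generating the schema text instead of typing it into the prompt is that it cannot drift away from the types your code actually deserializes into.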
If you use the BAML Playground in VSCode you can see it rendered in real-time, so you don't have to guess what string will be sent to the LLM client at runtime:
Call your function in NextJS
When you save a .baml file, the BAML VSCode extension generates a TypeScript client to call the function. Everything runs on your own machine.
Let's build a server action that calls our generated BAML function in a streaming manner. We will use the createStreamableValue function from the AI package to stream the results.
Define server action
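The core of the server action is a loop that takes each partial result from the stream and forwards it to the client. The sketch below shows that loop in isolation so it runs without an LLM or any packages. Everything in it is a stand-in: fakeAnswerStream plays the role of the generated client's stream, onUpdate marks the spot where a real server action would update the value created by createStreamableValue, and the field names are assumptions.

```typescript
interface Citation {
  documentTitle: string;
  relevantText: string;
}

// While streaming, any field may still be missing or incomplete.
interface PartialAnswer {
  citations?: Citation[];
  answer?: string;
}

// Hypothetical stand-in for the generated client's stream: yields a growing
// snapshot of the answer instead of calling an LLM.
async function* fakeAnswerStream(): AsyncGenerator<PartialAnswer> {
  const citation = { documentTitle: "Doc A", relevantText: "BAML extends Jinja." };
  yield { citations: [citation] };
  yield { citations: [citation], answer: "BAML is" };
  yield { citations: [citation], answer: "BAML is a prompt language." };
}

// Forwards every snapshot to `onUpdate` and returns the last snapshot seen
// as the final result (an empty object if the stream yields nothing).
async function forwardStream(
  stream: AsyncIterable<PartialAnswer>,
  onUpdate: (partial: PartialAnswer) => void,
): Promise<PartialAnswer> {
  let latest: PartialAnswer = {};
  for await (const partial of stream) {
    latest = partial;
    onUpdate(partial);
  }
  return latest;
}
```

Because each snapshot is a whole (if incomplete) object and not a text delta, the UI can simply re-render with the latest one. Citations show up first and the answer fills in afterwards.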
Consume the stream in a React component
We will add a new page to our NextJS app under app/examples/rag/page.tsx. This page will have a text input for the question, and a button to submit the question to the AI. The answer will be displayed in a textarea, and the citations will be displayed in a list.
This example uses ShadCN UI components, but you can use any UI library you like.
Note that for the purpose of this tutorial we are not addressing document chunking or similarity search. We are just using a hardcoded list of documents:
Render the answer and citations
And you're done!
You should now be able to stream responses. Feel free to change the prompt according to your needs. The VSCode BAML playground makes it very easy to test your prompt on various test cases without having to run the whole program.
With BAML we were able to just call a function without worrying about
Retries, redundancy
Deserialization (BAML also fixes common LLM JSON mistakes, like missing quotes)
Streaming partial JSON objects
and as we were prompt engineering using our types, we always got to see the full prompt.
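To give a sense of what the first item in that list saves you, here is roughly what a hand-written retry-and-fallback wrapper looks like. This is a generic sketch and not BAML's implementation. The callWithRetries helper and its parameters are hypothetical, and a production version would normally add a delay between attempts.

```typescript
// Tries each call in order. Each call gets up to `attemptsPerCall` attempts
// before we fall through to the next one. Rethrows the last error if every
// attempt on every call fails.
async function callWithRetries<T>(
  calls: Array<() => Promise<T>>,
  attemptsPerCall: number,
): Promise<T> {
  let lastError: unknown = new Error("no calls were attempted");
  for (const call of calls) {
    for (let attempt = 0; attempt < attemptsPerCall; attempt++) {
      try {
        return await call();
      } catch (error) {
        lastError = error;
      }
    }
  }
  throw lastError;
}
```

You would pass the primary model call first and a backup second, for example callWithRetries([callPrimary, callBackup], 2), where both function names are placeholders for your own code.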
The full source code is available on GitHub, along with some more examples.
You can check out the BAML Documentation for more examples, and join our Discord if you have any questions or need help!