NOVA RAG Agent & Docker Model Runner

Yesterday I told you about the “NOVA Chat Agent” (see: N.O.V.A. Chat Agent & Docker Model Runner).

When working with local SLMs (Small Language Models), or even TLMs (Tiny Language Models; I’m claiming authorship of this term), you quickly hit the limits of the model’s knowledge: small model, small knowledge. That’s where RAG (Retrieval-Augmented Generation) Agents come into play.

In the NOVA library, I’ve implemented a RAG Agent that performs similarity searches over a document store. You can then pass the matching documents to the “Chat Agent” so that it generates a response “augmented” by the retrieved content.

NOVA’s RAG Agent currently supports two kinds of vector store: an in-memory store that is persisted to (and reloaded from) a JSON file, or an “external” store backed by a Redis database and its vector capabilities. The JSON mode is handy for testing or for small document sets, while the Redis mode is better suited to large document sets and makes it easy to add documents over time.

Let’s see how to use the RAG Agent with the “in memory” (JSON) mode with a code example.

RAG Agent and JSON store

You’ll need the ai/mxbai-embed-large:latest model for the following example; you can pull it locally with Docker Model Runner:

docker model pull ai/mxbai-embed-large:latest

✋ You can of course use another embeddings model, but you’ll need to test it to check whether it fits your use case, the way you “chunk” your documents, and your similarity threshold. The mxbai-embed-large model is a good starting point for most use cases.
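To see what an embeddings model actually returns, you can call it directly. This sketch assumes Docker Model Runner is reachable at the same EngineURL used in the examples and exposes the OpenAI-compatible /embeddings route; embed and parseEmbedding are hypothetical helpers, not part of NOVA. mxbai-embed-large produces 1024-dimensional vectors, which is presumably why the Redis example later passes 1024 to rag.WithRedisStore.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// OpenAI-style request/response shapes; only the fields used here are declared.
type embeddingRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float64 `json:"embedding"`
	} `json:"data"`
}

// parseEmbedding extracts the first vector from an /embeddings response body.
func parseEmbedding(body []byte) ([]float64, error) {
	var resp embeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Data) == 0 {
		return nil, fmt.Errorf("no embedding in response")
	}
	return resp.Data[0].Embedding, nil
}

// embed posts text to <baseURL>/embeddings and returns its vector.
func embed(ctx context.Context, baseURL, model, text string) ([]float64, error) {
	payload, err := json.Marshal(embeddingRequest{Model: model, Input: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/embeddings", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return nil, err
	}
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("status %d: %s", res.StatusCode, body)
	}
	return parseEmbedding(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	vec, err := embed(ctx, "http://localhost:12434/engines/llama.cpp/v1", "ai/mxbai-embed-large:latest", "Birds fly in the sky")
	if err != nil {
		fmt.Println("embedding request failed (is Docker Model Runner reachable on port 12434?):", err)
		return
	}
	fmt.Printf("got a %d-dimensional vector\n", len(vec))
}
```

If you swap in another embeddings model, printing the vector length this way is a quick check that your vector store is configured with the matching dimension.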

Initialize your Go project and install the library:

go mod init rag-demo
go get github.com/snipwise/nova@latest
touch main.go

Copy the following code into main.go:

package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/snipwise/nova/nova-sdk/agents"
	"github.com/snipwise/nova/nova-sdk/agents/rag"
	"github.com/snipwise/nova/nova-sdk/models"
)

func main() {
	ctx := context.Background()

	storePathFile := "./store/animals.json"

	// Initial documents to load
	txtChunks := []string{
		"Squirrels run in the forest",
		"Birds fly in the sky",
		"Frogs swim in the pond",
		"Fishes swim in the sea",
		"Lions roar in the savannah",
		"Eagles soar above the mountains",
		"Dolphins leap out of the ocean",
		"Bears fish in the river",
		"Tigers prowl in the jungle",
		"Whales sing in the ocean",
		"Owls hoot at night",
		"Monkeys swing in the trees",
		"Butterflies flutter in the garden",
		"Bees buzz around flowers",
	}

	agent, err := rag.NewAgent(
		ctx,
		agents.Config{
			EngineURL: "http://localhost:12434/engines/llama.cpp/v1",
		},
		models.Config{
			Name: "ai/mxbai-embed-large:latest",
		},
		rag.WithJsonStore(storePathFile),
		// DocumentLoadModeSkipDuplicates: skips documents that are already in the store
		// (based on content hash)
		rag.WithDocuments(txtChunks, rag.DocumentLoadModeSkipDuplicates),
	)
	if err != nil {
		panic(err)
	}

	fmt.Println("✅ RAG Agent created with JSON store and initial documents")

	// Note: the store is persisted automatically when using WithJsonStore + WithDocuments
	// and new documents are added. With DocumentLoadModeSkipDuplicates, only documents
	// that are not already in the store are added.
	fmt.Printf("📁 Store file location: %s\n", storePathFile)

	fmt.Println(strings.Repeat("=", 60))

	// Test similarity search
	queries := []string{
		"What animals live in water?",
		"Which creatures can fly?",
		"Animals in the forest",
		"What animals are in the jungle?",
		"Who sings in the ocean?",
		"Which animals are active at night?",
		"Which animals are found in the trees?",
		"Which animals buzz around flowers?",
		"Which animals are big cats?",
	}

	for _, query := range queries {
		fmt.Printf("\n🔍 Query: %s\n", query)
		fmt.Println(strings.Repeat("-", 60))

		similarities, err := agent.SearchSimilar(query, 0.6)
		if err != nil {
			fmt.Printf("❌ Error searching: %v\n", err)
			continue
		}

		if len(similarities) == 0 {
			fmt.Println("No similar documents found (threshold: 0.6)")
		} else {
			for i, sim := range similarities {
				fmt.Printf("%d. [%.3f] %s\n", i+1, sim.Similarity, sim.Prompt)
			}
		}
	}
}
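The comment on DocumentLoadModeSkipDuplicates says duplicates are detected from a content hash, which is what lets you rerun the program without piling up copies of the same documents. Here’s a minimal sketch of that idea, assuming a SHA-256 hash of the raw chunk text and an in-memory set of already-seen hashes; NOVA’s actual hash function and bookkeeping aren’t shown in this article.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 of the chunk text.
func contentHash(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// addSkippingDuplicates keeps only chunks whose hash is not in seen,
// records the new hashes, and returns the chunks that were actually added.
func addSkippingDuplicates(seen map[string]bool, chunks []string) []string {
	var added []string
	for _, c := range chunks {
		h := contentHash(c)
		if seen[h] {
			continue // already stored: skip it
		}
		seen[h] = true
		added = append(added, c)
	}
	return added
}

func main() {
	seen := map[string]bool{}
	chunks := []string{"Birds fly in the sky", "Frogs swim in the pond"}
	// First run: both chunks are new.
	fmt.Println("first load:", len(addSkippingDuplicates(seen, chunks)))
	// Second run with one extra chunk: only the new one is added.
	fmt.Println("second load:", len(addSkippingDuplicates(seen, append(chunks, "Owls hoot at night"))))
}
```

Hashing the exact text means that even a one-character edit produces a “new” document, so the old version stays in the store unless you remove it.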

Now, you can run your program:

go mod tidy
go run main.go

The JSON store is created automatically at ./store/animals.json and the initial documents are loaded into it. You’ll then see the similarity search results for each query, with output like this:

✅ RAG Agent created with JSON store and initial documents
📁 Store file location: ./store/animals.json
============================================================

🔍 Query: What animals live in water?
------------------------------------------------------------
1. [0.643] Fishes swim in the sea

🔍 Query: Which creatures can fly?
------------------------------------------------------------
1. [0.708] Birds fly in the sky

🔍 Query: Animals in the forest
------------------------------------------------------------
1. [0.754] Squirrels run in the forest
2. [0.689] Tigers prowl in the jungle
3. [0.653] Monkeys swing in the trees

🔍 Query: What animals are in the jungle?
------------------------------------------------------------
1. [0.737] Tigers prowl in the jungle

🔍 Query: Who sings in the ocean?
------------------------------------------------------------
1. [0.639] Whales sing in the ocean

🔍 Query: Which animals are active at night?
------------------------------------------------------------
1. [0.616] Owls hoot at night

🔍 Query: Which animals are found in the trees?
------------------------------------------------------------
1. [0.666] Squirrels run in the forest
2. [0.648] Monkeys swing in the trees
3. [0.619] Tigers prowl in the jungle

🔍 Query: Which animals buzz around flowers?
------------------------------------------------------------
1. [0.809] Bees buzz around flowers
2. [0.649] Butterflies flutter in the garden

🔍 Query: Which animals are big cats?
------------------------------------------------------------
No similar documents found (threshold: 0.6)
============================================================

The program displays similar documents for each query, along with their similarity score. You can adjust the similarity threshold (0.6 in this example) to get more or fewer results depending on your needs.
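To see how the threshold shapes the results, this sketch replays the three scores printed above for “Animals in the forest” against a few thresholds. It assumes a result is kept when its score is greater than or equal to the threshold (NOVA may use a strict comparison). Raising the threshold can only drop results; lowering it below 0.6 could surface other documents whose scores the output doesn’t show.

```go
package main

import "fmt"

type result struct {
	Prompt     string
	Similarity float64
}

// filterByThreshold keeps results whose similarity is at least threshold.
// The >= comparison is an assumption, not NOVA's confirmed behavior.
func filterByThreshold(results []result, threshold float64) []result {
	var kept []result
	for _, r := range results {
		if r.Similarity >= threshold {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// Scores copied from the "Animals in the forest" output above.
	forest := []result{
		{"Squirrels run in the forest", 0.754},
		{"Tigers prowl in the jungle", 0.689},
		{"Monkeys swing in the trees", 0.653},
	}
	for _, t := range []float64{0.6, 0.7, 0.8} {
		fmt.Printf("threshold %.1f -> %d result(s)\n", t, len(filterByThreshold(forest, t)))
	}
}
```

A too-high threshold gives you the “No similar documents found” case seen for “Which animals are big cats?”; a too-low one floods the Chat Agent’s context with weakly related text.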

You can see that the RAG Agent is fairly straightforward to use. In the next section, I’ll show you a version with the “Redis Store”.

The complete code for this example is in the samples folder of the NOVA library (main.go), along with some documentation: rag-agent-guide-en.md (work in progress).

RAG Agent and Redis store

Now let’s look at an example with the Redis store. First, you’ll need to have a Redis instance running locally or remotely. You can use Docker Compose for that:

compose.yml:

services:
  redis-server:
    image: redis:8.2.3-alpine3.22
    container_name: nova-redis-vector-store
    ports:
      - "6379:6379"
    volumes:
      - ./data:/data
    environment:
      # Redis persistence: save every 30s if at least 1 key changed
      - REDIS_ARGS=--save 30 1
    restart: unless-stopped

✋ Note: Redis 8.2.3 includes RediSearch natively for vector similarity search, so there’s no need for redis-stack: standard Redis 8.x has built-in vector support!

Start your Redis instance:

docker compose up -d

Now let’s get to the Go code for using the RAG Agent with Redis as the vector store. Here’s a code example:

main.go:

package main

import (
	"context"
	"fmt"

	"github.com/joho/godotenv"
	"github.com/snipwise/nova/nova-sdk/agents"
	"github.com/snipwise/nova/nova-sdk/agents/rag"
	"github.com/snipwise/nova/nova-sdk/agents/rag/stores"
	"github.com/snipwise/nova/nova-sdk/models"
)

func main() {
	// This example demonstrates DocumentLoadModeSkipDuplicates
	// Run this program multiple times - it will NOT create duplicates!
	ctx := context.Background()

	// Configuration
	engineURL := "http://localhost:12434/engines/llama.cpp/v1"
	embeddingModel := "ai/mxbai-embed-large:latest"

	// documents
	documents := []string{
		"Squirrels run in the forest and collect acorns for winter",
		"Birds fly in the sky and migrate south during winter",
		"Frogs swim in the pond and catch insects with their tongues",
		"Bears hibernate in caves during the cold winter months",
		"Rabbits hop through meadows and live in underground burrows",
	}

	// Create RAG agent with Redis and DocumentLoadModeSkipDuplicates
	ragAgent, err := rag.NewAgent(
		ctx,
		agents.Config{
			Name:      "SkipDuplicatesDemo",
			EngineURL: engineURL,
		},
		models.Config{
			Name: embeddingModel,
		},
		rag.WithRedisStore(stores.RedisConfig{
			Address:   "localhost:6379",
			Password:  "",
			DB:        0,
			IndexName: "skip_duplicates_demo",
		}, 1024),
		rag.WithDocuments(documents, rag.DocumentLoadModeSkipDuplicates),
	)
	if err != nil {
		fmt.Printf("❌ Failed to create RAG agent: %v\n", err)
		return
	}

	fmt.Println("✅ RAG Agent created with Redis store and initial documents")
	fmt.Println()

	fmt.Println("Testing Similarity Search...")

	// Test similarity search
	query := "What do animals do in winter?"
	fmt.Printf("🔍 Query: %s\n", query)
	fmt.Println()

	results, err := ragAgent.SearchTopN(query, 0.3, 3)
	if err != nil {
		fmt.Printf("❌ Failed to search: %v\n", err)
		return
	}

	fmt.Printf("📊 Top %d results (similarity > 0.3):\n", len(results))
	for i, result := range results {
		fmt.Printf("%d. [%.3f] %s\n", i+1, result.Similarity, result.Prompt)
	}

	fmt.Println()
}

Run the program:

go run main.go

You should get output like this:

✅ RAG Agent created with Redis store and initial documents

Testing Similarity Search...
🔍 Query: What do animals do in winter?

📊 Top 3 results (similarity > 0.3):
1. [0.663] Birds fly in the sky and migrate south during winter
2. [0.646] Squirrels run in the forest and collect acorns for winter
3. [0.619] Bears hibernate in caves during the cold winter months

You can find the complete code for this RAG Agent in the samples folder of the NOVA library, in the file main.go.

A sample project using the Chat Agent and the RAG Agent

I’ve built a small CLI, chat-rag-cli (available on Codeberg), that uses the Chat Agent “augmented” by the RAG Agent to generate content from the terminal based on an XML knowledge base. I use it, for example, like this:

./chat-rag-cli \
    prompt "How to define a struct in Golang? Please use your knowledge to answer." \
    --instructions context.md \
    --output result.md

To summarize, using both agents together comes down to three steps:

1. Search for similar documents with the RAG Agent, using the user’s question as the search query.
2. Build a message list for the Chat Agent, including the similar documents in the instruction prompt.
3. Call the Chat Agent with that message list.

Here’s a code snippet that illustrates the similarity search with the RAG Agent and the prompt construction for the Chat Agent:

userQuestion := "How to define a struct in Golang? Please use your knowledge to answer."

var knowledgeBase string

// Search for similar documents in the RAG Agent
// We use a similarity threshold of 0.5 and want to retrieve the top 3 most similar documents
similarities, err := ragAgent.SearchTopN(userQuestion, 0.5, 3)
if err != nil {
    fmt.Printf("❌ Error searching: %v\n", err)
}

if len(similarities) == 0 {
    fmt.Println("No similar documents found (threshold: 0.5)")
} else {
    fmt.Printf("📚 Found %d similar documents:\n", len(similarities))
    for i, sim := range similarities {
        fmt.Printf("📗 %d. [%.3f] %s\n", i+1, sim.Similarity, sim.Prompt)
        knowledgeBase += sim.Prompt + "\n"
    }
    knowledgeBase = fmt.Sprintf(
        "Here is some knowledge that might be useful for answering the user's question:\n%s",
        knowledgeBase,
    )
}

messagesList := []messages.Message{}
if knowledgeBase != "" {
    messagesList = append(messagesList, messages.Message{
        Role:    roles.System,
        Content: knowledgeBase,
    })
}
messagesList = append(messagesList, messages.Message{
    Role:    roles.User,
    Content: userQuestion,
})

That’s all for today. Next time, we’ll see how to use the Crew Agent to orchestrate multiple Chat Agents, coupled with a RAG Agent, while being mindful of the conversational memory context size. Stay tuned!

✋ Note: NOVA provides a few helpers for “chunking” text, markdown and XML documents: nova-sdk/agents/rag/chunks. I plan to add more in the future.
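NOVA’s chunking helpers live in nova-sdk/agents/rag/chunks; I won’t guess at their function names here. To illustrate what a chunker does, this is a hypothetical word-based splitter: it cuts text into chunks of at most size words and repeats overlap words between neighbors, so a sentence split across two chunks still carries some shared context. Helpers for Markdown or XML would more likely split on structure (headings, elements) than on word counts.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into chunks of at most size words, repeating
// overlap words between consecutive chunks. It returns nil for invalid
// parameters (size <= 0, overlap < 0, or overlap >= size).
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	words := strings.Fields(text)
	var chunks []string
	step := size - overlap
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // last chunk reached the end of the text
		}
	}
	return chunks
}

func main() {
	text := "Squirrels run in the forest and collect acorns for winter"
	for i, c := range chunkWords(text, 4, 1) {
		fmt.Printf("%d: %s\n", i+1, c)
	}
}
```

Chunk size and overlap interact with the similarity threshold: shorter chunks tend to produce sharper, higher scores for focused questions, which is one reason to re-test your threshold whenever you change your chunking.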
