Using RAG as a NoSQL Database for More Accurate Answers
Rethinking the way you use RAG
The origin of this article
These days, I’m experimenting with fine-tuning a tiny LLM to try to teach it a scripting language. That scripting language is GoloScript. I created the GoloScript interpreter to bring back to life the research project behind the Golo language (created by “The Doc”, Julien Ponge).
The advantage of using the Golo language is that it has remained fairly obscure, so theoretically, the tiny language models I use for fine-tuning shouldn’t know this language. This lets me verify whether the fine-tuning actually worked, and whether the model truly learned to understand and generate GoloScript.
To be honest, the fine-tuning topic is particularly interesting and I’m learning a lot (after lengthy discussions with Claude, I managed to put together a “user-friendly” fine-tuning script that runs on macOS), but it’s extremely time-consuming, and creating a “perfect” dataset for fine-tuning is a real challenge. (In short, I’m far from done.)
What if we could do it differently? … With RAG, for instance?
So I dug into the question. But first, let’s talk about a weakness of RAG.
RAG’s imprecision: information loss
Most of the time when doing RAG, the pipeline looks like this:
- Split documents into chunks using various “more or less effective” strategies.
- Generate embeddings for each chunk.
- Store the embeddings in a vector database.
- When a question is asked, generate an embedding for the question.
- Retrieve the most relevant chunks based on similarity between the question embedding and the chunk embeddings.
- Use these chunks as context to generate an answer to the question.
When splitting, the risk is losing the semantic meaning of the document fragment. And when retrieving, the risk is not getting the right document fragments, or not getting enough of them for the model to generate an accurate answer.
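The retrieval step of the pipeline above can be sketched in a few lines. This is a minimal illustration, not chat-rag-cli's implementation: the toy three-dimensional vectors stand in for real embeddings produced by a model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy pre-computed chunk embeddings standing in for a real embeddings model.
chunks = {
    "hello world snippet": [0.9, 0.1, 0.0],
    "struct definition snippet": [0.1, 0.8, 0.3],
    "dynamic object snippet": [0.2, 0.3, 0.9],
}

def retrieve(question_embedding, top_k=2):
    """Return the top_k chunk names most similar to the question."""
    scored = sorted(chunks.items(),
                    key=lambda kv: cosine(question_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# A question embedding close to the "hello world" chunk ranks it first.
print(retrieve([0.85, 0.15, 0.05]))
# → ['hello world snippet', 'dynamic object snippet']
```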
Let’s think differently
For my use case — “feeding a TLM (Tiny Language Model) enough material so it can learn to understand and generate GoloScript” — I figured I could do things differently. I need to think of each chunk as an “entry” in a NoSQL database, not as a document fragment to retrieve.
Each chunk is a “self-contained” document that holds the information in its entirety, with metadata to facilitate search and also to help the embeddings model understand the chunk’s context, as well as its relationships with other chunks. So, rather than splitting an existing documentation into pieces, I directly create my documentation as a structured collection of individual documents.
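One way to picture such a self-contained record is as a dictionary whose metadata and code are flattened into a single text handed to the embeddings model. The `to_embedding_text` helper and its output format below are illustrative assumptions, not chat-rag-cli's actual serialization:

```python
# A self-contained "record" with the metadata fields used in this article.
snippet = {
    "id": 1,
    "name": "hello_world",
    "description": "Minimal GoloScript program with module declaration and main function",
    "keywords": ["hello world", "main function", "module"],
    "topic": "basic_syntax",
    "related_topics": ["modules", "program_structure"],
    "code": 'module hello.World\nfunction main = |args| {\n  println("Hello, GoloScript!")\n}\n',
}

def to_embedding_text(record):
    """Flatten metadata and code into one string, embedded as a whole
    so no semantic context is lost to chunking."""
    return "\n".join([
        f"NAME: {record['name']}",
        f"DESCRIPTION: {record['description']}",
        f"KEYWORDS: {', '.join(record['keywords'])}",
        f"TOPIC: {record['topic']}",
        f"RELATED TOPICS: {', '.join(record['related_topics'])}",
        "CODE:",
        record["code"],
    ])

print(to_embedding_text(snippet))
```

Because the metadata travels with the code in a single embedding, a question mentioning “main function” or “entry point” can match this record even if those words never appear in the code itself.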
chat-rag-cli
For my experiments (and my RAG needs), I created a small CLI tool, chat-rag-cli, which lets me easily index Markdown, XML, and YAML documents (creating embeddings in a JSON store), then perform similarity searches and ask an LLM to generate an answer from the most relevant retrieved documents.
For example:
Index the documents:
chat-rag-cli --config ./config.yml \
--documents-path ./documents \
--store-path-file ./store/snippets.json index
Search and generate an answer from the retrieved documents:
chat-rag-cli prompt "create a hello world program in GoloScript." \
--config ./config.yml \
--store-path-file ./store/snippets.json
This is the tool I used to verify my hypotheses. But the concept described in this article can be reused with any RAG system, and any vector database.
👋 I use chat-rag-cli with 🐳 Docker Model Runner, but it can work with any model engine or platform that exposes an API compatible with the OpenAI API.
Structuring my documentation / database
I structured my documentation directly in a YAML file, as if it were a NoSQL database. Each document is a “record” with metadata fields (id, name, description, language, keywords, topic, related_topics) and a field for the code itself. The code field contains GoloScript code examples along with explanations. In YAML, this gives us:
snippets:
  - id: 1
    name: hello_world
    description: "Minimal GoloScript program with module declaration and main function"
    language: goloscript
    keywords:
      - hello world
      - main function
      - module
      - println
      - entry point
    topic: basic_syntax
    related_topics:
      - modules
      - program_structure
    code: |
      module hello.World

      function main = |args| {
        println("Hello, GoloScript!")
      }

  - id: 2
    name: variables_and_constants
    description: "Declare mutable variables with var and immutable constants with let"
    language: goloscript
    keywords:
      - var
      - let
      - variable
      - constant
      - mutable
      - immutable
      - assignment
    topic: basic_syntax
    related_topics:
      - data_types
    code: |
      module examples.Variables

      function main = |args| {
        # Mutable variables
        var x = 10
        x = 20 # OK

        # Immutable constants
        let y = 30
        # y = 40 # ERROR - immutable

        println("x =", x)
        println("y =", y)
      }

  # etc...
📝 The full file is available here: https://codeberg.org/ai-apocalypse-survival-kit/learning-goloscript-with-rag/src/branch/main/documents/snippets.goloscript.yaml
I then used my CLI to index this yaml file into a JSON embeddings store, and I was able to perform similarity searches from natural language questions, and ask an LLM to generate GoloScript code from the most relevant retrieved documents.
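To make the idea concrete, here is a plausible shape for such a JSON embeddings store — the field names and the stub embedder below are assumptions for illustration, not chat-rag-cli's actual schema:

```python
import json

# Stub embedder standing in for a real model (e.g. Qwen3-Embedding served
# through an OpenAI-compatible API); it just returns a fixed-size vector.
def embed(text, dim=4):
    return [round(len(text) % (i + 2) / 10, 2) for i in range(dim)]

records = [
    {"id": 1, "name": "hello_world", "content": "module hello.World ..."},
    {"id": 2, "name": "variables_and_constants", "content": "module examples.Variables ..."},
]

# Hypothetical store layout: one entry per record, keeping the full content
# next to its embedding so a similarity hit returns a complete document,
# not a fragment.
store = [
    {
        "id": r["id"],
        "name": r["name"],
        "content": r["content"],
        "embedding": embed(r["content"]),
    }
    for r in records
]

print(json.dumps(store, indent=2))
```

The key property is that retrieval never has to reassemble anything: whatever entry scores highest is already a complete, self-describing document.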
Embeddings generation
For embeddings generation, I used the Qwen3-Embedding-0.6B-GGUF model. It produces 1024-dimensional embeddings, which capture enough semantic nuance for these self-contained records, and its long context window lets larger documents be embedded whole, without truncation, reducing the risk of information loss. And if you need more, it has bigger siblings with larger embedding dimensions:
- Qwen3-Embedding-4B-GGUF (2560)
- Qwen3-Embedding-8B-GGUF (4096)
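For reference, this is roughly the request shape an OpenAI-compatible embeddings endpoint expects. The model identifier and endpoint path are assumptions to adapt to your setup (Docker Model Runner or otherwise); the payload is only built here, not sent:

```python
import json

# Assumed model identifier -- check what your engine actually exposes.
payload = {
    "model": "qwen3-embedding-0.6b",
    "input": "NAME: hello_world\nDESCRIPTION: Minimal GoloScript program ...",
}
request_body = json.dumps(payload)

# A real client would POST request_body to {base_url}/v1/embeddings and
# read response["data"][0]["embedding"] -- a list of 1024 floats for the
# 0.6B model, 2560 or 4096 for its bigger siblings.
print(request_body)
```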
Answer generation from retrieved documents (similarities)
For answer generation from the retrieved documents, I ran tests with two different generation models.
So, does it actually work?
I indexed my GoloScript snippets file into a JSON embeddings store:
chat-rag-cli --config ./config.yml --documents-path ./documents --store-path-file ./store/snippets.json index
And then I ran similarity searches from natural language questions, and asked an LLM to generate GoloScript code from the most relevant retrieved documents.
Search and generate a response from the retrieved documents
chat-rag-cli prompt "create a hello world program in GoloScript." \
--config ./config.yml \
--store-path-file ./store/snippets.json \
--output ./response.md
chat-rag-cli prompt "I need a Human DynamicObject with these fields: name and age." \
--config ./config.yml \
--store-path-file ./store/snippets.json \
--output ./response.md
chat-rag-cli prompt "I need a Human structure with these fields: firstName and lastName." \
--config ./config.yml \
--store-path-file ./store/snippets.json \
--output ./response.md
Both models did really well — the proposed code examples perfectly followed GoloScript syntax. In GoloScript, the concepts of DynamicObject and struct are fairly similar, but the models correctly distinguished between them (so far I’ve been less lucky with fine-tuning).
Example of generated code for the DynamicObject question:
# Create a Human DynamicObject
let human = DynamicObject()

# Set properties (use property name as method with argument)
human: name("Alice")
human: age(30)

# Get properties (call with no argument)
println("Name:", human: name())
println("Age:", human: age())
Example of generated code for the struct question:
# Define a structure
struct Human = { firstName, lastName }

# Create an instance
let h1 = Human("John", "Doe")

# Access fields (getter call with no args)
let firstName = h1: firstName()
let lastName = h1: lastName()

println("Human:", firstName, lastName)
Conclusion
By structuring my documentation somewhat like a NoSQL database, I was able to do retrieval-augmented generation in a more efficient and accurate way. Since the data returned during similarity searches is complete, it allows a small generation model to produce more accurate and more relevant answers.
I plan to use chat-rag-cli with code agents and TLMs (Tiny Language Models) to have development assistance at hand. Without going full vibe coding, I’d have helpers for coding. Stay tuned…
You can find the code for this project on this repository: https://codeberg.org/ai-apocalypse-survival-kit/learning-goloscript-with-rag