Using RAG as a NoSQL Database for More Accurate Answers
Rethinking the way you use RAG
The origin of this article
These days, I’m experimenting with fine-tuning a tiny LLM to try to teach it a scripting language. That scripting language is GoloScript. I created the GoloScript interpreter to bring back to life the research project behind the Golo language (created by “The Doc”, Julien Ponge).
The advantage of using the Golo language is that it has remained fairly obscure, so theoretically, the tiny language models I use for fine-tuning shouldn’t know this language. This lets me verify whether the fine-tuning actually worked, and whether the model truly learned to understand and generate GoloScript.
To be honest, the fine-tuning topic is particularly interesting and I’m learning a lot (after lengthy discussions with Claude, I managed to put together a “user-friendly” fine-tuning script that runs on macOS), but it’s extremely time-consuming, and creating a “perfect” dataset for fine-tuning is a real challenge. (In short, I’m far from done.)
What if we could do it differently? … With RAG, for instance?
So I dug into the question. But first, let’s talk about a weakness of RAG.
RAG’s imprecision: information loss
Most of the time when doing RAG, the pipeline looks like this:
- Split documents into chunks using various “more or less effective” strategies.
- Generate embeddings for each chunk.
- Store the embeddings in a vector database.
- When a question is asked, generate an embedding for the question.
- Retrieve the most relevant chunks based on similarity between the question embedding and the chunk embeddings.
- Use these chunks as context to generate an answer to the question.
When splitting, the risk is losing the semantic meaning of the document fragment. And when retrieving, the risk is not getting the right document fragments, or not getting enough of them for the model to generate an accurate answer.
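The retrieval step of the pipeline above can be sketched in a few lines. This is a minimal illustration, not chat-rag-cli's implementation: the toy three-dimensional vectors stand in for real embeddings produced by a model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy pre-computed chunk embeddings standing in for a real embeddings model.
chunks = {
    "hello world snippet": [0.9, 0.1, 0.0],
    "struct definition snippet": [0.1, 0.8, 0.3],
    "dynamic object snippet": [0.2, 0.3, 0.9],
}

def retrieve(question_embedding, top_k=2):
    """Return the top_k chunk names most similar to the question."""
    scored = sorted(chunks.items(),
                    key=lambda kv: cosine(question_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# A question embedding close to the "hello world" chunk ranks it first.
print(retrieve([0.85, 0.15, 0.05]))
# → ['hello world snippet', 'dynamic object snippet']
```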
Let’s think differently
For my use case — “feeding a TLM (Tiny Language Model) enough material so it can learn to understand and generate GoloScript” — I figured I could do things differently. I need to think of each chunk as an “entry” in a NoSQL database, not as a document fragment to retrieve.
Each chunk is a “self-contained” document that holds the information in its entirety, with metadata to facilitate search and also to help the embeddings model understand the chunk’s context, as well as its relationships with other chunks. So, rather than splitting an existing documentation into pieces, I directly create my documentation as a structured collection of individual documents.
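One way to picture such a self-contained record is as a dictionary whose metadata and code are flattened into a single text handed to the embeddings model. The `to_embedding_text` helper and its output format below are illustrative assumptions, not chat-rag-cli's actual serialization:

```python
# A self-contained "record" with the metadata fields used in this article.
snippet = {
    "id": 1,
    "name": "hello_world",
    "description": "Minimal GoloScript program with module declaration and main function",
    "keywords": ["hello world", "main function", "module"],
    "topic": "basic_syntax",
    "related_topics": ["modules", "program_structure"],
    "code": 'module hello.World\nfunction main = |args| {\n  println("Hello, GoloScript!")\n}\n',
}

def to_embedding_text(record):
    """Flatten metadata and code into one string, embedded as a whole
    so no semantic context is lost to chunking."""
    return "\n".join([
        f"NAME: {record['name']}",
        f"DESCRIPTION: {record['description']}",
        f"KEYWORDS: {', '.join(record['keywords'])}",
        f"TOPIC: {record['topic']}",
        f"RELATED TOPICS: {', '.join(record['related_topics'])}",
        "CODE:",
        record["code"],
    ])

print(to_embedding_text(snippet))
```

Because the metadata travels with the code in a single embedding, a question mentioning “main function” or “entry point” can match this record even if those words never appear in the code itself.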
chat-rag-cli
For my experiments (and my RAG needs), I created a small CLI tool, chat-rag-cli, which lets me easily index Markdown, XML, and YAML documents (creating embeddings in a JSON store), then perform similarity searches and ask an LLM to generate an answer from the most relevant retrieved documents.
For example:
Index the documents:
chat-rag-cli --config ./config.yml \
--documents-path ./documents \
--store-path-file ./store/snippets.json index
Search and generate an answer from the retrieved documents:
chat-rag-cli prompt "create a hello world program in GoloScript." \
--config ./config.yml \
--store-path-file ./store/snippets.json
This is the tool I used to verify my hypotheses. But the concept described in this article can be reused with any RAG system, and any vector database.
👋 I use chat-rag-cli with 🐳 Docker Model Runner, but it can work with any model engine or platform that exposes an API compatible with the OpenAI API.
Structuring my documentation / database
I structured my documentation directly in a YAML file, as if it were a NoSQL database. Each document is a “record” with metadata fields (id, name, description, language, keywords, topic, related_topics) and a field for the code itself. The code field contains GoloScript code examples along with explanations. In YAML, this gives us:
snippets:
  - id: 1
    name: hello_world
    description: "Minimal GoloScript program with module declaration and main function"
    language: goloscript
    keywords:
      - hello world
      - main function
      - module
      - println
      - entry point
    topic: basic_syntax
    related_topics:
      - modules
      - program_structure
    code: |
      module hello.World

      function main = |args| {
        println("Hello, GoloScript!")
      }

  - id: 2
    name: variables_and_constants
    description: "Declare mutable variables with var and immutable constants with let"
    language: goloscript
    keywords:
      - var
      - let
      - variable
      - constant
      - mutable
      - immutable
      - assignment
    topic: basic_syntax
    related_topics:
      - data_types
    code: |
      module examples.Variables

      function main = |args| {
        # Mutable variables
        var x = 10
        x = 20 # OK

        # Immutable constants
        let y = 30
        # y = 40 # ERROR - immutable

        println("x =", x)
        println("y =", y)
      }

  # etc...
📝 The full file is available here: https://codeberg.org/ai-apocalypse-survival-kit/learning-goloscript-with-rag/src/branch/main/documents/snippets.goloscript.yaml
I then used my CLI to index this yaml file into a JSON embeddings store, and I was able to perform similarity searches from natural language questions, and ask an LLM to generate GoloScript code from the most relevant retrieved documents.
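To make the idea concrete, here is a plausible shape for such a JSON embeddings store — the field names and the stub embedder below are assumptions for illustration, not chat-rag-cli's actual schema:

```python
import json

# Stub embedder standing in for a real model (e.g. Qwen3-Embedding served
# through an OpenAI-compatible API); it just returns a fixed-size vector.
def embed(text, dim=4):
    return [round(len(text) % (i + 2) / 10, 2) for i in range(dim)]

records = [
    {"id": 1, "name": "hello_world", "content": "module hello.World ..."},
    {"id": 2, "name": "variables_and_constants", "content": "module examples.Variables ..."},
]

# Hypothetical store layout: one entry per record, keeping the full content
# next to its embedding so a similarity hit returns a complete document,
# not a fragment.
store = [
    {
        "id": r["id"],
        "name": r["name"],
        "content": r["content"],
        "embedding": embed(r["content"]),
    }
    for r in records
]

print(json.dumps(store, indent=2))
```

The key property is that retrieval never has to reassemble anything: whatever entry scores highest is already a complete, self-describing document.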
Embeddings generation
For embeddings generation, I used the Qwen3-Embedding-0.6B-GGUF model. It produces 1024-dimensional embeddings, which capture enough semantic nuance for these self-contained records, and its long context window lets larger documents be embedded whole, without truncation, reducing the risk of information loss. And if you need more, it has bigger siblings with larger embedding dimensions:
- Qwen3-Embedding-4B-GGUF (2560)
- Qwen3-Embedding-8B-GGUF (4096)
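For reference, this is roughly the request shape an OpenAI-compatible embeddings endpoint expects. The model identifier and endpoint path are assumptions to adapt to your setup (Docker Model Runner or otherwise); the payload is only built here, not sent:

```python
import json

# Assumed model identifier -- check what your engine actually exposes.
payload = {
    "model": "qwen3-embedding-0.6b",
    "input": "NAME: hello_world\nDESCRIPTION: Minimal GoloScript program ...",
}
request_body = json.dumps(payload)

# A real client would POST request_body to {base_url}/v1/embeddings and
# read response["data"][0]["embedding"] -- a list of 1024 floats for the
# 0.6B model, 2560 or 4096 for its bigger siblings.
print(request_body)
```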
Answer generation from retrieved documents (similarities)
For answer generation from the retrieved documents, I ran tests with two different generation models.
So, does it actually work?
I indexed my GoloScript snippets file into a JSON embeddings store:
chat-rag-cli --config ./config.yml --documents-path ./documents --store-path-file ./store/snippets.json index
And then I ran similarity searches from natural language questions, and asked an LLM to generate GoloScript code from the most relevant retrieved documents.
Search and generate a response from the retrieved documents
chat-rag-cli prompt "create a hello world program in GoloScript." \
--config ./config.yml \
--store-path-file ./store/snippets.json \
--output ./response.md
chat-rag-cli prompt "I need a Human DynamicObject with these fields: name and age." \
--config ./config.yml \
--store-path-file ./store/snippets.json \
--output ./response.md
chat-rag-cli prompt "I need a Human structure with these fields: firstName and lastName." \
--config ./config.yml \
--store-path-file ./store/snippets.json \
--output ./response.md
Both models did really well — the proposed code examples perfectly followed GoloScript syntax. In GoloScript, the concepts of DynamicObject and struct are fairly similar, but the models correctly distinguished between them (so far I’ve been less lucky with fine-tuning).
Example of generated code for the DynamicObject question:
# Create a Human DynamicObject
let human = DynamicObject()

# Set properties (use property name as method with argument)
human: name("Alice")
human: age(30)

# Get properties (call with no argument)
println("Name:", human: name())
println("Age:", human: age())
Example of generated code for the struct question:
# Define a structure
struct Human = { firstName, lastName }

# Create an instance
let h1 = Human("John", "Doe")

# Access fields (getter call with no args)
let firstName = h1: firstName()
let lastName = h1: lastName()

println("Human:", firstName, lastName)
Conclusion
By structuring my documentation somewhat like a NoSQL database, I was able to do retrieval-augmented generation in a more efficient and accurate way. Since the data returned during similarity searches is complete, it allows a small generation model to produce more accurate and more relevant answers.
I plan to use chat-rag-cli with code agents and TLMs (Tiny Language Models) to have development assistance at hand. Without going full vibe coding, I’d have helpers for coding. Stay tuned…
You can find the code for this project on this repository: https://codeberg.org/ai-apocalypse-survival-kit/learning-goloscript-with-rag