# RAG Node
The RAG Node (Retrieval-Augmented Generation) ingests documents into a Weaviate vector database and retrieves the most semantically relevant chunks to use as context for an AI Data Processing Node.
It connects exclusively to the AI Data Processing Node via a dedicated RAG port (right side of the node).
- Inputs
- Outputs
- Configuration
- Ingestion Lifecycle
- Weaviate Schema
- Node Type
## Inputs

The node receives its input from a Data Source Node (or any upstream node), formatted as markdown with optional file blocks and a query:
```
Using the attached document, answer:
What are the main topics covered?

### File name "my-document.txt"
<file content here>
```
- **File blocks** (optional): one or more `### File name "..."` sections. Each is chunked, embedded, and stored in the collection.
- **Query** (required for retrieval): the text preceding the first file block, used to perform a vector similarity search against the stored chunks.
If no files are provided, the node performs a query-only retrieval against the existing collection — useful for interrogating a database populated by other workflows or external tools.
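The split between query and file blocks can be sketched in Python. This is an illustrative parser for the input format shown above, not the node's actual source code:

```python
import re

def parse_input(text: str):
    """Split a RAG Node input into (query, {filename: content}).

    Assumes the '### File name "..."' block format described above.
    """
    parts = re.split(r'^### File name "([^"]+)"\s*$', text, flags=re.MULTILINE)
    query = parts[0].strip()
    files = {parts[i]: parts[i + 1].strip() for i in range(1, len(parts), 2)}
    return query, files
```

With no file blocks present, `files` comes back empty and the whole input is treated as the query, matching the query-only retrieval behavior.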
## Outputs

The node outputs the most relevant text chunks as a formatted context string, injected into the AI Data Processing Node's prompt:
```
[1] (my-document.txt)
...chunk content...

[2] (my-document.txt)
...chunk content...
```
If no relevant chunks are found, the output is empty (null context) and the AI node proceeds without RAG context.
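The context string above can be produced by a small formatter. A minimal sketch (hypothetical helper, not the node's code), covering the empty/null-context case:

```python
def format_context(chunks):
    """chunks: list of (content, source_file) tuples, most relevant first.

    Returns the '[n] (file)' context string, or None when no relevant
    chunks were found (null context).
    """
    if not chunks:
        return None
    return "\n\n".join(
        f"[{i}] ({source})\n{content}"
        for i, (content, source) in enumerate(chunks, start=1)
    )
```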
## Configuration

| Field | Description | Default |
|---|---|---|
| Embedding Model | Ollama model used to generate embeddings. Must be a dedicated embedding model (e.g. nomic-embed-text:latest). Locked to the collection's model when an existing collection is selected. | — |
| Collection Name | Namespace for the knowledge base in Weaviate (e.g. rag_collection). Use the browse button (🔍) to pick from existing collections. | rag_collection |
| Chunk Size (words) | Number of words per chunk when splitting documents. | 200 |
| Chunk Overlap (words) | Number of words shared between consecutive chunks (improves context continuity). | 20 |
| Top-K Results | Number of most relevant chunks to inject as context. | 5 |
| Weaviate URL | URL of the Weaviate instance (e.g. http://localhost:8080). | http://localhost:8080 |
| API Key (optional) | Weaviate API key. Leave blank for unauthenticated local instances. | — |
**Embedding model tip:** Use a dedicated embedding model like `nomic-embed-text:latest` rather than a general LLM. General LLMs produce lower-quality embeddings and may have dimension mismatches.
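Chunk Size and Chunk Overlap interact: each chunk advances by `size - overlap` words, so consecutive chunks share `overlap` words. A minimal sketch of such a word-based splitter (the node's exact boundary handling may differ):

```python
def chunk_words(text: str, size: int = 200, overlap: int = 20):
    """Split text into chunks of `size` words, overlapping by `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last chunk already reached the end of the text
    return chunks
```

With the defaults (200/20), each new chunk repeats the final 20 words of its predecessor, which preserves context across chunk boundaries at retrieval time.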
## Ingestion Lifecycle

The node uses per-file content hashing to avoid redundant re-embedding:
| Scenario | Behavior |
|---|---|
| File is new to the collection | Embedded and inserted |
| File content unchanged | Skipped — existing vectors reused |
| File content changed | Old chunks for that file deleted, file re-embedded and re-inserted |
| File removed from workflow input | Chunks remain in DB — preserved for other workflows |
| Embedding model changed | Entire chunks class dropped and all current files re-ingested |
| Query only (no files in input) | Queries whatever is already in the collection |
This makes the node safe to use in a shared knowledge base scenario: multiple workflows can contribute documents to the same collection without overwriting each other's data.
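The per-file hash comparison behind the table above can be sketched as follows. `plan_ingestion` is a hypothetical helper for illustration; the node's internal logic may differ in detail:

```python
import hashlib

def plan_ingestion(incoming: dict, stored_hashes: dict) -> dict:
    """Decide what to do with each incoming file.

    incoming:      {filename: content} from the workflow input
    stored_hashes: {filename: sha256hex} from the collection's meta object
    """
    plan = {}
    for name, content in incoming.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if name not in stored_hashes:
            plan[name] = "insert"        # new file: embed and insert
        elif stored_hashes[name] != digest:
            plan[name] = "reingest"      # changed: delete old chunks, re-embed
        else:
            plan[name] = "skip"          # unchanged: reuse existing vectors
    # Files present in stored_hashes but absent from `incoming` are
    # deliberately left untouched, preserving them for other workflows.
    return plan
```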
## Weaviate Schema

Each collection is stored as two Weaviate classes. The class name is derived from the collection name by stripping non-alphanumeric characters and uppercasing the first letter (e.g. `rag_collection` → `Ragcollection`).
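The derivation can be expressed in a few lines (a sketch, assuming characters after the first keep their original case):

```python
import re

def class_name(collection_name: str) -> str:
    """Derive the Weaviate class name from a collection name."""
    stripped = re.sub(r"[^0-9A-Za-z]", "", collection_name)
    return stripped[:1].upper() + stripped[1:]
```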
**Chunks class** (`{ClassName}`, e.g. `Ragcollection`):

```json
{
  "class": "Ragcollection",
  "vectorizer": "none",
  "properties": [
    { "name": "content", "dataType": ["text"] },
    { "name": "sourceFile", "dataType": ["text"] },
    { "name": "chunkIndex", "dataType": ["int"] }
  ]
}
```
Each object also carries a vector (float array). Dimension depends on the embedding model — nomic-embed-text:latest produces 768-dimensional vectors.
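Retrieval ranks stored chunks by the similarity between their vectors and the query embedding. Weaviate performs this search internally; the dependency-free sketch below only illustrates how Top-K selection over cosine similarity works:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec, chunks, k=5):
    """chunks: list of (vector, content, source_file); returns the k best matches."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)[:k]
```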
**Meta class** (`{ClassName}Meta`, e.g. `RagcollectionMeta`):

```json
{
  "class": "RagcollectionMeta",
  "vectorizer": "none",
  "properties": [
    { "name": "documentHash", "dataType": ["text"] },
    { "name": "embeddingModel", "dataType": ["text"] },
    { "name": "originalName", "dataType": ["text"] }
  ]
}
```
Exactly one object exists in the meta class per collection. `documentHash` is a JSON map of `{ "filename": "sha256hex", ... }` used for per-file change detection. `originalName` stores the human-readable collection name (e.g. `rag_collection`) for display in the UI.
**Minimum viable external collection** (to be queryable from Agentic Signal without triggering re-ingestion):

- Create `Ragcollection` with `content`, `sourceFile`, and `chunkIndex` properties and correctly-dimensioned vectors.
- Create `RagcollectionMeta` with one object: `{ documentHash: "{}", embeddingModel: "nomic-embed-text:latest", originalName: "rag_collection" }`.
## Node Type

- Node type: `rag`
- Port: RAG port (right side); connects only to an AI Data Processing Node