KB Curation Pipeline Guide

End-to-end reference for the two-pass extraction pipeline — from document upload to runtime template composition

Core Principle: Metadata, Not Executable Templates

The extraction pipeline produces metadata — structured descriptions of what to collect and what to check. These are not runnable configurations. The Compose API transforms metadata into executable Data Stream and Rule templates at runtime, adapting them to each client's specific environment.

Pipeline Output: metadata templates (non-executable), such as expectedColumns, conditionTemplates, and queryTemplate placeholders.

Compose Output: executable configurations (runtime), with fully resolved queries, concrete rule conditions, and client-specific adaptations.
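The difference between the two outputs can be illustrated with a minimal sketch. The field names expectedColumns, conditionTemplates, and queryTemplate come from this guide; the values, the ${...} placeholder syntax, and the compose function are assumptions made for illustration and do not describe the actual Compose API.

```python
from string import Template

# Hypothetical pipeline output: metadata with unresolved placeholders.
# Only the three field names are taken from the guide; the values are invented.
metadata = {
    "expectedColumns": ["username", "last_login"],
    "queryTemplate": "SELECT username, last_login FROM ${user_table}",
    "conditionTemplates": ["last_login older than ${max_idle_days} days"],
}


def compose(meta: dict, client_context: dict) -> dict:
    """Illustrative stand-in for composition: resolve every placeholder
    with client-specific values to obtain an executable configuration."""
    return {
        "columns": list(meta["expectedColumns"]),
        "query": Template(meta["queryTemplate"]).substitute(client_context),
        "conditions": [
            Template(c).substitute(client_context)
            for c in meta["conditionTemplates"]
        ],
    }


executable = compose(metadata, {"user_table": "dba_users", "max_idle_days": 90})
print(executable["query"])
print(executable["conditions"])
```

The metadata dictionary cannot be run against any system as written, while the composed result contains a concrete query and concrete conditions for one client.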

7-Step Pipeline Workflow

1. Upload Documents

Upload technology documentation (vendor guides, security hardening docs, connector references) as the raw source material. Documents are ingested and indexed via RAGFlow for semantic retrieval.

2. Pass 1 — Attribute Refinement

The LLM proposes technology-specific refinements of base UCF evidence requirements, producing TECH_EVIDENCE_REQUIREMENT candidates. These candidates enrich the context for Pass 2 by specifying what to look for in the technology documentation.
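One way to picture a Pass 1 candidate is as a record that links a base requirement to a technology-specific refinement. The sketch below is an assumption: the class names, the fields, and the propose callable (which stands in for the LLM call) are hypothetical; only the TECH_EVIDENCE_REQUIREMENT type name comes from this guide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BaseRequirement:
    requirement_id: str
    text: str


@dataclass(frozen=True)
class TechEvidenceRequirementCandidate:
    base_id: str          # the base requirement being refined
    technology: str       # technology the refinement applies to
    refined_text: str     # proposed technology-specific wording
    candidate_type: str = "TECH_EVIDENCE_REQUIREMENT"


def refine(
    base: BaseRequirement,
    technology: str,
    propose: Callable[[str, str], str],
) -> TechEvidenceRequirementCandidate:
    """Wrap a proposed refinement in a candidate record.

    `propose` is any callable returning refined text, for example an LLM call.
    """
    return TechEvidenceRequirementCandidate(
        base_id=base.requirement_id,
        technology=technology,
        refined_text=propose(base.text, technology),
    )


# Usage with a trivial stand-in for the LLM.
base = BaseRequirement("REQ-1", "Evidence of password expiration settings")
candidate = refine(base, "ExampleDB", lambda text, tech: f"{text} in {tech}")
print(candidate)
```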

3. Pass 2 — Control-Driven Extraction

Starting from existing KB controls, the system retrieves relevant document chunks via RAGFlow and extracts DATA_STREAM_TEMPLATE and RULE_PATTERN metadata. This output is non-executable metadata — not runtime templates.
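The control-driven loop can be sketched roughly as follows. This is not the pipeline's implementation: the retrieve function is a toy keyword-overlap ranking that stands in for RAGFlow's semantic retrieval, the extraction step is reduced to emitting one candidate per type, and the Candidate fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str                 # "DATA_STREAM_TEMPLATE" or "RULE_PATTERN"
    control_id: str
    source_chunks: list[str]  # chunks the candidate was derived from


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy stand-in for semantic retrieval: rank chunks by shared words."""
    words = set(query.lower().split())

    def overlap(chunk: str) -> int:
        return len(words & set(chunk.lower().split()))

    ranked = sorted(chunks, key=overlap, reverse=True)
    return [c for c in ranked[:top_k] if overlap(c) > 0]


def extract_for_controls(controls: dict[str, str], chunks: list[str]) -> list[Candidate]:
    """Start from each control, fetch supporting chunks, emit metadata candidates."""
    candidates = []
    for control_id, description in controls.items():
        hits = retrieve(description, chunks)
        if not hits:
            continue  # no supporting documentation found for this control
        for kind in ("DATA_STREAM_TEMPLATE", "RULE_PATTERN"):
            candidates.append(Candidate(kind, control_id, hits))
    return candidates


controls = {"CTRL-1": "password expiration policy"}
chunks = [
    "Set password expiration to 90 days",
    "Configure network listener ports",
]
result = extract_for_controls(controls, chunks)
for c in result:
    print(c.kind, c.control_id, c.source_chunks)
```

The point of the sketch is the direction of the loop: extraction starts from a control and looks for supporting text, rather than scanning documents and guessing which controls they relate to.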

4. Review & Approve Candidates

Subject-matter experts review extracted candidates, verifying accuracy and completeness. Candidates can be approved, rejected, or edited. The Compose Preview lets reviewers see how metadata would become executable templates.
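The three review actions can be modeled as a small state machine. The sketch below assumes that a candidate starts as pending, that edits are only allowed while it is pending, and that approval or rejection is final; none of these rules are stated in this guide, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ReviewItem:
    candidate_id: str
    payload: dict
    status: Status = Status.PENDING
    history: list = field(default_factory=list)

    def _require_pending(self) -> None:
        # Assumption: decisions are final, so only pending items can change.
        if self.status is not Status.PENDING:
            raise ValueError(f"{self.candidate_id} is already {self.status.value}")

    def edit(self, reviewer: str, **changes) -> None:
        self._require_pending()
        self.payload.update(changes)
        self.history.append((reviewer, "edited"))

    def approve(self, reviewer: str) -> None:
        self._require_pending()
        self.status = Status.APPROVED
        self.history.append((reviewer, "approved"))

    def reject(self, reviewer: str, reason: str) -> None:
        self._require_pending()
        self.status = Status.REJECTED
        self.history.append((reviewer, f"rejected: {reason}"))


item = ReviewItem("cand-1", {"queryTemplate": "SELECT 1"})
item.edit("alice", queryTemplate="SELECT 2")
item.approve("alice")
print(item.status, item.history)
```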

5. Create Change Set + Impact Analysis

Approved candidates are grouped into a Change Set. An impact analysis calculates the blast radius — how many templates, compositions, and cases would be affected — before any changes are committed.
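A blast-radius calculation of this kind is typically a traversal of a dependency graph. The sketch below is an assumption about how such a count could be computed: the "type:id" node naming and the dependents mapping are invented for illustration, and the changed templates themselves are included in the totals.

```python
from collections import Counter, deque

# Hypothetical dependency edges: each node maps to the nodes that depend on it.
dependents = {
    "template:db-users": ["composition:c1", "composition:c2"],
    "composition:c1": ["case:100"],
    "composition:c2": ["case:100", "case:101"],
}


def blast_radius(changed: list[str], dependents: dict[str, list[str]]) -> Counter:
    """Breadth-first walk from the changed nodes, counting affected nodes by type."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:  # count shared dependents only once
                seen.add(dep)
                queue.append(dep)
    # The node type is the part before the first colon.
    return Counter(node.split(":", 1)[0] for node in seen)


impact = blast_radius(["template:db-users"], dependents)
print(dict(impact))
```

Tracking visited nodes matters here: case:100 depends on both compositions but should be counted as one affected case.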

6. Publication

Once a Change Set is approved and its impact is acceptable, it is published to the knowledge graph. Publication creates a new graph version, which makes the published templates active for all downstream consumers.
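Publication as a versioning step can be sketched with an append-only list of immutable graph versions. This is an illustrative model only: the class names, the approval flag, and the representation of a graph version as a set of template identifiers are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphVersion:
    number: int
    templates: frozenset


class KnowledgeGraph:
    def __init__(self) -> None:
        # Version 0 is an empty graph; earlier versions are never modified.
        self.versions = [GraphVersion(0, frozenset())]

    @property
    def current(self) -> GraphVersion:
        return self.versions[-1]

    def publish(self, change_set_templates: set, approved: bool) -> GraphVersion:
        """Create a new version that adds the change set to the current graph."""
        if not approved:
            raise PermissionError("change set must be approved before publication")
        new_version = GraphVersion(
            number=self.current.number + 1,
            templates=self.current.templates | frozenset(change_set_templates),
        )
        self.versions.append(new_version)
        return new_version


kg = KnowledgeGraph()
v1 = kg.publish({"template:db-users"}, approved=True)
print(v1.number, sorted(v1.templates))
```

Keeping earlier versions untouched means a consumer pinned to an older version continues to see the same templates after a new publication.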

7. Compose (Runtime)

The Compose API transforms metadata templates into executable Data Stream and Rule configurations at runtime, adapting them to the client's specific context (OS family, platform, technology version).
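Adapting to client context can be thought of as choosing among variants based on the attributes listed above. The sketch below assumes a hypothetical variant list keyed by OS family and minimum technology version, and picks the most specific match; the real Compose API may resolve context differently. The query strings are placeholders, not real commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientContext:
    os_family: str
    platform: str
    tech_version: str


def parse_version(version: str) -> tuple:
    """Turn '8.0' into (8, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical variants attached to one metadata template.
VARIANTS = [
    {"os_family": "linux", "min_version": "0", "query": "linux baseline query"},
    {"os_family": "linux", "min_version": "8.0", "query": "linux 8.0+ query"},
    {"os_family": "windows", "min_version": "0", "query": "windows baseline query"},
]


def pick_variant(ctx: ClientContext, variants: list) -> dict:
    """Return the matching variant with the highest minimum version."""
    matches = [
        v for v in variants
        if v["os_family"] == ctx.os_family
        and parse_version(v["min_version"]) <= parse_version(ctx.tech_version)
    ]
    if not matches:
        raise LookupError(f"no variant for {ctx.os_family} {ctx.tech_version}")
    return max(matches, key=lambda v: parse_version(v["min_version"]))


newer = pick_variant(ClientContext("linux", "x86_64", "8.2"), VARIANTS)
older = pick_variant(ClientContext("linux", "x86_64", "7.4"), VARIANTS)
print(newer["query"])
print(older["query"])
```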

Recommended Document Types

Upload documents matching these categories for best extraction results. The pipeline auto-classifies documents during ingestion.

Compliance Framework

Regulatory and compliance standards (e.g., SOX, GDPR, NIST). Used to map controls and evidence requirements to technology-specific implementations.

Technology Guide

Vendor documentation, security hardening guides, and best practices for a specific technology. Primary source for Pass 2 extraction.

Connector Documentation

Technical documentation for data connectors (JDBC, ODBC, API, etc.). Describes how to query and collect data from target systems.

Audit Methodology

Audit procedures and testing methodologies. Helps define what evidence to collect and how to validate controls.

Control Library

Catalog of security and compliance controls. Provides the foundation for control-driven extraction in Pass 2.

Other

General-purpose documents that don't fit the above categories. May still contain useful context for extraction.
