Skip to content

Changelog

All notable changes to Gweta will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.1.0 - 2025-02-06

Added

Core

  • Chunk dataclass for universal chunk representation
  • QualityReport and QualityIssue types
  • GwetaSettings configuration management
  • Structured logging with get_logger()

Validation (4 Layers)

  • Layer 1: Extraction Quality
  • ExtractionValidator for raw text validation
  • Gibberish detection
  • Encoding validation
  • Language detection

  • Layer 2: Chunk Quality

  • ChunkValidator for chunk validation
  • Information density scoring
  • Coherence checking
  • DuplicateDetector using MinHash LSH

  • Layer 3: Domain Rules

  • DomainRuleEngine with YAML configuration
  • Regex pattern matching
  • Numerical range validation
  • Required field validation
  • Known fact cross-referencing

  • Layer 4: KB Health

  • HealthChecker for ongoing monitoring
  • Staleness detection
  • Duplicate identification
  • GoldenDatasetRunner for retrieval testing
  • JUnit XML and JSON export

Acquisition

  • GwetaCrawler wrapping Crawl4AI
  • PDFExtractor with table extraction
  • DatabaseSource with SQL safety
  • APIClient for REST endpoints
  • Sitemap and RSS fetchers

Ingestion

  • RecursiveChunker for text splitting
  • ChromaStore adapter
  • QdrantStore adapter
  • PineconeStore adapter
  • WeaviateStore adapter
  • BaseStore abstract interface

MCP Server

  • FastMCP-based server
  • 10 MCP tools for AI agent integration
  • 4 MCP resources
  • 3 MCP prompts
  • stdio and HTTP transports

CLI

  • gweta validate command
  • gweta crawl command
  • gweta ingest command
  • gweta health command
  • gweta serve command

Documentation

  • Getting started guide
  • Architecture overview
  • API reference
  • Example scripts

Version History

Version Date Description
0.1.0 2025-02-06 Initial release