Changelog
All notable changes to Gweta will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.1.0 - 2025-02-06
Added
Core
- `Chunk` dataclass for universal chunk representation
- `QualityReport` and `QualityIssue` types
- `GwetaSettings` configuration management
- Structured logging with `get_logger()`
Validation (4 Layers)
- Layer 1: Extraction Quality
    - `ExtractionValidator` for raw text validation
    - Gibberish detection
    - Encoding validation
    - Language detection
- Layer 2: Chunk Quality
    - `ChunkValidator` for chunk validation
    - Information density scoring
    - Coherence checking
    - `DuplicateDetector` using MinHash LSH
- Layer 3: Domain Rules
    - `DomainRuleEngine` with YAML configuration
    - Regex pattern matching
    - Numerical range validation
    - Required field validation
    - Known fact cross-referencing
- Layer 4: KB Health
    - `HealthChecker` for ongoing monitoring
    - Staleness detection
    - Duplicate identification
    - `GoldenDatasetRunner` for retrieval testing
    - JUnit XML and JSON export
Acquisition
- `GwetaCrawler` wrapping Crawl4AI
- `PDFExtractor` with table extraction
- `DatabaseSource` with SQL safety
- `APIClient` for REST endpoints
- Sitemap and RSS fetchers
Ingestion
- `RecursiveChunker` for text splitting
- `ChromaStore` adapter
- `QdrantStore` adapter
- `PineconeStore` adapter
- `WeaviateStore` adapter
- `BaseStore` abstract interface
MCP Server
- FastMCP-based server
- 10 MCP tools for AI agent integration
- 4 MCP resources
- 3 MCP prompts
- stdio and HTTP transports
CLI
- `gweta validate` command
- `gweta crawl` command
- `gweta ingest` command
- `gweta health` command
- `gweta serve` command
Documentation
- Getting started guide
- Architecture overview
- API reference
- Example scripts
Version History
| Version | Date | Description |
|---|---|---|
| 0.1.0 | 2025-02-06 | Initial release |