Evaluations, provenance, and reproducibility — starting with a small, auditable RAG benchmark.
A compact RAG evaluation of faithfulness and attribution on a fixed manifest of sources and questions.
signature_verified: true when authentic.
Run locally → produce report.json/report.html → add one line to scores.csv → open a pull request. CI verifies that the numbers match your report before merge.
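As a rough illustration of the CI step, the sketch below compares the last row of scores.csv against report.json. The column and key names (run_id, faithfulness, attribution), the tolerance, and the sample values are all hypothetical placeholders, not the project's actual schema or results; the real check may work differently.

```python
import csv
import io
import json
import math

# Hypothetical metric names; the real report.json / scores.csv schema may differ.
METRICS = ("faithfulness", "attribution")


def find_mismatches(report, row, tol=1e-9):
    """Return the metric names whose scores.csv value disagrees with report.json.

    A metric that is missing or non-numeric on either side counts as a mismatch.
    """
    mismatches = []
    for name in METRICS:
        try:
            reported = float(report[name])
            claimed = float(row[name])
        except (KeyError, TypeError, ValueError):
            mismatches.append(name)
            continue
        if not math.isclose(reported, claimed, rel_tol=0.0, abs_tol=tol):
            mismatches.append(name)
    return mismatches


def last_row(csv_text):
    """Parse CSV text and return its final data row as a dict (None if empty)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows[-1] if rows else None


# Illustrative values only, not real benchmark results.
report = json.loads('{"faithfulness": 0.75, "attribution": 0.5}')
scores_csv = "run_id,faithfulness,attribution\nexample-run,0.75,0.5\n"

row = last_row(scores_csv)
problems = find_mismatches(report, row)
print("OK" if not problems else f"Mismatch in: {', '.join(problems)}")
```

In a CI job, a non-empty result from find_mismatches would fail the build, which is one way the "numbers match your report" guarantee could be enforced before merge.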
More utilities for data lineage, red-teaming, and reproducible runs. Collaborations welcome.