Evaluations, provenance, and reproducibility — starting with a small, auditable RAG benchmark.
A compact RAG evaluation for faithfulness + attribution on a fixed manifest of sources/questions.
signature_verified: true
when authentic.Run locally → produce report.json
/report.html
→ add one line to scores.csv
→ open a Pull Request. CI verifies the numbers match your report before merge.
More utilities for data lineage, red-teaming, and reproducible runs. Collaborations welcome.