Note: The project requires an NVIDIA GPU with CUDA support. The code is tested on Ubuntu 20.04 with CUDA 12.1 and PyTorch 2.3.1. Windows system is strongly ...
Skill Eval Harness is a Python CLI for testing whether an Agent Skill changes observable output. It reads evals/shared-benchmark.json, emits answer-key-safe task rows, grades files under eval-runs/, ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果