Once setup is complete, try the demo first. For a repeatable benchmark run, use scripts/run_task_by_id.sh. iOSWorld builds and installs real iOS apps, so Xcode must be ready ...
"""Load per-server disabled tool sets from the database."""
- Only use tools when needed. Don't search for things you already know.
- For web lookup/search/latest ...
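The docstring above describes loading per-server disabled tool sets from a database, but the function body is not shown. The following is a minimal sketch of what such a loader might look like, not the original implementation. It assumes a SQLite database with a hypothetical table disabled_tools(server_name, tool_name); the function name load_disabled_tools, the table, its columns, and the sample rows in demo() are all illustrative assumptions.

```python
from __future__ import annotations

import sqlite3
from collections import defaultdict


def load_disabled_tools(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Load per-server disabled tool sets from the database."""
    # Assumed (hypothetical) schema: disabled_tools(server_name TEXT, tool_name TEXT).
    disabled: defaultdict[str, set[str]] = defaultdict(set)
    rows = conn.execute("SELECT server_name, tool_name FROM disabled_tools")
    for server_name, tool_name in rows:
        # Group tool names by the server they belong to.
        disabled[server_name].add(tool_name)
    return dict(disabled)


def demo() -> dict[str, set[str]]:
    """Build a throwaway in-memory database and load it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE disabled_tools (server_name TEXT, tool_name TEXT)")
    # Illustrative rows only; the real server and tool names are unknown.
    conn.executemany(
        "INSERT INTO disabled_tools VALUES (?, ?)",
        [("search", "web_lookup"), ("search", "news"), ("files", "delete")],
    )
    result = load_disabled_tools(conn)
    conn.close()
    return result
```

Returning a dictionary of sets keeps the "is this tool disabled on this server?" check to a single membership test. Servers with no disabled tools simply do not appear as keys, so callers would look them up with .get(server, set()).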