Collection

Papers

38 papers across 9 categories. Each paper links to its arXiv page and includes a local PDF copy.

Primary Paper

1 paper

Planning Benchmarks

5 papers

Travel Planning

8 papers

Shopping and Search Agents

3 papers

Web Agent Benchmarks

4 papers

Tool Use and APIs

3 papers

Foundation Models

6 papers

Agent Evaluation Frameworks

7 papers

LLM Reasoning and Verification

1 paper