Agent evaluations for Nuxt coding tasks, powered by @vercel/agent-eval.
pnpm install
cp .env.example .env # requires VERCEL_OIDC_TOKEN and AI_GATEWAY_API_KEY

Runs agent evaluations with memoization. Only runs (model, eval) pairs that haven't been completed yet.
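The memoization described above can be pictured as a set difference between all (model, eval) combinations and the pairs that already have results. The following is a minimal sketch only: the `Pair` type, `pairKey`, and `missingPairs` are hypothetical names, and this README does not show how the real runner stores completed results.

```typescript
// Hypothetical types and helpers; not the actual runner implementation.
type Pair = { model: string; evalName: string };

// Build a stable lookup key for a (model, eval) pair.
function pairKey(p: Pair): string {
  return `${p.model}::${p.evalName}`;
}

// Return every (model, eval) combination that has no completed result yet.
// With force = true, memoization is bypassed and all pairs are returned.
function missingPairs(
  models: string[],
  evals: string[],
  completed: Pair[],
  force = false,
): Pair[] {
  const done = new Set(completed.map(pairKey));
  const all: Pair[] = [];
  for (const model of models) {
    for (const evalName of evals) {
      all.push({ model, evalName });
    }
  }
  return force ? all : all.filter((p) => !done.has(pairKey(p)));
}

const todo = missingPairs(
  ["claude-opus-4.6", "gemini-3-pro-preview"],
  ["nuxt-000-fix-data-fetching", "nuxt-001-prefer-nuxt-link"],
  [{ model: "claude-opus-4.6", evalName: "nuxt-000-fix-data-fetching" }],
);
console.log(todo.length); // 3 of the 4 pairs still need to run
```

Under this model, the `--force` flag would correspond to `force = true`, and a dry run would print `todo` without executing anything.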
pnpm run eval # Run only missing pairs
pnpm run eval -- claude-opus-4.6 # Run a specific experiment
pnpm run eval -- --force # Re-run everything
pnpm run eval:smoke # Run smoke test (1 eval per experiment)
pnpm run eval:dry # Preview what would run

Exports clean results to `agent-results.json`.
pnpm run export-results # Export from all experiments
pnpm run export-results -- claude-opus-4.6 # Export specific experiment

| Experiment | Agent | Model |
|---|---|---|
| `claude-opus-4.6` | `claude-code` | `claude-opus-4-6` |
| `claude-sonnet-4.5` | `claude-code` | `claude-sonnet-4-5` |
| `claude-sonnet-4.6` | `claude-code` | `claude-sonnet-4-6` |
| `cursor-composer-1.5` | `cursor` | `composer-1.5` |
| `cursor-composer-2.0` | `cursor` | `composer-2` |
| `gemini-3-pro-preview` | `gemini` | `gemini-3-pro-preview` |
| `gemini-3.1-pro-preview` | `gemini` | `gemini-3.1-pro-preview` |
| `devstral-2` | `opencode` | `vercel/mistral/devstral-2` |
| `gpt-5.3-codex-xhigh` | `codex` | `gpt-5.3-codex-api-preview?reasoningEffort=xhigh` |
| `gpt-5.4-xhigh` | `vercel-ai-gateway/codex` | `openai/gpt-5.4?reasoningEffort=xhigh` |

Each eval is a self-contained Nuxt project in evals/. Most evals provide broken or suboptimal starter code that the agent must fix — the prompt describes a symptom without revealing the solution.
evals/nuxt-000-fix-data-fetching/
├── PROMPT.md          # task given to the agent
├── EVAL.ts            # vitest assertions (withheld from the agent)
├── package.json       # Nuxt project manifest
├── nuxt.config.ts
├── tsconfig.json
├── eslint.config.mjs
├── server/
│   └── api/
│       └── greeting.ts
└── app/
    ├── app.vue
    └── pages/
        └── index.vue  # broken starter code the agent must fix

| File | Purpose |
|---|---|
| `PROMPT.md` | The task prompt sent to the agent |
| `EVAL.ts` | Test file run after the agent finishes (withheld from the agent) |
| `package.json` | Must have `"type": "module"` and a `"build"` script |
| Everything else | Source files the agent can see and modify |
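
To illustrate what an `EVAL.ts` check for `nuxt-000-fix-data-fetching` might look for, here is a hedged sketch of the underlying logic as a plain function. It is not the repository's actual test file: `checkDataFetching` and the sample sources are hypothetical, and a real `EVAL.ts` would presumably wrap checks like these in vitest `expect` calls after reading `app/pages/index.vue` from disk.

```typescript
// Hypothetical checker; illustrates "accept the correct fix, reject the anti-pattern".
function checkDataFetching(source: string): string[] {
  const failures: string[] = [];
  // Correct fix: the page should fetch its data with useFetch.
  if (!/\buseFetch\b/.test(source)) {
    failures.push("expected the page to call useFetch");
  }
  // Anti-pattern: client-only fetching with $fetch inside onMounted.
  if (/\bonMounted\b/.test(source) && /\$fetch\b/.test(source)) {
    failures.push("found onMounted + $fetch, which skips server-side rendering of the data");
  }
  return failures;
}

// Illustrative starter code in the spirit of the eval (not copied from the repo).
const broken = `
const greeting = ref(null)
onMounted(async () => {
  greeting.value = await $fetch('/api/greeting')
})
`;

// Illustrative fixed version.
const fixed = `
const { data: greeting } = await useFetch('/api/greeting')
`;

console.log(checkDataFetching(broken).length); // 2 failures
console.log(checkDataFetching(fixed).length); // 0 failures
```

String matching of this kind is deliberately crude; it is shown only to make the "check for the fix and reject anti-patterns" idea concrete.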

To add a new eval:
- Create a directory under `evals/` (e.g., `evals/nuxt-015-my-eval/`)
- Add `PROMPT.md` with a vague, symptom-based task description (don't reveal the solution)
- Add broken or suboptimal starter code in `app/` for the agent to fix
- Add `EVAL.ts` with vitest assertions that check for the correct fix and reject anti-patterns
- Add `package.json` with `"type": "module"` and `"build": "nuxt build"`
- Run `pnpm run eval`; it will automatically run the new eval for all models
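
Before running a new eval, it can help to confirm that its `package.json` meets the two requirements listed above. The sketch below assumes only those two rules; `PackageManifest` and `validateManifest` are hypothetical names, and the real runner may enforce more (or report errors differently).

```typescript
// Minimal shape of the fields this sketch inspects (hypothetical type).
interface PackageManifest {
  type?: string;
  scripts?: Record<string, string>;
}

// Return a list of problems; an empty list means the manifest looks usable.
function validateManifest(pkg: PackageManifest): string[] {
  const problems: string[] = [];
  if (pkg.type !== "module") {
    problems.push('"type" must be "module"');
  }
  if (!pkg.scripts || !pkg.scripts.build) {
    problems.push('a "build" script is required');
  }
  return problems;
}

// A manifest matching the README's instructions.
const good: PackageManifest = JSON.parse(
  '{ "type": "module", "scripts": { "build": "nuxt build" } }',
);
// A manifest missing both requirements.
const bad: PackageManifest = JSON.parse('{ "scripts": { "dev": "nuxt dev" } }');

console.log(validateManifest(good)); // []
console.log(validateManifest(bad).length); // 2
```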

To add a new model:
- Create a config in `experiments/` (e.g., `experiments/my-model.ts`)
- Add the display name to `MODEL_NAMES` in `scripts/export-results.ts`
- Run `pnpm run eval`; it will automatically run all evals for the new model

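
The shape of `MODEL_NAMES` is not shown in this README. As a sketch, assuming it is a plain map from experiment name to display name, adding a model could look like the following. The display strings and the `displayName` helper are illustrative only and are not taken from `scripts/export-results.ts`.

```typescript
// Assumed shape: experiment name -> human-readable display name.
// The entries below are illustrative, not the repository's actual values.
const MODEL_NAMES: Record<string, string> = {
  "claude-opus-4.6": "Claude Opus 4.6",
  "my-model": "My Model", // new entry for experiments/my-model.ts
};

// Hypothetical helper: fall back to the experiment name when no display name is registered.
function displayName(experiment: string): string {
  return MODEL_NAMES[experiment] ?? experiment;
}

console.log(displayName("my-model")); // "My Model"
console.log(displayName("unregistered-model")); // "unregistered-model"
```

A fallback like this keeps the export usable even when a display name has been forgotten, though the real script may instead fail loudly.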
After running evals:
- Export results: `pnpm run export-results`
- Copy to the nuxt.com repo: `cp agent-results.json <path-to-nuxt.com>/agent-results.json`
- Commit and deploy the nuxt.com repo

Nuxt evals:

| Eval | Type | Tests |
|---|---|---|
| nuxt-000-fix-data-fetching | fix | Replace onMounted + $fetch with useFetch |
| nuxt-001-prefer-nuxt-link | fix | Replace `<a href>` with `<NuxtLink to>` |
| nuxt-002-state-composables | build | State management with useState composable |
| nuxt-003-page-meta | build | Page meta, useHead, and custom layouts |
| nuxt-004-error-handling | build | Error handling with NuxtErrorBoundary |
| nuxt-005-fix-seo-meta | fix | Replace useHead meta arrays with useSeoMeta |
| nuxt-006-runtime-config | build | Runtime config with public vs private keys |
| nuxt-007-avoid-redundant-ref | fix | Replace ref + watch with computed for derived state |
| nuxt-008-fix-exposed-secret | fix | Move private runtimeConfig access to server API route |
| nuxt-009-cache-api-response | fix | Replace defineEventHandler with defineCachedEventHandler |
| nuxt-010-fix-watch-fetch | fix | Replace watch + $fetch with useFetch reactive URL |
| nuxt-011-fix-sequential-fetching | fix | Parallelize sequential await useFetch with Promise.all |
| nuxt-012-nuxt3-to-nuxt4-migration | fix | Migrate Nuxt 3 directory structure to Nuxt 4 |
| nuxt-013-prefer-nuxt-image | fix | Replace raw `<img>` with NuxtImg + @nuxt/image |
| nuxt-014-prefer-use-cookie | fix | Replace document.cookie with useCookie composable |
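
As an illustration of the kind of change `nuxt-011-fix-sequential-fetching` targets, the sketch below contrasts sequential and parallel awaiting in plain TypeScript. `fakeFetch` is a hypothetical stand-in for a request; in the actual eval the calls would be `useFetch`, which this sketch does not attempt to reproduce.

```typescript
// Stand-in for a network request: resolves with `value` after `ms` milliseconds.
function fakeFetch<T>(value: T, ms: number): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Sequential: the second request starts only after the first has finished,
// so the total wait is roughly the sum of both delays.
async function loadSequential(): Promise<[string, string]> {
  const user = await fakeFetch("user", 50);
  const posts = await fakeFetch("posts", 50);
  return [user, posts];
}

// Parallel: both requests start immediately and Promise.all waits for both,
// so the total wait is roughly the longer of the two delays.
async function loadParallel(): Promise<[string, string]> {
  return Promise.all([fakeFetch("user", 50), fakeFetch("posts", 50)]);
}

loadParallel().then(([user, posts]) => console.log(user, posts));
```

Parallelizing is only safe when the second request does not depend on the result of the first.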

Nuxt Content evals:

| Eval | Type | Tests |
|---|---|---|
| nuxt-content-000-navigation | build | Documentation site with queryCollectionNavigation sidebar |
| nuxt-content-001-data-collection | build | Data collection (type "data") with JSON files |

Nuxt UI evals:

| Eval | Type | Tests |
|---|---|---|
| nuxt-ui-000-theming | build | Theming with app.config.ts colors and semantic utilities |
| nuxt-ui-001-fix-raw-html-page | fix | Replace raw HTML with UHeader/UFooter/UPageHero/UPageSection |
| nuxt-ui-002-dashboard-layout | build | Dashboard with UDashboardGroup/Sidebar/Panel |
| nuxt-ui-003-fix-raw-form | fix | Replace raw form with UForm + Zod validation |
| nuxt-ui-004-table | build | Data table with UTable, columns, and search |
| nuxt-ui-005-modal | build | Modal overlay with UModal and v-model:open |
| nuxt-ui-006-command-palette | build | Command palette with UCommandPalette and keyboard shortcuts |
| nuxt-ui-007-dropdown-menu | build | Dropdown menu with grouped items, icons, and onSelect |

See LICENSE.