A comprehensive platform for evaluating and comparing world models through human evaluation studies
OWL (Wayfarer Labs) Evaluation Framework is a modern, production-ready platform designed for researchers and organizations conducting human evaluation studies of generative world models. Built with TypeScript and Next.js, it provides a complete solution for comparing video outputs through structured A/B testing and multi-dimensional analysis.
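In a structured A/B comparison, a rater sees two model outputs and states a preference along each evaluation dimension. The sketch below shows one way such judgments could be turned into per-dimension win rates. The type names, the dimension keys (taken from the dimensions listed below), and the rule that a tie counts as half a win are illustrative assumptions, not OWL's actual data model.

```typescript
// Hypothetical types for illustration; OWL's actual schema may differ.
type Dimension = "quality" | "controllability" | "visualFidelity" | "temporalConsistency";
type Choice = "A" | "B" | "tie";

interface Judgment {
  modelA: string; // model that produced video A
  modelB: string; // model that produced video B
  choices: Record<Dimension, Choice>; // rater's preference per dimension
}

const DIMENSIONS: Dimension[] = ["quality", "controllability", "visualFidelity", "temporalConsistency"];

// For each dimension, returns model name -> fraction of its comparisons it won.
// Ties count as half a win for both sides (an assumed convention).
function winRates(judgments: Judgment[]): Record<Dimension, Record<string, number>> {
  const result = {} as Record<Dimension, Record<string, number>>;
  for (const dim of DIMENSIONS) {
    const wins: Record<string, number> = {};
    const seen: Record<string, number> = {};
    // Add `amount` wins and one comparison to `model`'s running tally.
    const credit = (model: string, amount: number): void => {
      wins[model] = (wins[model] || 0) + amount;
      seen[model] = (seen[model] || 0) + 1;
    };
    for (const j of judgments) {
      const c = j.choices[dim];
      credit(j.modelA, c === "A" ? 1 : c === "tie" ? 0.5 : 0);
      credit(j.modelB, c === "B" ? 1 : c === "tie" ? 0.5 : 0);
    }
    const rates: Record<string, number> = {};
    for (const model of Object.keys(wins)) {
      rates[model] = wins[model] / seen[model];
    }
    result[dim] = rates;
  }
  return result;
}
```

Keeping a separate tally per dimension, rather than collapsing everything into one score, means a model that wins on visual fidelity but loses on controllability shows up as exactly that.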
- 🔬 Research-Grade Evaluations: Structured evaluation across multiple dimensions (quality, controllability, visual fidelity, temporal consistency)
- 🌐 Scalable Human Studies: Seamless integration with Prolific for large-scale crowd-sourced evaluations
- 🏢 Enterprise Ready: Multi-tenant architecture with organization management and role-based access control
- ⚡ Developer Friendly: Unified TypeScript CLI and comprehensive API for automation
- 📊 Real-time Analytics: Built-in dashboard with progress tracking and performance visualization
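A multi-tenant design with role-based access control usually means every request is checked twice: does the user belong to the organization that owns the resource (tenant isolation), and does their role in that organization permit the action (RBAC)? The sketch below illustrates that check. The roles (owner, admin, member), the actions, and the permission table are hypothetical; OWL's actual roles and permissions may differ.

```typescript
// Hypothetical role model; OWL's actual roles and permissions may differ.
type Role = "owner" | "admin" | "member";
type Action = "viewResults" | "createExperiment" | "manageMembers";

interface Membership {
  userId: string;
  organizationId: string;
  role: Role;
}

// Assumed permission table: which roles may perform which action.
const ALLOWED: Record<Action, Role[]> = {
  viewResults: ["owner", "admin", "member"],
  createExperiment: ["owner", "admin"],
  manageMembers: ["owner"],
};

// True only if the user is a member of `organizationId` (tenant isolation)
// AND their role in that organization permits `action` (RBAC).
function canPerform(
  memberships: Membership[],
  userId: string,
  organizationId: string,
  action: Action,
): boolean {
  return memberships.some(
    (m) =>
      m.userId === userId &&
      m.organizationId === organizationId &&
      ALLOWED[action].indexOf(m.role) !== -1,
  );
}
```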
- 🎬 Video Comparison Workflows - Side-by-side video playback with synchronized controls
- 📊 Multi-dimensional Evaluation - Structured rating across research-validated dimensions
- 👥 Human Study Management - Complete participant workflow with screening and quality control
- 🏗️ Multi-tenant Architecture - Organization-based isolation with RBAC
- ⚡ Real-time Analytics - Live progress tracking and performance dashboards
- 🔗 Prolific Integration - Automated participant recruitment and payment processing
- ☁️ Cloud Storage - AWS S3 and Tigris integration for video asset management
- 🔐 Authentication - Stack Auth integration with social login support
- 📡 REST API - Complete API for programmatic access and automation
- 🛠️ Unified CLI - evalctl command-line tool for experiment management
- 🔄 Docker Support - Containerized deployment with docker-compose
- 🧪 Testing Suite - Comprehensive test coverage with Jest
- 📚 Type Safety - Full TypeScript coverage across frontend and backend
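Prolific can pass a participant's identifiers to an external study by substituting its PROLIFIC_PID, STUDY_ID and SESSION_ID placeholders into query parameters on the study URL. The sketch below shows how a landing page might read them with the standard URL API. Whether OWL uses exactly these parameter names, and how it stores or validates them, is an assumption here; the example URL is made up.

```typescript
// Identifiers Prolific can append to a study URL when the researcher adds
// PROLIFIC_PID={{%PROLIFIC_PID%}}&STUDY_ID={{%STUDY_ID%}}&SESSION_ID={{%SESSION_ID%}}.
interface ProlificSession {
  participantId: string;
  studyId: string;
  sessionId: string;
}

// Returns null when any identifier is missing, e.g. for a visitor who did not
// arrive through Prolific (how OWL handles that case is not assumed here).
function parseProlificParams(href: string): ProlificSession | null {
  const params = new URL(href).searchParams;
  const participantId = params.get("PROLIFIC_PID");
  const studyId = params.get("STUDY_ID");
  const sessionId = params.get("SESSION_ID");
  if (!participantId || !studyId || !sessionId) return null;
  return { participantId, studyId, sessionId };
}
```

For example, parseProlificParams("https://example.com/study?PROLIFIC_PID=abc&STUDY_ID=s1&SESSION_ID=x9") yields an object with participantId "abc", studyId "s1" and sessionId "x9".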
- Node.js 18+
- PostgreSQL database
- (Optional) AWS S3 or Tigris for video storage
- Clone and install dependencies:
  git clone https://github.com/Wayfarer-Labs/owl-eval.git
  cd owl-eval/eval/frontend
  npm install
- Set up your environment:
  cp .env.example .env.local
  # Edit .env.local with your database and storage configuration
- Initialize the database:
  npm run db:migrate
- Start the development server:
  npm run dev
- Use the CLI for experiment management:
  cd ../
  ./evalctl --help
For production deployment:
docker-compose up -d

Visit http://localhost:3000 to access the web interface.
- Frontend: Next.js 15, React 19, TypeScript, Tailwind CSS
- Backend: Next.js API routes, Prisma ORM
- Database: PostgreSQL
- Authentication: Stack Auth
- Storage: AWS S3 / Tigris
- CLI: TypeScript with Commander.js
eval/
├── frontend/ # Next.js web application
│ ├── src/app/ # Pages and API routes
│ ├── src/components/ # Reusable UI components
│ └── src/lib/ # Core business logic
├── scripts/ # TypeScript CLI tools
├── evalctl # Main CLI executable
└── docker-compose.yml # Development environment
docs/ # Comprehensive documentation
├── concepts.md # Core concepts and methodology
├── evaluation-system.md # System architecture
├── prolific-integration.md # Platform integrations
└── contributing.md # Development guidelines
- Core Concepts - Understanding the evaluation methodology and data model
- System Architecture - Deep dive into platform architecture and workflows
- CLI Reference - Complete command-line tool documentation
- Frontend Development - Setup and development guidelines
- Contributing Guide - How to contribute to the project
- Testing Guide - Running and writing tests
- Prolific Integration - Setting up human evaluation studies
- Multi-tenant Setup - Organization and user management
We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See our Contributing Guide for detailed development setup and guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- 📖 Documentation: Check the docs/ directory for comprehensive guides
- 🐛 Issues: Report bugs and request features via GitHub Issues
- 💬 Discussions: Join community discussions in GitHub Discussions
Built with ❤️ by Wayfarer Labs