🦉 OWL Evaluation Framework

A comprehensive platform for evaluating and comparing world models through human evaluation studies


🎯 Overview

The OWL Evaluation Framework from Wayfarer Labs is a production-ready platform for researchers and organizations running human evaluation studies of generative world models. Built with TypeScript and Next.js, it compares video outputs through structured A/B testing and analyzes the results across multiple evaluation dimensions.

Why OWL?

🔬 Research-Grade Evaluations: Structured evaluation across multiple dimensions (quality, controllability, visual fidelity, temporal consistency)
🌐 Scalable Human Studies: Seamless integration with Prolific for large-scale crowd-sourced evaluations
🏢 Enterprise Ready: Multi-tenant architecture with organization management and role-based access control
⚡ Developer Friendly: Unified TypeScript CLI and comprehensive API for automation
📊 Real-time Analytics: Built-in dashboard with progress tracking and performance visualization
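Multi-tenant isolation with role-based access usually comes down to one check: does this user hold a sufficient role in this particular organization? The sketch below illustrates that idea only. The role names (owner, admin, member), the Membership shape, and the canAccess function are hypothetical and are not taken from OWL's code.

```typescript
// Hypothetical role names; OWL's actual roles may differ.
type Role = "owner" | "admin" | "member";

// Hypothetical record linking one user to one organization with one role.
interface Membership {
  userId: string;
  organizationId: string;
  role: Role;
}

// Higher rank means more privileges.
const ROLE_RANK: Record<Role, number> = { member: 0, admin: 1, owner: 2 };

// True only if the user is a member of the given organization AND holds at
// least the required role there. Memberships in other organizations never
// grant access, which is what keeps tenants isolated.
function canAccess(
  memberships: Membership[],
  userId: string,
  organizationId: string,
  required: Role,
): boolean {
  const m = memberships.find(
    (x) => x.userId === userId && x.organizationId === organizationId,
  );
  return m !== undefined && ROLE_RANK[m.role] >= ROLE_RANK[required];
}

const memberships: Membership[] = [
  { userId: "alice", organizationId: "org-a", role: "admin" },
  { userId: "bob", organizationId: "org-b", role: "owner" },
];

console.log(canAccess(memberships, "alice", "org-a", "member")); // true
console.log(canAccess(memberships, "alice", "org-a", "owner")); // false
console.log(canAccess(memberships, "bob", "org-a", "member")); // false
```

Ranking roles numerically keeps the permission test to a single comparison, and looking up the membership by organization first means a high role in one organization never carries over to another.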

✨ Features

Core Capabilities

🎬 Video Comparison Workflows - Side-by-side video playback with synchronized controls
📊 Multi-dimensional Evaluation - Structured rating across research-validated dimensions
👥 Human Study Management - Complete participant workflow with screening and quality control
🏗️ Multi-tenant Architecture - Organization-based isolation with RBAC
⚡ Real-time Analytics - Live progress tracking and performance dashboards
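Pairwise A/B judgments are commonly summarized per dimension as a win rate. The sketch below assumes a hypothetical Judgment record (two model names, one dimension, and a choice of A, B, or tie) and scores a win as 1, a tie as 0.5, and a loss as 0. The dimension identifiers mirror the dimensions listed above, but OWL's actual schema and scoring may differ.

```typescript
// Hypothetical dimension identifiers based on the dimensions named in this README.
type Dimension =
  | "quality"
  | "controllability"
  | "visualFidelity"
  | "temporalConsistency";

// Hypothetical shape of one A/B judgment from a participant.
interface Judgment {
  modelA: string;
  modelB: string;
  dimension: Dimension;
  choice: "A" | "B" | "tie";
}

// Win rate of `model` on one dimension: win = 1, tie = 0.5, loss = 0,
// averaged over every judgment on that dimension in which the model appeared.
// Returns NaN when the model has no judgments on that dimension.
function winRate(judgments: Judgment[], model: string, dimension: Dimension): number {
  let score = 0;
  let n = 0;
  for (const j of judgments) {
    if (j.dimension !== dimension) continue;
    // Work out which side the model was shown on in this trial.
    const side = j.modelA === model ? "A" : j.modelB === model ? "B" : null;
    if (side === null) continue;
    n += 1;
    if (j.choice === "tie") score += 0.5;
    else if (j.choice === side) score += 1;
  }
  return n === 0 ? NaN : score / n;
}

const judgments: Judgment[] = [
  { modelA: "m1", modelB: "m2", dimension: "quality", choice: "A" },
  { modelA: "m2", modelB: "m1", dimension: "quality", choice: "A" },
  { modelA: "m1", modelB: "m2", dimension: "quality", choice: "tie" },
  { modelA: "m1", modelB: "m2", dimension: "controllability", choice: "B" },
];

console.log(winRate(judgments, "m1", "quality")); // 0.5
console.log(winRate(judgments, "m2", "controllability")); // 1
```

Because the score is taken from whichever side the model appeared on, swapping its left/right position between trials does not bias the result.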

Platform Integrations

🔗 Prolific Integration - Automated participant recruitment and payment processing
☁️ Cloud Storage - AWS S3 and Tigris integration for video asset management
🔐 Authentication - Stack Auth integration with social login support
📡 REST API - Complete API for programmatic access and automation
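Prolific can append participant identifiers to a study URL as query parameters (PROLIFIC_PID, STUDY_ID, SESSION_ID) and expects participants to be redirected to a completion URL when they finish. The sketch below shows how a study page might read those parameters and build that redirect with the standard URL API. The function names, the example host eval.example.com, and the exact completion-URL format are assumptions, not OWL's implementation, and should be checked against Prolific's current documentation.

```typescript
// Identifiers Prolific can append to a study URL when URL parameters are enabled.
interface ProlificSession {
  participantId: string;
  studyId: string;
  sessionId: string;
}

// Hypothetical helper: returns null if any identifier is missing,
// e.g. for a visitor who did not arrive through Prolific.
function parseProlificParams(href: string): ProlificSession | null {
  const params = new URL(href).searchParams;
  const participantId = params.get("PROLIFIC_PID");
  const studyId = params.get("STUDY_ID");
  const sessionId = params.get("SESSION_ID");
  if (!participantId || !studyId || !sessionId) return null;
  return { participantId, studyId, sessionId };
}

// Hypothetical helper; the completion-redirect format here is an assumption.
function completionUrl(code: string): string {
  const url = new URL("https://app.prolific.com/submissions/complete");
  url.searchParams.set("cc", code); // encodes the completion code safely
  return url.toString();
}

const session = parseProlificParams(
  "https://eval.example.com/study?PROLIFIC_PID=abc123&STUDY_ID=s1&SESSION_ID=x9",
);
console.log(session);
console.log(completionUrl("C1234")); // https://app.prolific.com/submissions/complete?cc=C1234
```

Rejecting sessions with missing identifiers up front keeps unattributable responses out of the study data.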

Developer Experience

🛠️ Unified CLI - evalctl command-line tool for experiment management
🔄 Docker Support - Containerized deployment with docker-compose
🧪 Testing Suite - Automated tests written with Jest
📚 Type Safety - Full TypeScript coverage across frontend and backend

🚀 Quick Start

Prerequisites

Node.js 18+
PostgreSQL database
(Optional) AWS S3 or Tigris for video storage

Installation

Clone and install dependencies:

```
git clone https://github.com/Wayfarer-Labs/owl-eval.git
cd owl-eval/eval/frontend
npm install
```

Set up your environment:

```
cp .env.example .env.local
# Edit .env.local with your database and storage configuration
```

Initialize the database:
```
npm run db:migrate
```
Start the development server:
```
npm run dev
```
Use the evalctl CLI, located one level up in eval/, for experiment management:
```
cd ../
./evalctl --help
```

Docker Deployment

To run the stack in containers, start it from the eval/ directory, where docker-compose.yml lives:
```
docker-compose up -d
```

Visit http://localhost:3000 to access the web interface.

🏗️ Architecture

Tech Stack

Frontend: Next.js 15, React 19, TypeScript, Tailwind CSS
Backend: Next.js API routes, Prisma ORM
Database: PostgreSQL
Authentication: Stack Auth
Storage: AWS S3 / Tigris
CLI: TypeScript with Commander.js

Project Structure

```
eval/
├── frontend/               # Next.js web application
│   ├── src/app/            # Pages and API routes
│   ├── src/components/     # Reusable UI components
│   └── src/lib/            # Core business logic
├── scripts/                # TypeScript CLI tools
├── evalctl                 # Main CLI executable
└── docker-compose.yml      # Development environment

docs/                       # Comprehensive documentation
├── concepts.md             # Core concepts and methodology
├── evaluation-system.md    # System architecture
├── prolific-integration.md # Platform integrations
└── contributing.md         # Development guidelines
```

📚 Documentation

Getting Started

Core Concepts - Understanding the evaluation methodology and data model
System Architecture - Deep dive into platform architecture and workflows
CLI Reference - Complete command-line tool documentation

Development

Frontend Development - Setup and development guidelines
Contributing Guide - How to contribute to the project
Testing Guide - Running and writing tests

Integrations

Prolific Integration - Setting up human evaluation studies
Multi-tenant Setup - Organization and user management

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.

How to Contribute

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes and add tests
4. Commit your changes (git commit -m 'Add amazing feature')
5. Push to the branch (git push origin feature/amazing-feature)
6. Open a Pull Request

See our Contributing Guide for detailed development setup and guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♀️ Support

📖 Documentation: Check the docs/ directory for comprehensive guides
🐛 Issues: Report bugs and request features via GitHub Issues
💬 Discussions: Join community discussions in GitHub Discussions

Built with ❤️ by Wayfarer Labs
