#inference #model-serving #training #llm-inference #llm

bin+lib ifran

Local LLM inference, training, and fleet management platform

5 stable releases

Uses Rust 2024 edition

1.3.0 Apr 4, 2026
1.2.0 Mar 30, 2026
1.1.1 Mar 30, 2026
1.1.0 Mar 29, 2026
1.0.0 Mar 29, 2026

#547 in HTTP server

74 downloads per month

AGPL-3.0-only

2MB
46K SLoC

Ifran

LLM controller for pulling, managing, and training language models. A self-contained product with CLI, REST/gRPC API, and desktop interfaces.

Overview

Ifran is a Rust-based tool that provides:

  • Model pulling from HuggingFace with resume, integrity verification, and catalog tracking
  • Inference through 15 pluggable backends (llama.cpp, Candle, Ollama, vLLM, GGUF, ONNX, TensorRT, TPU, Gaudi, Inferentia, OneAPI, Qualcomm AI 100, Metal, Vulkan, AMD XDNA)
  • Hardware acceleration with auto-detection for CUDA, ROCm, Metal, Vulkan, TPU, Gaudi, Inferentia, OneAPI, Qualcomm AI, and AMD XDNA
  • Training orchestration (LoRA, QLoRA, full fine-tune, DPO, RLHF, distillation) with distributed training support
  • Evaluation benchmarking (MMLU, HellaSwag, HumanEval, perplexity, custom)
  • Experiments — autonomous hyperparameter sweeps with grid/random/Bayesian search
  • Fleet management for multi-node deployments with health monitoring
  • Multi-tenancy with per-tenant API keys, resource isolation, and GPU budget enforcement
  • Marketplace for model publishing, discovery, and peer-to-peer sharing
  • RAG pipelines for document ingestion and retrieval-augmented generation
  • RLHF annotation sessions with DPO export
  • Lineage tracking for full pipeline provenance (dataset → training → evaluation → deployment)
  • OpenAI-compatible API for drop-in replacement (/v1/chat/completions)
  • Desktop app via Tauri v2 + SvelteKit
  • Bidirectional integration with SecureYeoman orchestrator
  • Agnosticos integration — runs as a systemd service with capability registration
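As an illustration of the OpenAI-compatible API, the sketch below builds a standard chat-completion request for /v1/chat/completions. The port (8080) is an assumption, not a documented default; adjust it to your deployment.

```shell
# Request body for Ifran's OpenAI-compatible endpoint (/v1/chat/completions).
# The port below is an assumption, not a crate-documented default.
BODY='{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"Hello"}]}'
echo "$BODY"
# With the server running (`ifran serve`), send it:
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$BODY"
```

Because the route follows the OpenAI wire format, existing OpenAI client libraries should work by pointing their base URL at the Ifran server.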

Quick Start

# Pull a model
ifran pull meta-llama/Llama-3.1-8B-Instruct --quant q4_k_m

# Run interactive chat
ifran run meta-llama/Llama-3.1-8B-Instruct

# Start the API server
ifran serve

# List local models
ifran list

# Start a training job
ifran train --base-model meta-llama/Llama-3.1-8B-Instruct --dataset ./data.jsonl --method lora
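The --dataset file is JSON Lines, one training example per line. A minimal sketch; the field names here are illustrative assumptions, not the crate's documented schema:

```jsonl
{"prompt": "What does `cargo build --release` do?", "completion": "It compiles the project with optimizations."}
{"prompt": "Name one Rust string type.", "completion": "String"}
```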

Installation

From Source

git clone [email protected]:MacCracken/ifran.git
cd ifran
cargo build --release

Binaries: target/release/ifran (CLI), target/release/ifran-api (server).

On Agnosticos

pkg install ifran
systemctl start ifran

Installs to /usr/local/bin/, config at /etc/ifran/ifran.toml, data at /var/lib/ifran/.

Configuration

Ifran discovers config in this order:

  1. IFRAN_CONFIG environment variable
  2. ~/.ifran/ifran.toml (user config)
  3. /etc/ifran/ifran.toml (system config, Agnosticos)
  4. Built-in defaults

See deploy/ifran.toml.example for all options.

Default storage: ~/.ifran/ (models, database, cache, checkpoints).
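The discovery order above can be sketched as a small shell resolver. This is a hypothetical illustration of the precedence, not the crate's actual code, and the fallback marker string is illustrative:

```shell
# Hypothetical sketch of Ifran's config discovery order, highest priority first.
resolve_config() {
  if [ -n "$IFRAN_CONFIG" ]; then echo "$IFRAN_CONFIG"; return; fi          # 1. env var
  if [ -f "$HOME/.ifran/ifran.toml" ]; then echo "$HOME/.ifran/ifran.toml"; return; fi  # 2. user config
  if [ -f /etc/ifran/ifran.toml ]; then echo /etc/ifran/ifran.toml; return; fi          # 3. system config
  echo "built-in defaults"                                                  # 4. fallback
}
resolve_config
```

The first match wins, so an explicit IFRAN_CONFIG always overrides user and system files.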

Authentication

Set IFRAN_API_KEY to enable Bearer token authentication on all API endpoints (except /health):

export IFRAN_API_KEY=your-secret-token
ifran serve

Without IFRAN_API_KEY, all endpoints are unauthenticated; this is suitable only for local development.
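A minimal sketch of an authenticated request. The Bearer header must match the server's IFRAN_API_KEY; the port is an assumption, not a documented default:

```shell
# Bearer auth header matching the key the server was started with.
export IFRAN_API_KEY=your-secret-token
AUTH="Authorization: Bearer $IFRAN_API_KEY"
echo "$AUTH"
# With `ifran serve` running (port is an assumption):
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "$AUTH" -H "Content-Type: application/json" \
#   -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"Hi"}]}'
```

Requests without the header (other than to /health) will be rejected once the key is set.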

Project Structure

crates/
├── ifran-types      # Shared types + protobuf codegen
├── ifran-core       # Model registry, pull engine, lifecycle, hardware
├── ifran-backends   # Pluggable inference backends (trait-based)
├── ifran-train      # Training orchestration
├── ifran-api        # Axum REST + tonic gRPC server
├── ifran-bridge     # SY↔Ifran bidirectional gRPC
├── ifran-cli        # CLI (binary: ifran)
└── ifran-desktop    # Tauri v2 + SvelteKit (desktop app)

Documentation

  • Agnosticos — Target operating system (Rust)
  • SecureYeoman — Orchestrator with cyclic integration (TS/Bun)

Testing

1,406 tests across 7 crates (~73% coverage). CI runs per-package test matrix with coverage via cargo-tarpaulin.

cargo test --workspace

Versioning

Semver: MAJOR.MINOR.PATCH for releases, with optional pre-release suffix (e.g., 1.0.0-rc1).

License

AGPL-3.0

Dependencies

~34–63MB
~1M SLoC