
Autodoc Generator (autodoc-gen)

autodoc-gen is a documentation and testing automation tool for software projects. It enables processing large codebases, extracting relevant context using local embeddings, and generating technical documentation, unit tests, and architectural diagrams using a CPU-optimized LLM.


1. Module Description

autodoc-gen is a CLI tool that automates software engineering tasks using artificial intelligence:

  • Source code chunking: Splits large projects into manageable fragments for LLM processing.
  • Semantic indexing: Uses embeddings to represent chunk content.
  • Semantic search: Retrieves relevant fragments for each query.
  • Automatic generation: Produces documentation, tests, and diagrams based on retrieved context.
  • CPU-optimized: Runs lightweight models (Phi-4 Mini) using llama-cpp-python.

The project is compatible with WSL Debian and standard Linux distributions.


2. Architecture

```mermaid
flowchart LR
    A[Source Code Project] --> B[Chunking Module]
    B --> C[Embeddings Module]
    C --> D[FAISS Semantic Index]
    D --> E[Semantic Search Layer]
    E --> F[LLM Processing]
    F --> G[Documentation]
    F --> H[Test Generation]
    F --> I[Diagram Generation - Mermaid/PNG]
```
  • Chunking Module: Splits large files into fragments that the model can process.
  • Embeddings Module: Converts chunks into semantic vectors (SentenceTransformers).
  • FAISS Semantic Index: Enables fast and efficient retrieval of relevant chunks.
  • Semantic Search Layer: Retrieves the most relevant fragments based on the query.
  • LLM Processing: Generates documentation, tests, and diagrams using retrieved context.
  • Outputs: Markdown documentation, Python unit tests, and Mermaid diagrams (optional PNG).

3. Data Flow

  1. Project input: Path to the source code.
  2. Chunking: Files are split into fragments with a configurable maximum token size.
  3. Indexing: Embeddings are generated for each chunk and stored in a FAISS index.
  4. Semantic query: The user submits queries (docs, tests, diagrams).
  5. Context retrieval: FAISS returns the most relevant chunks.
  6. LLM processing: A local model generates documentation, tests, or diagrams.
  7. Output: Markdown files (GENERATED_DOCUMENTATION.md), tests (*_test_auto.py), and diagrams (.mmd and optional .png).
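The steps above can be sketched as one orchestration function. The helper names mirror the modules described in section 4, but the wiring shown here is an illustrative assumption, not the actual cli.py code; the step functions are passed in so the flow is visible end to end:

```python
from pathlib import Path

def run_pipeline(project_path, query, chunker, indexer, searcher, llm,
                 out_file="GENERATED_DOCUMENTATION.md"):
    """Illustrative end-to-end flow: chunk -> index -> retrieve -> generate -> write."""
    chunks = chunker(project_path)            # 2. split files into token-bounded fragments
    index = indexer(chunks)                   # 3. embed each chunk, build a FAISS-style index
    context = searcher(query, index, k=5)     # 4-5. retrieve the most relevant fragments
    prompt = f"Context:\n{''.join(context)}\n\nTask: document '{query}'."
    text = llm(prompt)                        # 6. local CPU model generates the artifact
    Path(out_file).write_text(text)           # 7. write the Markdown output
    return text
```

Each stage only needs the previous stage's output, which is why the CLI can also expose them as separate `chunk`, `index`, `search`, and `doc` commands.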

4. Function/Class Responsibilities

CLI (cli.py)

  • Organizes commands: chunk, index, search, doc, tests, diagrams.
  • Handles user parameters and flags.

Chunking (chunker.py)

  • chunk_project(project_path, max_tokens, output_dir): splits files by token size.
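A minimal sketch of the token-size split, using whitespace tokens as a rough stand-in for the real tokenizer and a single string instead of a project tree (the actual chunker.py logic may differ):

```python
def chunk_text(text, max_tokens=512):
    """Split text into fragments of at most max_tokens whitespace-delimited tokens.
    A rough stand-in for tokenizer-aware splitting over a whole project."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```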

Embeddings (embeddings.py)

  • build_embeddings_index(chunks_dir, index_path): generates semantic vectors and builds the FAISS index.
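The real module pairs SentenceTransformers with FAISS; the sketch below substitutes a toy hashed bag-of-words embedding and a plain NumPy matrix for both, only to show the shape of the data involved. The function names `embed` and `build_index` are illustrative, not the project's API:

```python
import numpy as np

def embed(texts, dim=64):
    """Toy embedding: hashed bag-of-words, L2-normalised
    (stand-in for a SentenceTransformers model)."""
    vecs = np.zeros((len(texts), dim), dtype="float32")
    for i, t in enumerate(texts):
        for w in t.lower().split():
            vecs[i, hash(w) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)  # guard against empty texts

def build_index(chunks):
    """Dense matrix of chunk vectors; FAISS's IndexFlatIP holds the same data."""
    return embed(chunks)
```

With unit-norm vectors, inner product equals cosine similarity, which is why a flat inner-product index suffices for retrieval.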

Search (search.py)

  • semantic_search(query, index_path, chunks_dir, k): retrieves the most relevant chunks for a query.
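Top-k retrieval over such an index reduces to an inner-product ranking. This is a brute-force sketch of what FAISS's `IndexFlatIP.search` computes, not the project's actual search.py code:

```python
import numpy as np

def top_k(query_vec, index, chunks, k=3):
    """Return the k chunks whose vectors have the highest inner product
    with the query vector."""
    scores = index @ query_vec            # one score per chunk
    order = np.argsort(scores)[::-1][:k]  # best-scoring chunks first
    return [chunks[i] for i in order]
```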

LLM Wrapper (llm.py)

  • call_llm(prompt, max_tokens): calls the CPU-optimized local model (Phi-4 Mini) and returns generated text.
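A hedged sketch of such a wrapper around llama-cpp-python's `Llama` class. The model path and context size are assumptions, not the project's actual configuration, and the `model` parameter is added here only so the loading step can be swapped out:

```python
def call_llm(prompt, max_tokens=256, model=None):
    """Generate text with a local GGUF model via llama-cpp-python."""
    if model is None:
        from llama_cpp import Llama  # imported lazily: heavy, CPU-only
        # hypothetical model path and context window
        model = Llama(model_path="models/phi-4-mini.gguf", n_ctx=4096)
    out = model(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"].strip()
```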

Documentation (doc_generator.py)

  • generate_documentation(query, index_path, chunks_dir): produces technical Markdown documentation using retrieved context.
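The prompt sent to the model presumably interleaves the query with the retrieved chunks; the exact format used by doc_generator.py is not documented here, so the layout below is an illustrative assumption:

```python
def build_doc_prompt(query, context_chunks):
    """Assemble a documentation prompt from retrieved chunks (illustrative format)."""
    context = "\n\n".join(f"### Chunk {i + 1}\n{c}"
                          for i, c in enumerate(context_chunks))
    return ("You are a technical writer. Using only the context below, "
            f"write Markdown documentation about: {query}\n\n{context}")
```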

Tests (test_generator.py)

  • generate_tests(file_path, index_path, chunks_dir): generates unit tests using pytest.

Diagrams (diagram_generator.py)

  • generate_diagrams(query, index_path, chunks_dir, export_png=False): generates Mermaid diagrams and optionally exports them to PNG.
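The PNG export path can be sketched as writing the Mermaid source and shelling out to mermaid-cli. The `-i`/`-o` flags are standard `mmdc` usage, but whether the project invokes it exactly this way is an assumption:

```python
import subprocess
from pathlib import Path

def save_diagram(mermaid_src, name, out_dir="GENERATED_DIAGRAMS", export_png=False):
    """Write Mermaid source to <out_dir>/<name>.mmd; optionally render a PNG
    with mermaid-cli (requires Node.js, as noted in the prerequisites)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    mmd = out / f"{name}.mmd"
    mmd.write_text(mermaid_src)
    if export_png:
        subprocess.run(["mmdc", "-i", str(mmd), "-o", str(out / f"{name}.png")],
                       check=True)
    return mmd
```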

5. Key Technical Decisions

  • Chunking to handle projects larger than 56k tokens without memory saturation.
  • FAISS for efficient semantic indexing and fast context retrieval.
  • Lightweight local models (Phi-4 Mini) to avoid cloud dependency and support CPU-only environments.
  • Mermaid for diagrams: easy to render, portable, and Markdown-compatible.
  • WSL Debian support: setup.sh and run.sh scripts automate installation and execution.
  • PEP 668 compliance: installation in a virtual environment to avoid breaking system dependencies.

6. Setup

Prerequisites

  • WSL Debian or a standard Linux distribution
  • Python 3.10+
  • apt package manager
  • Optional: Node.js (only required for PNG diagram export)

Installation

Clone the repository and run the setup script:

git clone https://github.com/andgom97/autodoc-gen.git
cd autodoc-gen
chmod +x setup.sh
./setup.sh

This script will:

  • Update system packages
  • Install required Python system dependencies
  • Create and activate a virtual environment
  • Install project dependencies from requirements.txt
  • Optionally install mermaid-cli for exporting diagrams to PNG

Virtual Environment

Activate the virtual environment when needed:

source venv/bin/activate

Verify Installation

Check that the CLI is available:

autodoc-gen --help

If the command is not found, ensure the virtual environment is activated.


7. Usage Examples

Generate documentation

./run.sh doc "payment module"

Generate unit tests

./run.sh tests src/payments/controller.py

Generate diagrams

./run.sh diagrams "user module"
./run.sh diagrams "user module" --png

Search for relevant chunks

./run.sh search "authentication function"

Chunk and index a full project

./run.sh chunk src/
./run.sh index chunks/

Note: All outputs are written to dedicated locations:

  • chunks/ → code fragments
  • GENERATED_DOCUMENTATION.md → documentation
  • *_test_auto.py → unit tests
  • GENERATED_DIAGRAMS/ → Mermaid and PNG diagrams

8. Contributions

If you would like to improve the project, feel free to open a Pull Request or create an Issue on GitHub.

9. Author & Contact

  • Author: Andrés Gómez Alfonso
  • Creation Date: November 2025
  • Contact Email: [email protected]

