
Autodoc Generator (autodoc-gen)

autodoc-gen is a documentation and testing automation tool for software projects. It enables processing large codebases, extracting relevant context using local embeddings, and generating technical documentation, unit tests, and architectural diagrams using a CPU-optimized LLM.


1. Module Description

autodoc-gen is a CLI tool that automates software engineering tasks using artificial intelligence:

  • Source code chunking: Splits large projects into manageable fragments for LLM processing.
  • Semantic indexing: Uses embeddings to represent chunk content.
  • Semantic search: Retrieves relevant fragments for each query.
  • Automatic generation: Produces documentation, tests, and diagrams based on retrieved context.
  • CPU-optimized: Runs lightweight models (Phi-4 Mini) using llama-cpp-python.

The project is compatible with WSL Debian and standard Linux distributions.


2. Architecture

```mermaid
flowchart LR
    A[Source Code Project] --> B[Chunking Module]
    B --> C[Embeddings Module]
    C --> D[FAISS Semantic Index]
    D --> E[Semantic Search Layer]
    E --> F[LLM Processing]
    F --> G[Documentation]
    F --> H[Test Generation]
    F --> I[Diagram Generation - Mermaid/PNG]
```
  • Chunking Module: Splits large files into fragments that the model can process.
  • Embeddings Module: Converts chunks into semantic vectors (SentenceTransformers).
  • FAISS Semantic Index: Enables fast and efficient retrieval of relevant chunks.
  • Semantic Search Layer: Retrieves the most relevant fragments based on the query.
  • LLM Processing: Generates documentation, tests, and diagrams using retrieved context.
  • Outputs: Markdown documentation, Python unit tests, and Mermaid diagrams (optional PNG).

3. Data Flow

  1. Project input: Path to the source code.
  2. Chunking: Files are split into fragments with a configurable maximum token size.
  3. Indexing: Embeddings are generated for each chunk and stored in a FAISS index.
  4. Semantic query: The user submits queries (docs, tests, diagrams).
  5. Context retrieval: FAISS returns the most relevant chunks.
  6. LLM processing: A local model generates documentation, tests, or diagrams.
  7. Output: Markdown files (GENERATED_DOCUMENTATION.md), tests (*_test_auto.py), and diagrams (.mmd and optional .png).
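The steps above can be sketched as one orchestration function. The helper names mirror the modules described in section 4, but the wiring shown here is an illustrative assumption, not the actual cli.py code; the step functions are passed in so the flow is visible end to end:

```python
from pathlib import Path

def run_pipeline(project_path, query, chunker, indexer, searcher, llm,
                 out_file="GENERATED_DOCUMENTATION.md"):
    """Illustrative end-to-end flow: chunk -> index -> retrieve -> generate -> write."""
    chunks = chunker(project_path)            # 2. split files into token-bounded fragments
    index = indexer(chunks)                   # 3. embed each chunk, build a FAISS-style index
    context = searcher(query, index, k=5)     # 4-5. retrieve the most relevant fragments
    prompt = f"Context:\n{''.join(context)}\n\nTask: document '{query}'."
    text = llm(prompt)                        # 6. local CPU model generates the artifact
    Path(out_file).write_text(text)           # 7. write the Markdown output
    return text
```

Each stage only needs the previous stage's output, which is why the CLI can also expose them as separate `chunk`, `index`, `search`, and `doc` commands.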

4. Function/Class Responsibilities

CLI (cli.py)

  • Organizes commands: chunk, index, search, doc, tests, diagrams.
  • Handles user parameters and flags.

Chunking (chunker.py)

  • chunk_project(project_path, max_tokens, output_dir): splits files by token size.
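A minimal sketch of the token-size split, using whitespace tokens as a rough stand-in for the real tokenizer and a single string instead of a project tree (the actual chunker.py logic may differ):

```python
def chunk_text(text, max_tokens=512):
    """Split text into fragments of at most max_tokens whitespace-delimited tokens.
    A rough stand-in for tokenizer-aware splitting over a whole project."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```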

Embeddings (embeddings.py)

  • build_embeddings_index(chunks_dir, index_path): generates semantic vectors and builds the FAISS index.
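The real module pairs SentenceTransformers with FAISS; the sketch below substitutes a toy hashed bag-of-words embedding and a plain NumPy matrix for both, only to show the shape of the data involved. The function names `embed` and `build_index` are illustrative, not the project's API:

```python
import numpy as np

def embed(texts, dim=64):
    """Toy embedding: hashed bag-of-words, L2-normalised
    (stand-in for a SentenceTransformers model)."""
    vecs = np.zeros((len(texts), dim), dtype="float32")
    for i, t in enumerate(texts):
        for w in t.lower().split():
            vecs[i, hash(w) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)  # guard against empty texts

def build_index(chunks):
    """Dense matrix of chunk vectors; FAISS's IndexFlatIP holds the same data."""
    return embed(chunks)
```

With unit-norm vectors, inner product equals cosine similarity, which is why a flat inner-product index suffices for retrieval.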

Search (search.py)

  • semantic_search(query, index_path, chunks_dir, k): retrieves the most relevant chunks for a query.
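Top-k retrieval over such an index reduces to an inner-product ranking. This is a brute-force sketch of what FAISS's `IndexFlatIP.search` computes, not the project's actual search.py code:

```python
import numpy as np

def top_k(query_vec, index, chunks, k=3):
    """Return the k chunks whose vectors have the highest inner product
    with the query vector."""
    scores = index @ query_vec            # one score per chunk
    order = np.argsort(scores)[::-1][:k]  # best-scoring chunks first
    return [chunks[i] for i in order]
```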

LLM Wrapper (llm.py)

  • call_llm(prompt, max_tokens): calls the CPU-optimized local model (Phi-4 Mini) and returns generated text.
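A hedged sketch of such a wrapper around llama-cpp-python's `Llama` class. The model path and context size are assumptions, not the project's actual configuration, and the `model` parameter is added here only so the loading step can be swapped out:

```python
def call_llm(prompt, max_tokens=256, model=None):
    """Generate text with a local GGUF model via llama-cpp-python."""
    if model is None:
        from llama_cpp import Llama  # imported lazily: heavy, CPU-only
        # hypothetical model path and context window
        model = Llama(model_path="models/phi-4-mini.gguf", n_ctx=4096)
    out = model(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"].strip()
```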

Documentation (doc_generator.py)

  • generate_documentation(query, index_path, chunks_dir): produces technical Markdown documentation using retrieved context.
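The prompt sent to the model presumably interleaves the query with the retrieved chunks; the exact format used by doc_generator.py is not documented here, so the layout below is an illustrative assumption:

```python
def build_doc_prompt(query, context_chunks):
    """Assemble a documentation prompt from retrieved chunks (illustrative format)."""
    context = "\n\n".join(f"### Chunk {i + 1}\n{c}"
                          for i, c in enumerate(context_chunks))
    return ("You are a technical writer. Using only the context below, "
            f"write Markdown documentation about: {query}\n\n{context}")
```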

Tests (test_generator.py)

  • generate_tests(file_path, index_path, chunks_dir): generates unit tests using pytest.

Diagrams (diagram_generator.py)

  • generate_diagrams(query, index_path, chunks_dir, export_png=False): generates Mermaid diagrams and optionally exports them to PNG.
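The PNG export path can be sketched as writing the Mermaid source and shelling out to mermaid-cli. The `-i`/`-o` flags are standard `mmdc` usage, but whether the project invokes it exactly this way is an assumption:

```python
import subprocess
from pathlib import Path

def save_diagram(mermaid_src, name, out_dir="GENERATED_DIAGRAMS", export_png=False):
    """Write Mermaid source to <out_dir>/<name>.mmd; optionally render a PNG
    with mermaid-cli (requires Node.js, as noted in the prerequisites)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    mmd = out / f"{name}.mmd"
    mmd.write_text(mermaid_src)
    if export_png:
        subprocess.run(["mmdc", "-i", str(mmd), "-o", str(out / f"{name}.png")],
                       check=True)
    return mmd
```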

5. Key Technical Decisions

  • Chunking to handle projects larger than 56k tokens without memory saturation.
  • FAISS for efficient semantic indexing and fast context retrieval.
  • Lightweight local models (Phi-4 Mini) to avoid cloud dependency and support CPU-only environments.
  • Mermaid for diagrams: easy to render, portable, and Markdown-compatible.
  • WSL Debian support: setup.sh and run.sh scripts automate installation and execution.
  • PEP 668 compliance: installation in a virtual environment to avoid breaking system dependencies.

6. Setup

Prerequisites

  • WSL Debian or a standard Linux distribution
  • Python 3.10+
  • apt package manager
  • Optional: Node.js (only required for PNG diagram export)

Installation

Clone the repository and run the setup script:

git clone https://github.com/andgom97/autodoc-gen.git
cd autodoc-gen
chmod +x setup.sh
./setup.sh

This script will:

  • Update system packages
  • Install required Python system dependencies
  • Create and activate a virtual environment
  • Install project dependencies from requirements.txt
  • Optionally install mermaid-cli for exporting diagrams to PNG

Virtual Environment

Activate the virtual environment when needed:

source venv/bin/activate

Verify Installation

Check that the CLI is available:

autodoc-gen --help

If the command is not found, ensure the virtual environment is activated.


7. Usage Examples

Generate documentation

./run.sh doc "payment module"

Generate unit tests

./run.sh tests src/payments/controller.py

Generate diagrams

./run.sh diagrams "user module"
./run.sh diagrams "user module" --png

Search for relevant chunks

./run.sh search "authentication function"

Chunk and index a full project

./run.sh chunk src/
./run.sh index chunks/

Note: All outputs are written to dedicated locations:

  • chunks/ → code fragments
  • GENERATED_DOCUMENTATION.md → documentation
  • *_test_auto.py → unit tests
  • GENERATED_DIAGRAMS/ → Mermaid and PNG diagrams

8. Contributions

If you would like to improve the project, feel free to open a Pull Request or create an Issue on GitHub.

9. Author & Contact

  • Author: Andrés Gómez Alfonso
  • Creation Date: November 2025
  • Contact Email: [email protected]

