autodoc-gen is a documentation and testing automation tool for software projects. It enables processing large codebases, extracting relevant context using local embeddings, and generating technical documentation, unit tests, and architectural diagrams using a CPU-optimized LLM.
autodoc-gen is a CLI tool that automates software engineering tasks using artificial intelligence:
- Source code chunking: Splits large projects into manageable fragments for LLM processing.
- Semantic indexing: Uses embeddings to represent chunk content.
- Semantic search: Retrieves relevant fragments for each query.
- Automatic generation: Produces documentation, tests, and diagrams based on retrieved context.
- CPU-optimized: Runs lightweight models (Phi-4 Mini) using llama-cpp-python.
The project is compatible with WSL Debian and standard Linux distributions.
flowchart LR
A[Source Code Project] --> B[Chunking Module]
B --> C[Embeddings Module]
C --> D[FAISS Semantic Index]
D --> E[Semantic Search Layer]
E --> F[LLM Processing]
F --> G[Documentation]
F --> H[Test Generation]
G --> I[Diagram Generation - Mermaid/PNG]
- Chunking Module: Splits large files into fragments that the model can process.
- Embeddings Module: Converts chunks into semantic vectors (SentenceTransformers).
- FAISS Semantic Index: Enables fast and efficient retrieval of relevant chunks.
- Semantic Search Layer: Retrieves the most relevant fragments based on the query.
- LLM Processing: Generates documentation, tests, and diagrams using retrieved context.
- Outputs: Markdown documentation, Python unit tests, and Mermaid diagrams (optional PNG).
- Project input: Path to the source code.
- Chunking: Fragment files are created with a configurable maximum token size.
- Indexing: Embeddings are generated for each chunk and stored in a FAISS index.
- Semantic query: The user submits queries (docs, tests, diagrams).
- Context retrieval: FAISS returns the most relevant chunks.
- LLM processing: A local model generates documentation, tests, or diagrams.
- Output: Markdown files (GENERATED_DOCUMENTATION.md), tests (*_test_auto.py), and diagrams (.mmd and optional .png).
- Organizes commands: chunk, index, search, doc, tests, diagrams.
- Handles user parameters and flags.
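A minimal sketch of how the commands listed above could be wired together with Python's standard argparse module. Only the command names and the --png flag come from this README; the positional argument names, --max-tokens, -k, and all default values are assumptions made for illustration, not the tool's confirmed interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per documented command."""
    parser = argparse.ArgumentParser(prog="autodoc-gen")
    sub = parser.add_subparsers(dest="command", required=True)

    # Argument names and defaults below are hypothetical.
    chunk = sub.add_parser("chunk", help="split a project into chunks")
    chunk.add_argument("path")
    chunk.add_argument("--max-tokens", type=int, default=512)

    index = sub.add_parser("index", help="build the semantic index")
    index.add_argument("chunks_dir")

    search = sub.add_parser("search", help="retrieve relevant chunks")
    search.add_argument("query")
    search.add_argument("-k", type=int, default=5)

    doc = sub.add_parser("doc", help="generate documentation")
    doc.add_argument("query")

    tests = sub.add_parser("tests", help="generate unit tests")
    tests.add_argument("file_path")

    diagrams = sub.add_parser("diagrams", help="generate Mermaid diagrams")
    diagrams.add_argument("query")
    diagrams.add_argument("--png", action="store_true")

    return parser


# Parse an example command line instead of sys.argv
args = build_parser().parse_args(["diagrams", "user module", "--png"])
print(args.command, args.query, args.png)
```

Keeping one subparser per command means each command can validate its own flags without affecting the others.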
chunk_project(project_path, max_tokens, output_dir): splits files by token size.
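The idea of splitting by token size can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: tokens are approximated by whitespace-separated words (the real tool may use a model tokenizer), and chunk_text is a hypothetical helper that works on a single string rather than on a project directory.

```python
from typing import List


def chunk_text(text: str, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Lines are kept whole where possible; a single line that exceeds the
    budget is split into word groups (its original spacing is not preserved).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    chunks: List[str] = []
    current: List[str] = []  # lines in the chunk being built
    count = 0                # words in the chunk being built

    for line in text.splitlines():
        words = line.split()
        if len(words) > max_tokens:
            # Flush the pending chunk, then emit the long line in pieces
            if current:
                chunks.append("\n".join(current))
                current, count = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and count + len(words) > max_tokens:
            # Adding this line would exceed the budget: start a new chunk
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += len(words)

    if current:
        chunks.append("\n".join(current))
    return chunks


parts = chunk_text("a b c\nd e\nf g h i", max_tokens=5)
```

Splitting on line boundaries keeps statements intact in most source files, which tends to give the model more readable context than cutting at arbitrary word offsets.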
build_embeddings_index(chunks_dir, index_path): generates semantic vectors and builds the FAISS index.
semantic_search(query, index_path, chunks_dir, k): retrieves the most relevant chunks for a query.
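To show what retrieving the k most relevant chunks means, here is a small stand-in that does not use FAISS or SentenceTransformers: it ranks hand-written toy vectors by cosine similarity with a brute-force scan. The function names cosine and top_k, and the vectors themselves, are hypothetical; a FAISS index performs the same kind of nearest-neighbour lookup far more efficiently on real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings", one per chunk
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
hits = top_k([1.0, 0.1], chunk_vecs, k=2)
for idx, score in hits:
    print(idx, round(score, 3))
```

The returned indices would then be mapped back to files in the chunks directory to assemble the context passed to the model.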
call_llm(prompt, max_tokens): calls the CPU-optimized local model (Phi-4 Mini) and returns generated text.
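A hedged sketch of how such a call might look with llama-cpp-python. The build_prompt helper, the prompt layout, the model_path parameter, and the n_ctx value are all assumptions added for illustration; the signature documented above takes only prompt and max_tokens, so the real function presumably reads the model location from configuration.

```python
from typing import List


def build_prompt(task: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and a task into a single prompt string."""
    context = "\n\n".join(
        f"[chunk {i}]\n{chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return f"Context:\n{context}\n\nTask: {task}\nAnswer:"


def call_llm(prompt: str, max_tokens: int, model_path: str) -> str:
    """Run a local GGUF model on CPU and return the generated text.

    model_path is a hypothetical extra parameter (see the note above).
    """
    # Imported lazily so the module loads without llama-cpp-python installed
    from llama_cpp import Llama

    # Loading the model on every call is slow; a real tool would cache it
    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]


prompt = build_prompt(
    "Document the payment module",
    ["def pay(): ...", "class Invoice: ..."],
)
```

Separating prompt construction from the model call makes the prompt format easy to test without loading a model.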
generate_documentation(query, index_path, chunks_dir): produces technical Markdown documentation using retrieved context.
generate_tests(file_path, index_path, chunks_dir): generates unit tests using pytest.
generate_diagrams(query, index_path, chunks_dir, export_png=False): generates Mermaid diagrams and optionally exports them to PNG.
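The optional PNG export step could look roughly like the sketch below. It assumes the Mermaid source has already been generated, and it shells out to mmdc, the executable installed by mermaid-cli. The save_diagram name, its parameters, and the choice to skip export silently when mmdc is missing are illustrative assumptions.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def save_diagram(mermaid_src: str, out_dir: str, name: str,
                 export_png: bool = False) -> Optional[Path]:
    """Write a .mmd file and, if requested and possible, render a PNG.

    Returns the PNG path when one was rendered, otherwise None.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    mmd_path = out / f"{name}.mmd"
    mmd_path.write_text(mermaid_src, encoding="utf-8")

    if not export_png:
        return None
    # Skip PNG export when mermaid-cli is not installed
    if shutil.which("mmdc") is None:
        return None

    png_path = out / f"{name}.png"
    subprocess.run(
        ["mmdc", "-i", str(mmd_path), "-o", str(png_path)],
        check=True,
    )
    return png_path
```

Writing the .mmd file first means the diagram is always available as text, even on machines without Node.js.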
- Chunking to handle projects larger than 56k tokens without memory saturation.
- FAISS for efficient semantic indexing and fast context retrieval.
- Lightweight local models (Phi-4 Mini) to avoid cloud dependency and support CPU-only environments.
- Mermaid for diagrams: easy to render, portable, and Markdown-compatible.
- WSL Debian support: setup.sh and run.sh scripts automate installation and execution.
- PEP 668 compliance: installation in a virtual environment to avoid breaking system dependencies.
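Because the tool is meant to run inside a virtual environment, a startup check like the one below can catch the common mistake of running it with the system interpreter. This is an illustrative sketch; the in_virtualenv helper is hypothetical and nothing here is confirmed to exist in the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if not in_virtualenv():
    print("Warning: no virtual environment is active; "
          "run 'source venv/bin/activate' first.")
```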
- WSL Debian or a standard Linux distribution
- Python 3.10+
- apt package manager
- Optional: Node.js (only required for PNG diagram export)
Clone the repository and run the setup script:
git clone https://github.com/andgom97/autodoc-gen.git
cd autodoc-gen
chmod +x setup.sh
./setup.sh
This script will:
- Update system packages
- Install required Python system dependencies
- Create and activate a virtual environment
- Install project dependencies from requirements.txt
- Optionally install mermaid-cli for exporting diagrams to PNG
Activate the virtual environment when needed:
source venv/bin/activate
Check that the CLI is available:
autodoc-gen --help
If the command is not found, ensure the virtual environment is activated.
./run.sh doc "payment module"
./run.sh tests src/payments/controller.py
./run.sh diagrams "user module"
./run.sh diagrams "user module" --png
./run.sh search "authentication function"
./run.sh chunk src/
./run.sh index chunks/
Note: All outputs are stored in separate directories:
- chunks/ → code fragments
- GENERATED_DOCUMENTATION.md → documentation
- *_test_auto.py → unit tests
- GENERATED_DIAGRAMS/ → Mermaid and PNG diagrams
If you would like to improve the project, feel free to open a Pull Request or create an Issue on GitHub.
- Author: [Andrés Gómez Alfonso]
- Creation Date: [11, 2025]
- Contact Email: [[email protected]]