Rehan Malik (rehan243)

About Me — AI/ML Engineer | Generative AI | LLM Systems

i'm an AI/ML engineer based in the US, currently building production AI systems at Reallytics.ai and Verticiti. most of my work revolves around getting large language models to do useful things in production — not toy demos, actual systems handling real traffic.

before this, i spent years at Afiniti and Cloud Kinetics doing the grunt work of making ML models reliable at scale. fraud detection, voice analytics, enterprise search — the kind of stuff that breaks at 3am and you have to fix.

what keeps me going: that moment when an AI agent you built actually solves a problem you didn't explicitly program it for. still hits different every time.

right now i'm deep into:

multi-agent systems that coordinate without falling apart
RAG pipelines that actually find what you're looking for
writing daily about what i learn — AI Engineering Notes


Featured Projects — AI Agents, RAG, LLM Fine-Tuning

Agentic AI Workflows: Production AI Agents
8 specialized AI agents with LangChain + OpenAI function calling. multi-agent orchestration with planning loops and guardrails. the project i'm most excited about.

RAG Enterprise Search: Retrieval-Augmented Generation
production retrieval pipeline over 2TB+ data. LangChain, FAISS, ChromaDB, cross-encoder re-ranking. deployed on AWS SageMaker.

Voice AI Platform: Real-Time Speech AI
real-time voice infrastructure handling 500+ concurrent calls. WebSockets, Apache Kafka, gRPC with CUDA. speech-to-text, sentiment analysis.

LLM Fine-Tuning (LoRA/QLoRA): Parameter-Efficient Fine-Tuning
fine-tuning LLaMA-2 and Mistral with LoRA/QLoRA/PEFT. 40% cost reduction vs hosted APIs. vLLM serving on SageMaker.

RLHF LLM Optimization: Reinforcement Learning from Human Feedback
full RLHF pipeline: supervised fine-tuning, reward modeling, PPO with KL constraints. 68% win rate, 96% safety compliance.

Sentinel Fraud Detection: Explainable AI
ensemble XGBoost + Isolation Forest with 650+ engineered features. SHAP explainability, UMAP clustering, GenAI reports via Amazon Bedrock.

Tech Stack — Python, PyTorch, LangChain, AWS, Docker

i'm not going to pretend i use everything equally. here's what i actually reach for day-to-day:

daily drivers	Python, PyTorch, FastAPI, Docker, Git, VS Code
LLM & GenAI	LangChain, LlamaIndex, HuggingFace Transformers, vLLM, PEFT/LoRA/QLoRA
vector & data	FAISS, ChromaDB, Pinecone, PostgreSQL, MongoDB, Redis, Kafka, Elasticsearch
cloud & MLOps	AWS (SageMaker, Bedrock, Lambda, ECS), GCP Vertex AI, Azure OpenAI
ML frameworks	TensorFlow, scikit-learn, XGBoost, LightGBM, ONNX
infrastructure	Kubernetes, Terraform, GitHub Actions, MLflow, Weights & Biases

GitHub Stats

i commit a lot. sometimes it's good code, sometimes it's "fix: typo in typo fix".


Latest AI Research Articles

i publish research notes daily — not polished papers, just honest writeups of what i'm learning and building. think of it as a public lab notebook for generative AI, LLM fine-tuning, RAG, and agentic systems.

Streaming Model Inference For Real Time Applicatio (2026-04-24)
Fine Tuned LLMs For Enterprise Retrieval Augmented (2026-04-24)
Explainable AI XAI For Trustworthy Models (2026-04-23)
Edge AI For Real Time Inference (2026-04-23)

📚 View all articles →

Recent Open-Source Activity

📝 Opened issue [Feature] Automatic LoRA rank recommendation based on datase in axolotl-ai-cloud/axolotl (2026-04-24)

💬 Commented on Integrate SAM3-LiteText to Ultralytics in ultralytics/ultralytics (2026-04-24)

💬 Commented on Crazy Logging I want to shut it down in NVIDIA-NeMo/NeMo (2026-04-24)

💬 Commented on Regression in 1.1.7 (#7498): Second regenerate from latest c in langchain-ai/langgraph (2026-04-24)

💬 Commented on Issue with the custom nodes. in modal-labs/modal-examples (2026-04-24)

💬 Commented on [Model Request] Support Gemma4 in mlc-ai/mlc-llm (2026-04-24)

💬 Commented on AuxiliaryTrainingWrapper.forward requires positional x, br in huggingface/peft (2026-04-24)

💬 Commented on Error: Gemini 3 Pro - Unknown error in continuedev/continue (2026-04-24)

Currently Researching

topics discovered daily by a multi-model AI research engine (GPT-4.1, Grok-3, DeepSeek R1, Llama-4)

🔬 Efficient Model Serving with Quantization and Distillation

🔬 Streaming Model Inference for Real-Time Applications

🔬 Fine-Tuned LLMs for Enterprise Retrieval-Augmented Generation (RAG)

🔬 Synthetic Data Generation for ML Training

🔬 Explainable AI (XAI) for Trustworthy Models

🔬 Edge AI for Real-Time Inference

Code Snippets & Gists

📌 Async Retry Pattern with Exponential Backoff: Production Pattern (Python) (2026-04-24)

📌 RAG Relevance Scorer using Cross-Encoder: Production Pattern (Python) (2026-04-23)
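
for anyone who hasn't run into it, the retry-with-backoff pattern named in the first gist above can be sketched roughly like this. this is not the gist's code: `retry_async`, its parameters, and the choice of full jitter are all assumptions made for illustration, using only the standard library.

```python
import asyncio
import random


async def retry_async(func, *, attempts=4, base_delay=0.5, max_delay=8.0,
                      retry_on=(Exception,), sleep=asyncio.sleep):
    """Await `func()` until it succeeds or `attempts` calls have failed.

    Hypothetical helper: names and defaults are illustrative only.
    `sleep` is injectable so the backoff can be observed without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, ceiling] so many
            # clients retrying at once do not hit the service in lockstep.
            await sleep(random.uniform(0, ceiling))
```

a call might look like `await retry_async(lambda: fetch(url), retry_on=(ConnectionError,))`, where `fetch` stands in for any coroutine function. narrowing `retry_on` to transient errors matters in practice, since retrying a bad request just repeats the failure more slowly.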

🤖 Profile auto-updated on 2026-04-24 19:03 UTC

if you made it this far, you should probably just say hi
