Content and build pipeline for https://s-anand.net/
Content:
- pages/: Standalone pages as Markdown (pages/slug.md).
  - Home page
  - Pages can be nested: pages/lists/slug.md
- posts/: Blog posts as Markdown (posts/yyyy/slug.md).
- assets/: Converted media files used by posts (WebP/OPUS). Served at /blog/assets/.
Configuration & Code:
- metadata.yml: Taxonomies (categories, tags) and author info.
- hugo.toml: Hugo site configuration.
- setup.sh: Build script to generate content and build site.
- .github/workflows/deploy.yml: Deployment workflow for GitHub Pages.
- layouts/: Hugo layout overrides for theme customizations.
  - layouts/partials/head.html generates <title> by capitalizing (e.g. blog → Blog) and deduplicating site title (e.g. "S Anand | S Anand" → "S Anand").
- static/: Static files (CSS overrides, favicon assets).
- themes/PaperMod/: Hugo theme sources (vendored).
- scripts/: Conversion and utility scripts.
- justfile: Justfile for local pre-processing (e.g. analysis).
- analysis/: Data analysis scripts and results (e.g. embedding analysis).
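The <title> rule described for layouts/partials/head.html can be illustrated with a short Python sketch. This is not the Hugo template itself: the function name page_title, the " | " separator, and the "page title first, site title second" ordering are assumptions inferred from the two examples given.

```python
def page_title(title: str, site_title: str, sep: str = " | ") -> str:
    """Build a <title> string (hypothetical helper, not the real template)."""
    title = title.strip()
    # Upper-case only the first character so "blog" becomes "Blog".
    # str.capitalize() is avoided because it would lowercase the rest
    # ("S Anand" would become "S anand").
    if title:
        title = title[0].upper() + title[1:]
    # If the page title is empty or repeats the site title, emit it once.
    if not title or title == site_title:
        return site_title
    return f"{title}{sep}{site_title}"


print(page_title("blog", "S Anand"))
print(page_title("S Anand", "S Anand"))
```

Under these assumptions, a section page titled "blog" gets a capitalized title, and the home page avoids the doubled "S Anand | S Anand".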
Auto-generated (DO NOT edit!):
- content/: Contains Hugo content (posts/pages + taxonomy and archive index pages).
- public/: Build output (deployed to GitHub Pages).
After editing source files, rebuild with:
bash setup.sh
This runs:
- scripts/build_content.py - generates content/ from posts/, pages/, and metadata.yml
- hugo - builds static site to public/blog/
- Post-processing scripts for comments and feed normalization
- Copies special pages to public/ root
GitHub automatically runs setup.sh on push to main and deploys public/ to GitHub Pages.
WIP commits are pushed to the live branch. The prod branch holds permanent changes - in case of rollbacks to main.
content/embeddings.parquet contains Gemini Embedding 2 vectors for every page and post.
scripts/embeddings.py # embed all new/changed files
scripts/embeddings.py --since 2025-01 # only files modified after a date
scripts/embeddings.py --limit 10 # test run: at most 10 files
scripts/embeddings.py --force # re-embed all, ignoring hashes
- Model: gemini-embedding-2-preview, 768 dimensions, RETRIEVAL_DOCUMENT task
- Each file's title (from frontmatter) is prepended to the body before embedding
- State is persisted in content/embeddings.duckdb so interrupted runs resume automatically. Only files whose content hash changed are re-embedded.
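The hash-based skipping described above can be sketched as follows. This is a minimal illustration, not the code in scripts/embeddings.py: the names embedding_input, content_hash, and embed_changed are hypothetical, SHA-256 and the blank-line separator between title and body are assumptions, a plain dict stands in for the DuckDB state, and the embed callable stands in for the Gemini API call.

```python
import hashlib
from typing import Callable


def embedding_input(title: str, body: str) -> str:
    """Prepend the frontmatter title to the body (separator is an assumption)."""
    return f"{title}\n\n{body}" if title else body


def content_hash(text: str) -> str:
    """Hash the text to detect changes (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_changed(
    docs: dict[str, tuple[str, str]],            # path -> (title, body)
    state: dict[str, tuple[str, list[float]]],   # path -> (hash, vector)
    embed: Callable[[str], list[float]],         # stand-in for the API call
    force: bool = False,
) -> list[str]:
    """Embed only new or changed docs and return the paths that were embedded."""
    done = []
    for path, (title, body) in sorted(docs.items()):
        text = embedding_input(title, body)
        digest = content_hash(text)
        stored = state.get(path)
        if not force and stored is not None and stored[0] == digest:
            continue  # hash unchanged since the last run: skip
        # State is updated per file, so an interrupted run keeps its progress.
        state[path] = (digest, embed(text))
        done.append(path)
    return done
```

Calling embed_changed twice with the same docs and state embeds everything on the first call and nothing on the second; passing force=True mirrors the --force flag by ignoring stored hashes.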
Front matter options for posts and pages:
- classes: wrap-code adds the wrap-code class to the post's main <article> element, which applies CSS to wrap long code blocks.
- build: { list: never, render: always } ensures that posts/pages are not listed anywhere, including the blog index, but are still rendered.
- robotsNoIndex: true adds a <meta name="robots" content="noindex"> tag to the page's <head> to prevent indexing by search engines.
- aliases: ["old-path"] adds redirects from old-path to the current page using Hugo Aliases.
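As an illustration of how these options combine, the sketch below assembles the Markdown source for a page that is rendered but unlisted, excluded from search indexing, and redirected from an old path. The helper name hidden_page and the sample values are hypothetical, and YAML front matter delimited by --- is assumed.

```python
import json


def hidden_page(title: str, body: str, old_path: str) -> str:
    """Return Markdown source with the front matter options set (hypothetical helper)."""
    front_matter = "\n".join([
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(title)}",
        "classes: wrap-code",
        "build: { list: never, render: always }",
        "robotsNoIndex: true",
        f"aliases: [{json.dumps(old_path)}]",
        "---",
    ])
    return f"{front_matter}\n\n{body}\n"


print(hidden_page("Draft notes", "Work in progress.", "old-path"))
```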