sanand0/blog

Anand's Blog

Content and build pipeline for https://s-anand.net/

Source files

Content:

  • pages/: Standalone pages as Markdown (pages/slug.md).
    • Home page
    • Pages can be nested: pages/lists/slug.md
  • posts/: Blog posts as Markdown (posts/yyyy/slug.md).
  • assets/: Converted media files used by posts (WebP/OPUS). Served at /blog/assets/.

Configuration & Code:

  • metadata.yml: Taxonomies (categories, tags) and author info.
  • hugo.toml: Hugo site configuration
  • setup.sh: Build script to generate content and build site
  • .github/workflows/deploy.yml: Deployment workflow for GitHub Pages.
  • layouts/: Hugo layout overrides for theme customizations.
  • static/: Static files (CSS overrides, favicon assets).
  • themes/PaperMod/: Hugo theme sources (vendored).
  • scripts/: Conversion and utility scripts.
  • justfile: Justfile for local pre-processing (e.g. analysis).
  • analysis/: Data analysis scripts and results (e.g. embedding analysis).

Auto-generated (DO NOT edit!):

  • content/: Contains Hugo content (posts/pages + taxonomy and archive index pages).
  • public/: Build output (deployed to GitHub Pages).

After editing source files, rebuild with:

bash setup.sh

This runs:

  1. scripts/build_content.py - generates content/ from posts/, pages/, and metadata.yml
  2. hugo - builds static site to public/blog/
  3. Post-processing scripts for comments and feed normalization
  4. Copies special pages to public/ root
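
The first two steps above can be sketched as an ordered list of commands that stops at the first failure. This is only an illustration: the exact invocations (for example, running build_content.py with `python`) are assumptions, steps 3 and 4 are omitted because their script names are not listed here, and setup.sh remains the source of truth.

```python
import subprocess
from pathlib import Path

# Steps 1 and 2 from the list above. The command lines are assumptions,
# not copied from setup.sh.
STEPS = [
    ["python", "scripts/build_content.py"],  # posts/, pages/, metadata.yml -> content/
    ["hugo"],                                # content/ -> public/blog/
]


def run_steps(steps, repo_root=".", dry_run=False):
    """Run each command in order; check=True aborts on the first failure.

    Returns the command lines that were (or, in a dry run, would be) executed.
    """
    executed = []
    for cmd in steps:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=Path(repo_root), check=True)
    return executed


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    for line in run_steps(STEPS, dry_run=True):
        print(line)
```

The dry-run flag is a convenience of this sketch for inspecting the order of steps; it is not a feature of setup.sh.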

GitHub automatically runs setup.sh on push to main and deploys public/ to GitHub Pages.

WIP commits are pushed to the live branch. The prod branch holds permanent changes, as a fallback in case main needs a rollback.

Embeddings

content/embeddings.parquet contains Gemini Embedding 2 vectors for every page and post.

scripts/embeddings.py                    # embed all new/changed files
scripts/embeddings.py --since 2025-01    # only files modified after a date
scripts/embeddings.py --limit 10         # test run: at most 10 files
scripts/embeddings.py --force            # re-embed all, ignoring hashes

  • Model: gemini-embedding-2-preview, 768 dimensions, RETRIEVAL_DOCUMENT task
  • Each file's title (from frontmatter) is prepended to the body before embedding
  • State is persisted in content/embeddings.duckdb, so interrupted runs resume automatically: only files whose content hash changed are re-embedded
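
The hash-based resume logic described above can be sketched roughly as follows. This is not the code in scripts/embeddings.py: the function names, the use of SHA-256, the title/body separator, and hashing the combined title and body (rather than the raw file) are all assumptions made for illustration, and a plain dict stands in for the DuckDB state.

```python
import hashlib


def embedding_input(title: str, body: str) -> str:
    """Prepend the frontmatter title to the body before embedding."""
    return f"{title}\n\n{body}"  # the blank-line separator is an assumption


def content_hash(text: str) -> str:
    """Fingerprint the text; SHA-256 is an assumption, any stable hash works."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_embed(docs, stored_hashes, force=False):
    """Return paths whose hash is missing or changed (all paths when force=True).

    docs maps path -> (title, body); stored_hashes maps path -> hash from the
    previous run (a stand-in for what embeddings.duckdb would hold).
    """
    pending = []
    for path, (title, body) in sorted(docs.items()):
        digest = content_hash(embedding_input(title, body))
        if force or stored_hashes.get(path) != digest:
            pending.append(path)
    return pending


# Hypothetical example: a.md is unchanged since the last run, b.md is new.
docs = {
    "posts/2025/a.md": ("A", "first"),
    "posts/2025/b.md": ("B", "second"),
}
stored = {"posts/2025/a.md": content_hash(embedding_input("A", "first"))}

print(files_to_embed(docs, stored))              # ['posts/2025/b.md']
print(files_to_embed(docs, stored, force=True))  # both paths, like --force
```

Because a file is skipped only when its stored hash matches, an interrupted run can be restarted and will pick up where it left off, provided each hash is written after its embedding succeeds.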

Frontmatter

  • classes: wrap-code adds the wrap-code class to the post's main <article> element, which applies CSS to wrap long code blocks.
  • build: { list: never, render: always } ensures that posts/pages are not listed anywhere (e.g. the blog index) but are still rendered.
  • robotsNoIndex: true adds a <meta name="robots" content="noindex"> tag to the page header to prevent indexing by search engines.
  • aliases: ["old-path"] adds redirects from old-path to the current page using Hugo Aliases.
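
For orientation, here is what a source file using all four keys might look like, with a small helper that separates the frontmatter from the Markdown body. It assumes YAML frontmatter delimited by `---` lines; the title and body text are made up, `split_frontmatter` is a hypothetical helper rather than part of this repository, and a real post would normally set only the keys it needs.

```python
SAMPLE = """---
title: Example post
classes: wrap-code
build: { list: never, render: always }
robotsNoIndex: true
aliases: ["old-path"]
---
Body text in Markdown.
"""


def split_frontmatter(text: str):
    """Return (frontmatter, body) for a document that opens with a '---' block.

    Returns ("", text) unchanged when no complete frontmatter block is found.
    """
    if not text.startswith("---\n"):
        return "", text
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:  # opening delimiter without a closing one
        return "", text
    return header, body


frontmatter, body = split_frontmatter(SAMPLE)
print(frontmatter)
print(body)
```

The helper returns the frontmatter as raw text; parsing it into values would need a YAML library, which is left out to keep the sketch dependency-free.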

About

My blog/website since 1999
