Tags · gogpu/gg

v0.43.1

v0.43.1: blit-only fix, type-safe GPU handles (ADR-018), overlay fix (#268)

* chore(deps): update wgpu v0.26.2 → v0.26.3

* feat(gpu): single command buffer compositor (ADR-017, Flutter pattern)

Multiple render sessions can record render passes into one shared command
encoder via SetSharedEncoder(). One Finish + Submit per frame — zero
Vulkan semaphore conflicts by design.

- SetSharedEncoder() on Context (public, any type for duck typing)
- SetSharedEncoder() on GPURenderContext (any → *wgpu.CommandEncoder)
- encodeToEncoder() in render session (surface pass, no submit)
- encodeBlitToEncoder() in render session (blit pass, no submit)
- RenderFrameGrouped accepts optional sharedEncoder parameter
- Backward compatible: nil encoder = existing per-context submit path

ADR-017. Enterprise ref: Flutter Impeller single command buffer per frame.

* feat(gpu): CreateSharedEncoder + SubmitSharedEncoder — complete lifecycle API

Complete single-command-buffer lifecycle for ui integration:
  CreateSharedEncoder() → create encoder
  SetSharedEncoder()    → set on each context
  FlushGPUWithView()    → records render pass (no submit)
  SubmitSharedEncoder() → finish + submit once

Duck-typed public API (any) to avoid wgpu import at root level.
Internal type assertion to *wgpu.CommandEncoder in GPURenderContext.
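
The lifecycle above can be illustrated with a minimal, self-contained sketch. Every type and function here (`encoder`, `queue`, `renderCtx`, `renderFrame`) is a hypothetical stand-in, not gg's or wgpu's API; the sketch only shows why a shared encoder yields one submit per frame while the nil-encoder path submits once per context.

```go
package main

import "fmt"

// encoder is a hypothetical stand-in for a GPU command encoder.
type encoder struct{ passes []string }

// queue is a hypothetical stand-in for a GPU queue; it only counts submits.
type queue struct{ submits int }

func (q *queue) submit(e *encoder) { q.submits++ }

// renderCtx is a hypothetical stand-in for a per-context render session.
type renderCtx struct {
	name   string
	shared *encoder
	q      *queue
}

func (c *renderCtx) SetSharedEncoder(e *encoder) { c.shared = e }

// Flush records one render pass. With a shared encoder it records only;
// with a nil encoder it falls back to its own encoder and submits.
func (c *renderCtx) Flush() {
	if c.shared != nil {
		c.shared.passes = append(c.shared.passes, c.name)
		return
	}
	c.q.submit(&encoder{passes: []string{c.name}})
}

// renderFrame returns the number of queue submits used for one frame.
func renderFrame(q *queue, ctxs []*renderCtx, useShared bool) int {
	before := q.submits
	var enc *encoder
	if useShared {
		enc = &encoder{}
	}
	for _, c := range ctxs {
		c.SetSharedEncoder(enc)
		c.Flush()
	}
	if enc != nil {
		q.submit(enc) // finish + submit once for the whole frame
	}
	return q.submits - before
}

func main() {
	q := &queue{}
	ctxs := []*renderCtx{{name: "a", q: q}, {name: "b", q: q}, {name: "c", q: q}}
	fmt.Println(renderFrame(q, ctxs, false)) // 3
	fmt.Println(renderFrame(q, ctxs, true))  // 1
}
```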

* docs: mark v0.43.1 in CHANGELOG + ROADMAP

* fix(gpu): nil-guard session in CreateEncoder/SubmitEncoder

Session may be nil if GPU is not yet initialized. Prevents nil pointer
dereference when CreateSharedEncoder is called before first GPU operation.

* chore(deps): update wgpu v0.26.3 → v0.26.4 (VK-006 PRESENT_SRC_KHR fix)

wgpu v0.26.4 adds automatic PRESENT_SRC_KHR layout transition before
vkQueuePresentKHR — fixes flickering with blit-only and shared encoder
paths where no render pass transitions the swapchain image.

* chore: fix formatting + update CHANGELOG deps to wgpu v0.26.4

* fix(gpu): blit-only black screen + resource leak + examples update

Blit-only compositor path (ADR-016) produced black screen because
RenderFrameGrouped early-returned on totalItems==0 without checking
baseLayer — silently skipping the entire blit render pass for frames
with only a base layer texture and zero vector shapes.
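
A minimal sketch of the guard described above, assuming a hypothetical `frame` struct rather than the real render session state: the early return must consider the base layer as well as the item count.

```go
package main

import "fmt"

// frame is a hypothetical summary of what is queued for one frame.
type frame struct {
	totalItems   int  // queued vector shapes, images, text
	hasBaseLayer bool // a base layer texture is queued
}

// skipBeforeFix mirrors the buggy guard: it ignores the base layer.
func skipBeforeFix(f frame) bool { return f.totalItems == 0 }

// skipAfterFix skips only when there is truly nothing to draw.
func skipAfterFix(f frame) bool { return f.totalItems == 0 && !f.hasBaseLayer }

func main() {
	baseOnly := frame{totalItems: 0, hasBaseLayer: true}
	fmt.Println(skipBeforeFix(baseOnly), skipAfterFix(baseOnly)) // true false
}
```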

Also fixed GPU texture resource leak: buildGPUTextureResources allocated
new vertex/uniform buffers every frame without releasing previous ones.
Now uses session-level persistent buffers with grow-only reallocation
(same pattern as SDF/convex/image/text tiers).

- fix: totalItems==0 guard now checks baseLayer (render_session.go:523)
- fix: session-level gpuTexVertBuf/gpuTexUniformBufs/gpuTexBindGroups
- add: examples/blit_only/ — standalone non-MSAA compositor demo
- deps: all examples updated to gogpu v0.29.2 + wgpu v0.26.4
- docs: CHANGELOG, README (compositor examples), ROADMAP v0.43.1
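
The grow-only reallocation mentioned above can be sketched as follows. `growBuf` is a hypothetical CPU-side stand-in for a session-level GPU buffer; the point is only that a persistent buffer is replaced when it is too small and otherwise reused across frames.

```go
package main

import "fmt"

// growBuf is a hypothetical persistent buffer with grow-only reallocation.
type growBuf struct {
	data   []byte
	allocs int // number of (re)allocations performed
}

// ensure returns n usable bytes, reallocating only when capacity is too small.
func (b *growBuf) ensure(n int) []byte {
	if cap(b.data) < n {
		b.data = make([]byte, n)
		b.allocs++
	}
	return b.data[:n]
}

func main() {
	var vb growBuf
	for _, n := range []int{96, 96, 64, 192, 128} { // per-frame size requests
		_ = vb.ensure(n)
	}
	fmt.Println(vb.allocs) // 2
}
```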

* feat(gpu): type-safe GPU handles — zero any in pipeline API (ADR-018)

Replace any with gpucontext.TextureView and gpucontext.CommandEncoder opaque
handles (struct{ptr unsafe.Pointer}) across the entire GPU pipeline public API.

Compile-time type safety: FlushGPUWithView, SetSharedEncoder, CreateSharedEncoder,
SubmitSharedEncoder, RenderTarget.SurfaceView, RenderDirect — all typed.
TextureView cannot be confused with CommandEncoder or other resource types.

8-byte value type (one pointer on 64-bit targets), zero allocations, GC-safe. Nil checks via .IsNil().
Follows Vulkan/Ebitengine/Go Protobuf Opaque pattern.

Breaking: view any -> gpucontext.TextureView, encoder any -> gpucontext.CommandEncoder.
Requires gpucontext v0.15.0.
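
A self-contained sketch of the opaque-handle pattern described above. The types below are local stand-ins that mimic the described shape (`struct{ptr unsafe.Pointer}` plus `IsNil()`); they are not the gpucontext package, and `backendView`, `wrapView`, and `flushWithView` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// TextureView and CommandEncoder are distinct opaque handle types, so the
// compiler rejects passing one where the other is expected.
type TextureView struct{ ptr unsafe.Pointer }
type CommandEncoder struct{ ptr unsafe.Pointer }

func (v TextureView) IsNil() bool    { return v.ptr == nil }
func (e CommandEncoder) IsNil() bool { return e.ptr == nil }

// backendView is a hypothetical backend object hidden behind the handle.
type backendView struct{ w, h int }

// wrapView hides the backend pointer behind the typed handle.
func wrapView(v *backendView) TextureView { return TextureView{ptr: unsafe.Pointer(v)} }

// flushWithView accepts only a TextureView; a nil handle is an error.
func flushWithView(v TextureView) error {
	if v.IsNil() {
		return errors.New("nil texture view")
	}
	return nil
}

// handleIsPointerSized reports whether the handle is exactly one pointer wide.
func handleIsPointerSized() bool {
	return unsafe.Sizeof(TextureView{}) == unsafe.Sizeof(uintptr(0))
}

func main() {
	var none TextureView
	fmt.Println(flushWithView(none) != nil) // true
	v := wrapView(&backendView{w: 600, h: 400})
	fmt.Println(flushWithView(v) == nil) // true
	fmt.Println(handleIsPointerSized())  // true
	// flushWithView(CommandEncoder{}) would fail to compile: wrong handle type.
}
```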

* chore: add nolint:gosec for ADR-018 unsafe.Pointer conversions

* chore(deps): update gpucontext v0.15.0 + examples gogpu v0.29.3

Cascade release: gpucontext v0.15.0 (type-safe handles) and gogpu v0.29.3
(typed SurfaceView) are now published. Update all go.mod references.

* test(gpu): enterprise GPU texture overlay tests (13 tests)

Covers vertex data positioning, ortho projection, command queueing,
PendingCount with baseLayer/overlay combinations, isBlitOnly detection,
and RenderFrameGrouped guard regression (BUG-GG-BLIT-PATH-001).

Verifies overlay at (100,100,48,48) produces correct vertex coords
and that ortho matrix uses main viewport, not overlay texture size.

* fix(gpu): overlay texture stretched to full screen (BUG-GG-GPU-TEXTURE-OVERLAY-SIZE)

Root cause: buildGPUTextureResources used a single shared vertex buffer
(s.gpuTexVertBuf) for both base layer and overlay textures. Base layer
(full-screen quad 0,0,600,400) overwrote overlay vertices (48x48 at 100,100).

Fix: separate vertex buffers — s.gpuTexVertBuf for overlays,
s.gpuTexBaseVertBuf for base layer. isBaseLayer parameter selects buffer.

Regression test: TestBuildGPUTextureResources_SeparateVertexBuffers
verifies overlay and base layer never share a vertex buffer.
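
The root cause and fix can be reproduced in miniature. This is a sketch with hypothetical names (`session`, `upload`, `overlaySurvives`) and CPU slices standing in for GPU vertex buffers; the coordinates are the ones quoted above.

```go
package main

import "fmt"

type quad struct{ x, y, w, h float32 }

// session holds persistent vertex storage, one slice per role.
type session struct {
	overlayVerts []quad
	baseVerts    []quad
}

// upload writes q into the selected buffer and returns that buffer.
// With separate=false it reproduces the bug: everything shares overlayVerts.
func (s *session) upload(q quad, isBaseLayer, separate bool) *[]quad {
	dst := &s.overlayVerts
	if isBaseLayer && separate {
		dst = &s.baseVerts
	}
	*dst = append((*dst)[:0], q)
	return dst
}

// overlaySurvives reports whether the overlay vertices are intact after the
// base layer has been uploaded.
func overlaySurvives(separate bool) bool {
	s := &session{}
	overlay := quad{100, 100, 48, 48}
	base := quad{0, 0, 600, 400}
	ov := s.upload(overlay, false, separate)
	s.upload(base, true, separate)
	return (*ov)[0] == overlay
}

func main() {
	fmt.Println(overlaySurvives(false)) // false: base quad overwrote the overlay
	fmt.Println(overlaySurvives(true))  // true
}
```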

* docs: update CHANGELOG + ROADMAP with overlay fix and enterprise tests

Apr 25, 2026
59c5ad9

v0.43.0

feat(gpu): zero-readback compositor pipeline (ADR-015/016, v0.43.0) (#266)

* feat(ggcanvas): FlushPixmap() for zero-readback rendering (ADR-006)

FlushPixmap() uploads CPU pixmap to GPU texture without calling FlushGPU().
Pending GPU shapes remain queued for the caller to flush directly to the
surface via FlushGPUWithView(), eliminating the GPU→CPU→GPU readback.

Flush() refactored to delegate: FlushGPU() + FlushPixmap().

Enables ui ADR-006 Phase 1: spinner GPU cost 10% → <5% by skipping
full-pixmap MSAA render + resolve + CopyTextureToBuffer + fence wait.

4 new tests: unit, closed canvas, Flush/FlushPixmap consistency, partial upload.

* feat(gpu): DrawGPUTextureBase — compositor base layer (ADR-015)

Single-pass compositor: base layer textured quad drawn BEFORE all GPU
tiers (SDF, convex, stencil, images, text, glyph mask) in the render
pass. Enables zero-readback rendering where CPU pixmap is background
and GPU shapes render on top. Flutter OffsetLayer pattern.

- DrawGPUTextureBase() public API (context_image.go)
- QueueBaseLayer() in GPURenderContext (last call wins)
- Base layer rendering in encodeSubmitSurfaceGrouped/ReadbackGrouped
- PendingCount includes base layer
- 4 new base layer tests
- Fixed 2 compute mode tests (wrong CanCompute assumption)

* feat: BeginGPUFrame() for persistent context reuse (RepaintBoundary)

Resets per-context GPU frame state (frameRendered, lastView, clipRect)
so the next render pass uses LoadOpClear instead of LoadOpLoad.

Required when a persistent gg.Context renders to the same view across
frames — without this, stale content from the previous frame bleeds
through via LoadOpLoad. Not needed for one-shot contexts or when the
view changes between frames (auto-reset on view change).
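
A sketch of the state reset described above, under the assumption that the load op is chosen from a per-context "already rendered" flag. `ctxState`, `beginFrame`, and `nextLoadOp` are hypothetical names, and views are modelled as integer IDs starting at 1.

```go
package main

import "fmt"

type loadOp string

const (
	loadOpClear loadOp = "clear"
	loadOpLoad  loadOp = "load"
)

// ctxState is a hypothetical model of per-context GPU frame state.
type ctxState struct {
	frameRendered bool
	lastView      int // 0 means "no view yet"
}

// beginFrame resets the state so the next pass clears instead of loading.
func (c *ctxState) beginFrame() {
	c.frameRendered = false
	c.lastView = 0
}

// nextLoadOp picks the load op for a pass targeting the given view.
func (c *ctxState) nextLoadOp(view int) loadOp {
	if view != c.lastView {
		c.frameRendered = false // auto-reset on view change
	}
	op := loadOpClear
	if c.frameRendered {
		op = loadOpLoad
	}
	c.frameRendered = true
	c.lastView = view
	return op
}

func main() {
	var c ctxState
	fmt.Println(c.nextLoadOp(1)) // clear
	fmt.Println(c.nextLoadOp(1)) // load: same view, stale content preserved
	c.beginFrame()
	fmt.Println(c.nextLoadOp(1)) // clear
}
```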

Verified: BeginAcceleratorFrame() only resets the default context,
not per-context GPURenderContexts (ADR-013). UI agent correctly
identified the gap.

* feat(gpu): non-MSAA compositor fast path (ADR-016)

When a frame contains only textured quads (base layer + overlays) with
zero vector shapes, use a 1x render pass directly to swapchain instead
of 4x MSAA render + resolve. 93% bandwidth reduction at 1080p.

- isBlitOnly(): detects blit-only frames (no SDF/convex/stencil/text)
- encodeBlitOnlyPass(): 1x render pass, no MSAA, no depth/stencil
- ensureBlitPipeline(): SampleCount=1 pipeline variant (lazy init)
- RecordBlitDraws(): draws quads using blit pipeline
- 6 new tests for blit-only detection (base only, SDF, text, overlay)

Enterprise pattern: Flutter Impeller, Chrome cc, Qt6 RHI all use
non-MSAA compositor passes for textured quad compositing.
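
The detection rule can be sketched as a pure function. The `pending` struct and `blitOnly` function are hypothetical simplifications of the render session's queues; the rule shown is the one stated above: any vector or text work forces the MSAA path, and otherwise at least one textured quad must be queued.

```go
package main

import "fmt"

// pending is a hypothetical summary of queued work for one frame.
type pending struct {
	sdf, convex, stencil, text int
	hasBaseLayer               bool
	overlays                   int // GPU texture overlays (textured quads)
}

// blitOnly reports whether the frame consists solely of textured quads.
func blitOnly(p pending) bool {
	if p.sdf+p.convex+p.stencil+p.text > 0 {
		return false
	}
	return p.hasBaseLayer || p.overlays > 0
}

func main() {
	fmt.Println(blitOnly(pending{hasBaseLayer: true}))              // true
	fmt.Println(blitOnly(pending{hasBaseLayer: true, overlays: 2})) // true
	fmt.Println(blitOnly(pending{hasBaseLayer: true, sdf: 1}))      // false
}
```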

* feat(gpu): FlushGPUWithViewDamage — scissor-clipped compositor (ADR-016 Phase 2)

Damage-aware compositor: when damageRect is non-empty, uses LoadOpLoad
(preserve previous frame) and scissor-clips to the dirty region. Only
the damaged pixels are re-composited: a 48x48 spinner updates about 9KB
versus about 8MB for the full surface at 1080p (assuming 4 bytes per pixel).

- FlushGPUWithViewDamage() on Context (public API)
- DamageRect field on GPURenderTarget
- encodeBlitOnlyPass respects damage rect (LoadOpLoad + scissor)
- Full surface blit when damageRect is empty (existing behavior)
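
The size comparison and the pass selection can be checked with a short sketch. It assumes 4 bytes per pixel (RGBA8); `passFor`, `rgbaBytes`, and `rectXYWH` are hypothetical helpers, not part of gg.

```go
package main

import (
	"fmt"
	"image"
)

// rectXYWH builds a rectangle from origin and size.
func rectXYWH(x, y, w, h int) image.Rectangle { return image.Rect(x, y, x+w, y+h) }

// rgbaBytes is the size of a region at an assumed 4 bytes per pixel.
func rgbaBytes(r image.Rectangle) int { return r.Dx() * r.Dy() * 4 }

// passFor chooses load-vs-clear and the scissor for one compositor pass.
// An empty damage rect means a full-surface blit without preserving content.
func passFor(surface, damage image.Rectangle) (load bool, scissor image.Rectangle) {
	if damage.Empty() {
		return false, surface
	}
	return true, damage.Intersect(surface)
}

func main() {
	surface := rectXYWH(0, 0, 1920, 1080)
	load, scissor := passFor(surface, rectXYWH(100, 100, 48, 48))
	fmt.Println(load, rgbaBytes(scissor)) // true 9216
	load, scissor = passFor(surface, image.Rectangle{})
	fmt.Println(load, rgbaBytes(scissor)) // false 8294400
}
```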

* feat: FillRectCPU + Pixmap.FillRect — CPU-only rect fill (ADR-016)

CPU-only rectangle fill that bypasses GPU SDF accelerator. Without this,
dirty-region background clearing in ui routes through SDF → blocks
non-MSAA blit path (isBlitOnly always false).

- Pixmap.FillRect: direct pixel fill, row-copy optimized, bounds clamped
- Context.FillRectCPU: device-scale aware, flushes GPU first for z-order
- 6 new tests: fill correctness, clamping, out-of-bounds, genID, no SDF
- Fixed UI agent's empty rect test (image.Rect auto-canonicalizes)
- Cleaned leftover log import from render_session.go
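
A sketch of a bounds-clamped, row-copy rectangle fill in the spirit of the description above. The `pixmap` type here is a hypothetical RGBA8 buffer, not gg's Pixmap, and the real implementation may differ.

```go
package main

import (
	"fmt"
	"image"
)

// pixmap is a hypothetical tightly packed RGBA8 image.
type pixmap struct {
	w, h int
	pix  []byte
}

func newPixmap(w, h int) *pixmap { return &pixmap{w: w, h: h, pix: make([]byte, w*h*4)} }

func rectXYWH(x, y, w, h int) image.Rectangle { return image.Rect(x, y, x+w, y+h) }

// fillRect clamps r to the pixmap, fills the first row pixel by pixel,
// then copies that row into the remaining rows.
func (p *pixmap) fillRect(r image.Rectangle, c [4]byte) {
	r = r.Intersect(image.Rect(0, 0, p.w, p.h))
	if r.Empty() {
		return
	}
	stride := p.w * 4
	start := r.Min.Y*stride + r.Min.X*4
	first := p.pix[start : start+r.Dx()*4]
	for i := 0; i < len(first); i += 4 {
		copy(first[i:i+4], c[:])
	}
	for y := r.Min.Y + 1; y < r.Max.Y; y++ {
		off := y*stride + r.Min.X*4
		copy(p.pix[off:off+len(first)], first)
	}
}

// count returns how many pixels equal c.
func (p *pixmap) count(c [4]byte) int {
	n := 0
	for i := 0; i+4 <= len(p.pix); i += 4 {
		if p.pix[i] == c[0] && p.pix[i+1] == c[1] && p.pix[i+2] == c[2] && p.pix[i+3] == c[3] {
			n++
		}
	}
	return n
}

func main() {
	p := newPixmap(4, 4)
	red := [4]byte{255, 0, 0, 255}
	p.fillRect(rectXYWH(-1, -1, 3, 3), red) // clamped to the 2x2 top-left corner
	fmt.Println(p.count(red))               // 4
}
```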

* fix(test): GPU texture overlays are blit-only (no MSAA needed)

isBlitOnly correctly allows GPU texture overlays (RepaintBoundary cached
textures) in the non-MSAA fast path — they are textured quads, same as
the base layer. Test was incorrectly expecting NOT blit-only.

* chore(deps): update wgpu v0.25.7 → v0.26.1 (PresentWithDamage)

wgpu v0.26.1 adds damage-aware surface presentation on all backends:
Software (BitBlt/XPutImage), Vulkan (VK_KHR_incremental_present),
DX12 (Present1), GLES (eglSwapBuffersWithDamageKHR).

gg does not use PresentWithDamage directly — it's called by gogpu.
This dep update ensures go.mod reflects the version used via go.work.

* feat(ggcanvas): PixmapTextureView() for single-pass zero-readback (ADR-015)

Duck-typed accessor: returns gpucontext.TextureView from the pixmap GPU
texture for DrawGPUTextureBase single-pass compositing. Uses Go structural
typing — no gogpu import, calls Texture.TextureView() if available.

Enables the complete zero-readback pipeline:
  FlushPixmap → PixmapTextureView → DrawGPUTextureBase → FlushGPUWithView

4 new tests: before flush, pending texture, promoted, closed canvas.
Requires gogpu Texture.TextureView() (already implemented).

* feat(ggcanvas): EnsureGPUTexture + PixmapTextureView — zero-readback setup

EnsureGPUTexture promotes pendingTexture to real GPU texture (one-time).
PixmapTextureView returns gpucontext.TextureView via duck typing.
Together they enable: FlushPixmap → PixmapTextureView → DrawGPUTextureBase.

4 new tests. Performance investigation pending for standalone pipeline.

* chore(deps): update wgpu v0.26.1 → v0.26.2 (Buffer/BindGroup auto-cleanup)

wgpu v0.26.2 adds runtime.AddCleanup for Buffer and BindGroup — automatic
deferred destruction via GC prevents per-frame resource leaks that caused
stuttering in the manual zero-readback pipeline.

* fix: warn on global GPU fallback in multi-context scenarios

Add warnGPUFallback() — one-time slog.Warn when GPU operations fall back
to global SDFAccelerator.defaultCtx instead of per-context GPURenderContext.
Covers: tryGPUFill, tryGPUStroke, FlushGPU, FlushGPUWithView,
FlushGPUWithViewDamage, flushGPUAccelerator, tryGPUText, tryGPUGlyphMaskText.

In multi-context (RepaintBoundary), this fallback causes shape leaking.
The warning makes it immediately visible in logs instead of silent corruption.
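
A one-time warning of this kind is commonly built from `sync.Once` and `log/slog`. The sketch below assumes that approach; `fallbackWarner` and `runFallbacks` are hypothetical names, and the message text is illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
	"sync"
)

// fallbackWarner emits at most one warning, however many operations fall back.
type fallbackWarner struct {
	once   sync.Once
	logger *slog.Logger
}

func (w *fallbackWarner) warn(op string) {
	w.once.Do(func() {
		w.logger.Warn("GPU operation fell back to the global context", "op", op)
	})
}

// runFallbacks simulates the given fallbacks and returns the number of log lines.
func runFallbacks(ops []string) int {
	var buf bytes.Buffer
	w := &fallbackWarner{logger: slog.New(slog.NewTextHandler(&buf, nil))}
	for _, op := range ops {
		w.warn(op)
	}
	return strings.Count(buf.String(), "\n")
}

func main() {
	fmt.Println(runFallbacks([]string{"fill", "stroke", "flush"})) // 1
}
```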

* refactor: type gpuCtx as gpuContextOps instead of any

Replace untyped `gpuCtx any` with `gpuCtx gpuContextOps` for compile-time
type safety. The gpuContextOps interface is defined in the same package —
no circular import. Type assertion moved to ensureGPUCtx (once at creation)
instead of every gpuCtxOps() call. gpuCtxOps() simplified to direct return.

* docs: complete CHANGELOG for v0.43.0 — all features, fixes, deps

Added missing entries: EnsureGPUTexture, GPU fallback warnings,
gpuCtx typing refactor, wgpu v0.26.2 dep update.

* fix(gpu): separate blit pipeline layout — single bind group (no clip)

Blit pipeline used pipeLayout with 2 bind groups (texture + clip), but
RecordBlitDraws only sets group 0 — leaving group 1 undefined causes
GPU validation errors. New blitLayout with single bind group for the
non-MSAA compositor fast path.

* chore(deps): update gogpu v0.28.3 → v0.29.0 in gogpu_integration example

* feat: zero-readback compositor examples (ADR-015/016)

Two standalone examples demonstrating compositor APIs:

- zero_readback/ — standard RenderDirect path (GPU-direct, smooth)
- zero_readback_manual/ — manual FlushPixmap + DrawGPUTextureBase pipeline
  (CPU/GPU content separation, used by ui/desktop.go)

Both with separate go.mod, README, gogpu v0.29.0.

* chore: mark v0.43.0 in CHANGELOG

Apr 25, 2026
689cfef

v0.42.1

chore: mark v0.42.1 in CHANGELOG (#265)

Apr 24, 2026
639ce4a

v0.42.0

Merge branch 'main' of https://github.com/gogpu/gg

Apr 23, 2026
5263dee

v0.41.2

Merge branch 'main' of https://github.com/gogpu/gg

Apr 23, 2026
a2af432

v0.41.1

Merge branch 'main' of https://github.com/gogpu/gg

Apr 23, 2026
d7d030c

v0.41.0

Merge branch 'main' of https://github.com/gogpu/gg

Apr 23, 2026
895640a

v0.40.1

fix: Adreno Vulkan + clip pipeline + deps (v0.40.1)

* fix: switch in fine.wgsl for Adreno (ui#67)

* fix: Vello thread model + packed blend stack for Adreno (#252)

Port Vello's enterprise thread model to fix Adreno LLVM miscompilation:
- workgroup_size(256,1,1) → workgroup_size(4,16,1) with PIXELS_PER_THREAD=4
- clip stack: array<vec4<f32>,4> (64B) → packed u32 array (16B) + blend_spill SSBO
- CPU==GPU pixel-perfect match verified (0/120000 diff)

Reduces register pressure 4x (256→64 threads), eliminates Adreno isam→ldib
deoptimization. Matches linebender/vello PR #77 + PR #150 exactly.

Fixes #252, upstream ui#67.
Ref: linebender/vello#83

* feat: clip layer support in Vello compute pipeline (VELLO-CLIP-001)

Full clip pipeline: scene encoding (SceneElement BeginClip/EndClip) →
draw_leaf (ClipInp generation) → clip_leaf (sequential stack matching,
EndClip fixup) → coarse (per-tile clipDepth/clipZeroDepth, Vello parity) →
fine (packed blend stack, already implemented).

RasterizeSceneDefPTCL() for clip-aware rendering. Existing
RasterizeScenePTCL([]PathDef) unchanged (backward compat).

6 integration tests: nested clips, alpha modulation, draw monoid fixup,
scene encoding, backward compatibility pixel match.

* feat: visual compute clip example (VELLO-EXAMPLE-001)

Standalone CLI demo: 14-element scene with BeginClip/EndClip, 8 circles,
rounded-rect clip path, stars outside clip. CPU renders via
RasterizeSceneDefPTCL, GPU renders no-clip subset for pipeline validation.
Diagnostic pixel checks confirm correct clipping behavior.

* feat: windowed animated clip demo (VELLO-EXAMPLE-001)

Animated gg+gogpu example with clip layers: rotating circles (no clip)
+ pulsing rounded-rect clip region with bouncing shapes, rotating square,
horizontal bar. Space to pause/resume. 49 FPS on Vulkan.

* refactor(gpu): migrate Queue.ReadBuffer to Buffer.Map API

Update deps: wgpu v0.24.4→v0.25.1, gpucontext v0.11.0→v0.12.0,
naga v0.17.0→v0.17.4, x/image v0.38.0→v0.39.0, x/text v0.35.0→v0.36.0.
Remove incorrect gogpu dependency (gg is independent of gogpu).
Fix LICENSE wording.

* docs: v0.40.1 CHANGELOG — Adreno fix, clip pipeline, deps update

* fix: handle Unmap() error returns (errcheck lint)

* fix: gofmt after Unmap error handling

Apr 21, 2026
3ed545f

v0.40.0

feat: alpha mask API — per-shape, per-layer, luminance, GPU interface (v0.40.0)

* fix: implement SetMask alpha masking in Fill/Stroke pipeline (gg#238, gg#236)

Phase 1 of TASK-GG-MASK-001: per-shape alpha masking.

- Add MaskCoverage field to Paint (analogous to ClipCoverage)
- Wire mask into doFill/doStroke via applyMaskToPaint
- Integrate into SoftwareRenderer: analytic filler + coverage filler paths
- Mask and clip compose multiplicatively when both active
- GPU accelerator falls back to CPU when mask is active (Phase 2)
- Fix AsMask docs: clarify works with unfilled path, add usage patterns
- 13 new tests including @Rider21 exact reproduction case
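
Multiplicative composition of mask and clip can be sketched on 8-bit coverage values. The 8-bit representation and the `mul255` and `combine` helpers are assumptions made for illustration; the library's internal coverage format is not stated here.

```go
package main

import "fmt"

// mul255 multiplies two 8-bit coverage values as fractions of 255, rounded.
func mul255(a, b uint8) uint8 {
	return uint8((uint32(a)*uint32(b) + 127) / 255)
}

// combine applies clip and mask coverage to a shape's own coverage.
// The order does not matter because the composition is multiplicative.
func combine(shape, clip, mask uint8) uint8 {
	return mul255(mul255(shape, clip), mask)
}

func main() {
	fmt.Println(combine(255, 255, 255)) // 255: fully visible
	fmt.Println(combine(255, 0, 255))   // 0: clipped out
	fmt.Println(combine(255, 128, 128)) // 64: half clip times half mask
}
```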

* feat: complete alpha mask API — luminance, layer masking, GPU interface (gg#238)

Phase 2: NewLuminanceMask (CSS Level 1), ApplyMask (DestinationIn),
NewMaskFromData constructor.

Phase 3: PushMaskLayer/PopLayer — isolated layer with mask applied
on pop before compositing. Matches Vello push_mask_layer() semantics.
Nested layers and SetMask+PushMaskLayer compose correctly.

Phase 4: MaskAware interface for GPU accelerators. tryGPUFill/Stroke
upload mask texture when accelerator supports it, fall back to CPU
otherwise. SDF shader implementation deferred to separate PR.

12 new tests covering luminance, apply, layer masking, nesting,
nil safety, and SetMask+PushMaskLayer composition.

* docs: v0.40.0 CHANGELOG + README for alpha mask API

Apr 8, 2026
8f6d134

v0.39.4

chore(deps): update wgpu v0.24.4, gogpu v0.26.4 (v0.39.4)

Apr 8, 2026
b947e54
