Tags: gogpu/gg
v0.43.1: blit-only fix, type-safe GPU handles (ADR-018), overlay fix (#… …268)

* chore(deps): update wgpu v0.26.2 → v0.26.3

* feat(gpu): single command buffer compositor (ADR-017, Flutter pattern)

Multiple render sessions can record render passes into one shared command encoder via SetSharedEncoder(). One Finish + Submit per frame: zero Vulkan semaphore conflicts by design.

- SetSharedEncoder() on Context (public, any type for duck typing)
- SetSharedEncoder() on GPURenderContext (any → *wgpu.CommandEncoder)
- encodeToEncoder() in render session (surface pass, no submit)
- encodeBlitToEncoder() in render session (blit pass, no submit)
- RenderFrameGrouped accepts optional sharedEncoder parameter
- Backward compatible: nil encoder = existing per-context submit path

ADR-017. Enterprise ref: Flutter Impeller single command buffer per frame.

* feat(gpu): CreateSharedEncoder + SubmitSharedEncoder - complete lifecycle API

Complete single-command-buffer lifecycle for ui integration:

  CreateSharedEncoder() → create encoder
  SetSharedEncoder() → set on each context
  FlushGPUWithView() → records render pass (no submit)
  SubmitSharedEncoder() → finish + submit once

Duck-typed public API (any) to avoid wgpu import at root level. Internal type assertion to *wgpu.CommandEncoder in GPURenderContext.

* docs: mark v0.43.1 in CHANGELOG + ROADMAP

* fix(gpu): nil-guard session in CreateEncoder/SubmitEncoder

Session may be nil if GPU is not yet initialized. Prevents nil pointer dereference when CreateSharedEncoder is called before first GPU operation.

* chore(deps): update wgpu v0.26.3 → v0.26.4 (VK-006 PRESENT_SRC_KHR fix)

wgpu v0.26.4 adds automatic PRESENT_SRC_KHR layout transition before vkQueuePresentKHR. This fixes flickering with blit-only and shared encoder paths where no render pass transitions the swapchain image.

* chore: fix formatting + update CHANGELOG deps to wgpu v0.26.4

* fix(gpu): blit-only black screen + resource leak + examples update

Blit-only compositor path (ADR-016) produced black screen because RenderFrameGrouped early-returned on totalItems==0 without checking baseLayer, silently skipping the entire blit render pass for frames with only a base layer texture and zero vector shapes.

Also fixed GPU texture resource leak: buildGPUTextureResources allocated new vertex/uniform buffers every frame without releasing previous ones. Now uses session-level persistent buffers with grow-only reallocation (same pattern as SDF/convex/image/text tiers).

- fix: totalItems==0 guard now checks baseLayer (render_session.go:523)
- fix: session-level gpuTexVertBuf/gpuTexUniformBufs/gpuTexBindGroups
- add: examples/blit_only/, a standalone non-MSAA compositor demo
- deps: all examples updated to gogpu v0.29.2 + wgpu v0.26.4
- docs: CHANGELOG, README (compositor examples), ROADMAP v0.43.1

* feat(gpu): type-safe GPU handles - zero any in pipeline API (ADR-018)

Replace any with gpucontext.TextureView and gpucontext.CommandEncoder opaque handles (struct{ptr unsafe.Pointer}) across the entire GPU pipeline public API.

Compile-time type safety: FlushGPUWithView, SetSharedEncoder, CreateSharedEncoder, SubmitSharedEncoder, RenderTarget.SurfaceView, RenderDirect are all typed. TextureView cannot be confused with CommandEncoder or other resource types.

8 bytes value type, zero allocations, GC-safe. Nil checks via .IsNil(). Follows Vulkan/Ebitengine/Go Protobuf Opaque pattern.

Breaking: view any -> gpucontext.TextureView, encoder any -> gpucontext.CommandEncoder. Requires gpucontext v0.15.0.

* chore: add nolint:gosec for ADR-018 unsafe.Pointer conversions

* chore(deps): update gpucontext v0.15.0 + examples gogpu v0.29.3

Cascade release: gpucontext v0.15.0 (type-safe handles) and gogpu v0.29.3 (typed SurfaceView) are now published. Update all go.mod references.

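To illustrate the opaque-handle pattern named in the ADR-018 commit above, here is a minimal self-contained sketch. The types below are local stand-ins, not the real gpucontext package: only the struct{ptr unsafe.Pointer} shape and the IsNil() method come from the commit message, while backendView, NewTextureView and flushWithView are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// TextureView and CommandEncoder are hypothetical stand-ins for the handle
// types described in the commit: a single unsafe.Pointer wrapped in a
// distinct struct type per resource kind.
type TextureView struct{ ptr unsafe.Pointer }

type CommandEncoder struct{ ptr unsafe.Pointer }

// IsNil reports whether the handle wraps no backend object.
func (v TextureView) IsNil() bool    { return v.ptr == nil }
func (e CommandEncoder) IsNil() bool { return e.ptr == nil }

// backendView stands in for a backend-owned texture view (hypothetical).
type backendView struct{ width, height int }

// NewTextureView wraps a backend object in an opaque handle (hypothetical).
func NewTextureView(v *backendView) TextureView {
	return TextureView{ptr: unsafe.Pointer(v)}
}

// flushWithView accepts only a TextureView. Passing a CommandEncoder is a
// compile error instead of a failed runtime type assertion on `any`.
func flushWithView(view TextureView) error {
	if view.IsNil() {
		return errors.New("nil texture view")
	}
	return nil
}

func main() {
	var zero TextureView
	fmt.Println("zero handle is nil:", zero.IsNil())
	fmt.Println("handle size in bytes:", unsafe.Sizeof(zero)) // one pointer wide

	view := NewTextureView(&backendView{width: 600, height: 400})
	fmt.Println("flush error:", flushWithView(view))

	// flushWithView(CommandEncoder{}) // would not compile: wrong handle type
}
```

Because each handle is its own struct type, mixing up a view and an encoder is caught by the compiler, and the zero value doubles as the nil handle.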
* test(gpu): enterprise GPU texture overlay tests (13 tests)

Covers vertex data positioning, ortho projection, command queueing, PendingCount with baseLayer/overlay combinations, isBlitOnly detection, and RenderFrameGrouped guard regression (BUG-GG-BLIT-PATH-001).

Verifies overlay at (100,100,48,48) produces correct vertex coords and that ortho matrix uses main viewport, not overlay texture size.

* fix(gpu): overlay texture stretched to full screen (BUG-GG-GPU-TEXTURE-OVERLAY-SIZE)

Root cause: buildGPUTextureResources used a single shared vertex buffer (s.gpuTexVertBuf) for both base layer and overlay textures. Base layer (full-screen quad 0,0,600,400) overwrote overlay vertices (48x48 at 100,100).

Fix: separate vertex buffers, s.gpuTexVertBuf for overlays and s.gpuTexBaseVertBuf for base layer. isBaseLayer parameter selects buffer.

Regression test: TestBuildGPUTextureResources_SeparateVertexBuffers verifies overlay and base layer never share a vertex buffer.

* docs: update CHANGELOG + ROADMAP with overlay fix and enterprise tests

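The overlay tests above check that a 48x48 overlay at (100,100) is projected against the main viewport rather than its own texture size. The following sketch shows that arithmetic under an assumed convention (pixel origin at top-left with y down, normalized device coordinates in [-1, 1] with y up); Rect, toNDC and quadNDC are hypothetical names for this example, not functions from gg.

```go
package main

import "fmt"

// Rect is a pixel-space rectangle: origin (X, Y) plus size (W, H).
type Rect struct{ X, Y, W, H float64 }

// toNDC maps a pixel coordinate to normalized device coordinates for a
// viewport of the given size. Assumed convention: pixel origin top-left
// with y down, NDC in [-1, 1] with y up.
func toNDC(px, py, viewportW, viewportH float64) (float64, float64) {
	return 2*px/viewportW - 1, 1 - 2*py/viewportH
}

// quadNDC returns the top-left and bottom-right corners of r in NDC.
func quadNDC(r Rect, viewportW, viewportH float64) (x0, y0, x1, y1 float64) {
	x0, y0 = toNDC(r.X, r.Y, viewportW, viewportH)
	x1, y1 = toNDC(r.X+r.W, r.Y+r.H, viewportW, viewportH)
	return
}

func main() {
	// The overlay from the commit message, projected against the 600x400
	// main viewport: it occupies a small region inside the NDC square.
	x0, y0, x1, y1 := quadNDC(Rect{X: 100, Y: 100, W: 48, H: 48}, 600, 400)
	fmt.Printf("overlay: (%.3f, %.3f) to (%.3f, %.3f)\n", x0, y0, x1, y1)

	// The base layer covers the whole viewport, so it spans the full square.
	x0, y0, x1, y1 = quadNDC(Rect{X: 0, Y: 0, W: 600, H: 400}, 600, 400)
	fmt.Printf("base:    (%.3f, %.3f) to (%.3f, %.3f)\n", x0, y0, x1, y1)
}
```

If base-layer vertices are written into the buffer the overlay reads from, the overlay inherits the full-square corners, which matches the stretched-to-full-screen symptom described in the fix.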
feat(gpu): zero-readback compositor pipeline (ADR-015/016, v0.43.0) (#… …266)

* feat(ggcanvas): FlushPixmap() for zero-readback rendering (ADR-006)

FlushPixmap() uploads CPU pixmap to GPU texture without calling FlushGPU(). Pending GPU shapes remain queued for the caller to flush directly to the surface via FlushGPUWithView(), eliminating the GPU→CPU→GPU readback.

Flush() refactored to delegate: FlushGPU() + FlushPixmap().

Enables ui ADR-006 Phase 1: spinner GPU cost 10% → <5% by skipping full-pixmap MSAA render + resolve + CopyTextureToBuffer + fence wait.

4 new tests: unit, closed canvas, Flush/FlushPixmap consistency, partial upload.

* feat(gpu): DrawGPUTextureBase - compositor base layer (ADR-015)

Single-pass compositor: base layer textured quad drawn BEFORE all GPU tiers (SDF, convex, stencil, images, text, glyph mask) in the render pass. Enables zero-readback rendering where CPU pixmap is background and GPU shapes render on top. Flutter OffsetLayer pattern.

- DrawGPUTextureBase() public API (context_image.go)
- QueueBaseLayer() in GPURenderContext (last call wins)
- Base layer rendering in encodeSubmitSurfaceGrouped/ReadbackGrouped
- PendingCount includes base layer
- 4 new base layer tests
- Fixed 2 compute mode tests (wrong CanCompute assumption)

* feat: BeginGPUFrame() for persistent context reuse (RepaintBoundary)

Resets per-context GPU frame state (frameRendered, lastView, clipRect) so the next render pass uses LoadOpClear instead of LoadOpLoad.

Required when a persistent gg.Context renders to the same view across frames: without this, stale content from the previous frame bleeds through via LoadOpLoad. Not needed for one-shot contexts or when the view changes between frames (auto-reset on view change).

Verified: BeginAcceleratorFrame() only resets the default context, not per-context GPURenderContexts (ADR-013). UI agent correctly identified the gap.

* feat(gpu): non-MSAA compositor fast path (ADR-016)

When a frame contains only textured quads (base layer + overlays) with zero vector shapes, use a 1x render pass directly to swapchain instead of 4x MSAA render + resolve. 93% bandwidth reduction at 1080p.

- isBlitOnly(): detects blit-only frames (no SDF/convex/stencil/text)
- encodeBlitOnlyPass(): 1x render pass, no MSAA, no depth/stencil
- ensureBlitPipeline(): SampleCount=1 pipeline variant (lazy init)
- RecordBlitDraws(): draws quads using blit pipeline
- 6 new tests for blit-only detection (base only, SDF, text, overlay)

Enterprise pattern: Flutter Impeller, Chrome cc, Qt6 RHI all use non-MSAA compositor passes for textured quad compositing.

* feat(gpu): FlushGPUWithViewDamage - scissor-clipped compositor (ADR-016 Phase 2)

Damage-aware compositor: when damageRect is non-empty, uses LoadOpLoad (preserve previous frame) and scissor-clips to the dirty region. Only the damaged pixels are re-composited: a 48x48 spinner updates 9KB vs 8MB for the full surface at 1080p.

- FlushGPUWithViewDamage() on Context (public API)
- DamageRect field on GPURenderTarget
- encodeBlitOnlyPass respects damage rect (LoadOpLoad + scissor)
- Full surface blit when damageRect is empty (existing behavior)

* feat: FillRectCPU + Pixmap.FillRect - CPU-only rect fill (ADR-016)

CPU-only rectangle fill that bypasses GPU SDF accelerator. Without this, dirty-region background clearing in ui routes through SDF → blocks non-MSAA blit path (isBlitOnly always false).

- Pixmap.FillRect: direct pixel fill, row-copy optimized, bounds clamped
- Context.FillRectCPU: device-scale aware, flushes GPU first for z-order
- 6 new tests: fill correctness, clamping, out-of-bounds, genID, no SDF
- Fixed UI agent's empty rect test (image.Rect auto-canonicalizes)
- Cleaned leftover log import from render_session.go

* fix(test): GPU texture overlays are blit-only (no MSAA needed)

isBlitOnly correctly allows GPU texture overlays (RepaintBoundary cached textures) in the non-MSAA fast path: they are textured quads, same as the base layer. Test was incorrectly expecting NOT blit-only.

* chore(deps): update wgpu v0.25.7 → v0.26.1 (PresentWithDamage)

wgpu v0.26.1 adds damage-aware surface presentation on all backends: Software (BitBlt/XPutImage), Vulkan (VK_KHR_incremental_present), DX12 (Present1), GLES (eglSwapBuffersWithDamageKHR).

gg does not use PresentWithDamage directly; it is called by gogpu. This dep update ensures go.mod reflects the version used via go.work.

* feat(ggcanvas): PixmapTextureView() for single-pass zero-readback (ADR-015)

Duck-typed accessor: returns gpucontext.TextureView from the pixmap GPU texture for DrawGPUTextureBase single-pass compositing. Uses Go structural typing: no gogpu import, calls Texture.TextureView() if available.

Enables the complete zero-readback pipeline:

  FlushPixmap → PixmapTextureView → DrawGPUTextureBase → FlushGPUWithView

4 new tests: before flush, pending texture, promoted, closed canvas. Requires gogpu Texture.TextureView() (already implemented).

* feat(ggcanvas): EnsureGPUTexture + PixmapTextureView - zero-readback setup

EnsureGPUTexture promotes pendingTexture to real GPU texture (one-time). PixmapTextureView returns gpucontext.TextureView via duck typing. Together they enable: FlushPixmap → PixmapTextureView → DrawGPUTextureBase.

4 new tests. Performance investigation pending for standalone pipeline.

* chore(deps): update wgpu v0.26.1 → v0.26.2 (Buffer/BindGroup auto-cleanup)

wgpu v0.26.2 adds runtime.AddCleanup for Buffer and BindGroup. Automatic deferred destruction via GC prevents per-frame resource leaks that caused stuttering in the manual zero-readback pipeline.

* fix: warn on global GPU fallback in multi-context scenarios

Add warnGPUFallback(): one-time slog.Warn when GPU operations fall back to global SDFAccelerator.defaultCtx instead of per-context GPURenderContext.

Covers: tryGPUFill, tryGPUStroke, FlushGPU, FlushGPUWithView, FlushGPUWithViewDamage, flushGPUAccelerator, tryGPUText, tryGPUGlyphMaskText.

In multi-context (RepaintBoundary), this fallback causes shape leaking. The warning makes it immediately visible in logs instead of silent corruption.

* refactor: type gpuCtx as gpuContextOps instead of any

Replace untyped `gpuCtx any` with `gpuCtx gpuContextOps` for compile-time type safety. The gpuContextOps interface is defined in the same package, so there is no circular import.

Type assertion moved to ensureGPUCtx (once at creation) instead of every gpuCtxOps() call. gpuCtxOps() simplified to direct return.

* docs: complete CHANGELOG for v0.43.0 - all features, fixes, deps

Added missing entries: EnsureGPUTexture, GPU fallback warnings, gpuCtx typing refactor, wgpu v0.26.2 dep update.

* fix(gpu): separate blit pipeline layout - single bind group (no clip)

Blit pipeline used pipeLayout with 2 bind groups (texture + clip), but RecordBlitDraws only sets group 0. Leaving group 1 undefined causes GPU validation errors. New blitLayout with single bind group for the non-MSAA compositor fast path.

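The one-time fallback warning described in the warnGPUFallback commit above can be sketched with sync.Once and log/slog from the standard library. This is an illustration of the pattern only: fallbackWarner, its warn method and demoWarnCount are hypothetical names, and the message text is made up for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
	"sync"
)

// fallbackWarner emits a single warning no matter how many call sites
// report a fallback (hypothetical type for this sketch).
type fallbackWarner struct {
	once   sync.Once
	logger *slog.Logger
}

// warn logs the first reported fallback and ignores all later ones.
func (w *fallbackWarner) warn(op string) {
	w.once.Do(func() {
		w.logger.Warn("GPU operation fell back to the global default context", "op", op)
	})
}

// demoWarnCount reports fallbacks for every op in ops and returns how many
// WARN records ended up in the log.
func demoWarnCount(ops []string) int {
	var buf bytes.Buffer
	w := &fallbackWarner{logger: slog.New(slog.NewTextHandler(&buf, nil))}
	for _, op := range ops {
		w.warn(op)
	}
	return strings.Count(buf.String(), "level=WARN")
}

func main() {
	ops := []string{"tryGPUFill", "tryGPUStroke", "FlushGPU", "FlushGPUWithView"}
	fmt.Println("warnings logged:", demoWarnCount(ops))
}
```

Only the first operation that hits the fallback is named in the log, which keeps a per-frame code path from flooding the output while still making the misconfiguration visible.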
* chore(deps): update gogpu v0.28.3 → v0.29.0 in gogpu_integration example

* feat: zero-readback compositor examples (ADR-015/016)

Two standalone examples demonstrating compositor APIs:

- zero_readback/: standard RenderDirect path (GPU-direct, smooth)
- zero_readback_manual/: manual FlushPixmap + DrawGPUTextureBase pipeline (CPU/GPU content separation, used by ui/desktop.go)

Both with separate go.mod, README, gogpu v0.29.0.

* chore: mark v0.43.0 in CHANGELOG

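The Pixmap.FillRect commit in this release describes a direct pixel fill that is bounds clamped and row-copy optimized. A minimal sketch of that technique follows, using the standard library's image.RGBA as a stand-in for gg's Pixmap (an assumption; the real type and its memory layout may differ). fillRect and demoFill are hypothetical names.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// fillRect fills r in img with c. The rectangle is clamped to the image
// bounds; the first row is written pixel by pixel and then copied into the
// remaining rows with the built-in copy.
func fillRect(img *image.RGBA, r image.Rectangle, c color.RGBA) {
	r = r.Intersect(img.Bounds())
	if r.Empty() {
		return // fully out of bounds or zero-sized: nothing to do
	}
	rowBytes := r.Dx() * 4
	first := img.PixOffset(r.Min.X, r.Min.Y)
	row := img.Pix[first : first+rowBytes]
	for i := 0; i < rowBytes; i += 4 {
		row[i], row[i+1], row[i+2], row[i+3] = c.R, c.G, c.B, c.A
	}
	for y := r.Min.Y + 1; y < r.Max.Y; y++ {
		off := img.PixOffset(r.Min.X, y)
		copy(img.Pix[off:off+rowBytes], row)
	}
}

// demoFill fills the rectangle (x0,y0)-(x1,y1) on a fresh w x h image and
// returns the number of pixels that were written.
func demoFill(w, h, x0, y0, x1, y1 int) int {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	fillRect(img, image.Rect(x0, y0, x1, y1), color.RGBA{R: 255, A: 255})
	n := 0
	for i := 3; i < len(img.Pix); i += 4 {
		if img.Pix[i] == 255 { // alpha set means the pixel was filled
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("inside bounds:", demoFill(8, 8, 2, 2, 6, 5))
	fmt.Println("partially outside:", demoFill(8, 8, -2, -2, 3, 3))
	fmt.Println("fully outside:", demoFill(8, 8, 10, 10, 20, 20))
}
```

Clamping first means the inner loops never need per-pixel bounds checks, and copying whole rows lets the runtime use its optimized memmove for everything after the first row.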
fix: Adreno Vulkan + clip pipeline + deps (v0.40.1)

* fix: switch in fine.wgsl for Adreno (ui#67)

* fix: Vello thread model + packed blend stack for Adreno (#252)

Port Vello's enterprise thread model to fix Adreno LLVM miscompilation:

- workgroup_size(256,1,1) → workgroup_size(4,16,1) with PIXELS_PER_THREAD=4
- clip stack: array<vec4<f32>,4> (64B) → packed u32 array (16B) + blend_spill SSBO
- CPU==GPU pixel-perfect match verified (0/120000 diff)

Reduces register pressure 4x (256→64 threads), eliminates Adreno isam→ldib deoptimization. Matches linebender/vello PR #77 + PR #150 exactly.

Fixes #252, upstream ui#67. Ref: linebender/vello#83

* feat: clip layer support in Vello compute pipeline (VELLO-CLIP-001)

Full clip pipeline: scene encoding (SceneElement BeginClip/EndClip) → draw_leaf (ClipInp generation) → clip_leaf (sequential stack matching, EndClip fixup) → coarse (per-tile clipDepth/clipZeroDepth, Vello parity) → fine (packed blend stack, already implemented).

RasterizeSceneDefPTCL() for clip-aware rendering. Existing RasterizeScenePTCL([]PathDef) unchanged (backward compat).

6 integration tests: nested clips, alpha modulation, draw monoid fixup, scene encoding, backward compatibility pixel match.

* feat: visual compute clip example (VELLO-EXAMPLE-001)

Standalone CLI demo: 14-element scene with BeginClip/EndClip, 8 circles, rounded-rect clip path, stars outside clip. CPU renders via RasterizeSceneDefPTCL, GPU renders no-clip subset for pipeline validation. Diagnostic pixel checks confirm correct clipping behavior.

* feat: windowed animated clip demo (VELLO-EXAMPLE-001)

Animated gg+gogpu example with clip layers: rotating circles (no clip) + pulsing rounded-rect clip region with bouncing shapes, rotating square, horizontal bar. Space to pause/resume. 49 FPS on Vulkan.

* refactor(gpu): migrate Queue.ReadBuffer to Buffer.Map API

Update deps: wgpu v0.24.4→v0.25.1, gpucontext v0.11.0→v0.12.0, naga v0.17.0→v0.17.4, x/image v0.38.0→v0.39.0, x/text v0.35.0→v0.36.0.
Remove incorrect gogpu dependency (gg is independent of gogpu). Fix LICENSE wording.

* docs: v0.40.1 CHANGELOG - Adreno fix, clip pipeline, deps update

* fix: handle Unmap() error returns (errcheck lint)

* fix: gofmt after Unmap error handling

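The size figures in the packed blend stack commit of this release (64B → 16B for a four-entry stack, 256 → 64 threads) can be checked with a short CPU-side sketch. It assumes each stack entry is an RGBA color packed as four 8-bit unorm channels, in the manner of WGSL's pack4x8unorm; the commit message only says "packed u32", so the exact packing used by the shader is an assumption here, and all names below are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

const (
	// Sizes of a four-entry stack before and after packing.
	unpackedStackBytes = unsafe.Sizeof([4][4]float32{}) // 4 x vec4<f32>
	packedStackBytes   = unsafe.Sizeof([4]uint32{})     // 4 x u32

	// workgroup_size(4,16,1) with PIXELS_PER_THREAD=4.
	threadsPerGroup = 4 * 16 * 1
	pixelsPerGroup  = threadsPerGroup * 4
)

// pack4x8unorm packs four [0,1] floats into one uint32, 8 bits per channel,
// with channel 0 in the lowest byte.
func pack4x8unorm(c [4]float32) uint32 {
	var out uint32
	for i, v := range c {
		if v < 0 {
			v = 0
		} else if v > 1 {
			v = 1
		}
		out |= uint32(v*255+0.5) << (8 * uint(i))
	}
	return out
}

// unpack4x8unorm reverses pack4x8unorm up to 8-bit quantization.
func unpack4x8unorm(p uint32) [4]float32 {
	var c [4]float32
	for i := range c {
		c[i] = float32((p>>(8*uint(i)))&0xff) / 255
	}
	return c
}

func main() {
	fmt.Println("unpacked stack bytes:", unpackedStackBytes)
	fmt.Println("packed stack bytes:  ", packedStackBytes)
	fmt.Println("threads per group:   ", threadsPerGroup)
	fmt.Println("pixels per group:    ", pixelsPerGroup)

	red := [4]float32{1, 0, 0, 1}
	p := pack4x8unorm(red)
	fmt.Printf("packed opaque red: 0x%08X, round trip: %v\n", p, unpack4x8unorm(p))
}
```

The trade-off is precision: packed entries are quantized to 8 bits per channel, in exchange for a stack that is a quarter of the size per thread.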
feat: alpha mask API - per-shape, per-layer, luminance, GPU interface… … (v0.40.0)

* fix: implement SetMask alpha masking in Fill/Stroke pipeline (gg#238, gg#236)

Phase 1 of TASK-GG-MASK-001: per-shape alpha masking.

- Add MaskCoverage field to Paint (analogous to ClipCoverage)
- Wire mask into doFill/doStroke via applyMaskToPaint
- Integrate into SoftwareRenderer: analytic filler + coverage filler paths
- Mask and clip compose multiplicatively when both active
- GPU accelerator falls back to CPU when mask is active (Phase 2)
- Fix AsMask docs: clarify works with unfilled path, add usage patterns
- 13 new tests including @Rider21 exact reproduction case

* feat: complete alpha mask API - luminance, layer masking, GPU interface (gg#238)

Phase 2: NewLuminanceMask (CSS Level 1), ApplyMask (DestinationIn), NewMaskFromData constructor.

Phase 3: PushMaskLayer/PopLayer, an isolated layer with mask applied on pop before compositing. Matches Vello push_mask_layer() semantics. Nested layers and SetMask+PushMaskLayer compose correctly.

Phase 4: MaskAware interface for GPU accelerators. tryGPUFill/Stroke upload mask texture when accelerator supports it, fall back to CPU otherwise. SDF shader implementation deferred to separate PR.

12 new tests covering luminance, apply, layer masking, nesting, nil safety, and SetMask+PushMaskLayer composition.

* docs: v0.40.0 CHANGELOG + README for alpha mask API
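The Phase 1 note that mask and clip compose multiplicatively can be made concrete with 8-bit coverage values. This is a sketch of the arithmetic only, assuming coverage is stored as a byte where 255 means fully covered (and therefore also "no mask" or "no clip"); mul8 and effectiveCoverage are hypothetical helpers, not gg functions.

```go
package main

import "fmt"

// mul8 multiplies two 8-bit coverage values treated as fractions of 255,
// rounding to the nearest integer.
func mul8(a, b uint8) uint8 {
	return uint8((uint32(a)*uint32(b) + 127) / 255)
}

// effectiveCoverage combines a shape's coverage with mask and clip coverage.
// Passing 255 for mask or clip leaves the value unchanged, so an inactive
// mask or clip needs no special case.
func effectiveCoverage(shape, mask, clip uint8) uint8 {
	return mul8(mul8(shape, mask), clip)
}

func main() {
	fmt.Println("no mask, no clip:", effectiveCoverage(255, 255, 255))
	fmt.Println("half mask only:  ", effectiveCoverage(255, 128, 255))
	fmt.Println("half mask + clip:", effectiveCoverage(255, 128, 128))
	fmt.Println("zero mask:       ", effectiveCoverage(255, 0, 255))
}
```

Because multiplication is commutative, applying the mask before the clip or the other way round gives the same result up to rounding, and a zero in either input removes the pixel entirely.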