Files
vorleser/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md
2026-03-14 05:03:59 +01:00

13 KiB
Raw Permalink Blame History

Memory Fix and Reader UI Redesign

Date: 2026-03-14 Status: Approved Platforms: iOS and macOS

Problem Statement

Two issues with the current Vorleser app:

  1. Memory crash on iOS — The TTS synthesis loop accumulates MLX tensors across sentences without releasing them. iOS hits its ~1-2 GB memory ceiling and crashes after a few sentences of playback.
  2. Reading UI is basic — Text is shown one chapter at a time. Users want a proper ebook reading experience: continuous scrolling or paged (Apple Books-style) reading with chapter navigation, preserved formatting, and auto-follow during playback.

Design

1. Memory Fix

Root cause: Each synthesizer.synthesize(text:) call creates MLX tensors (model activations, attention matrices, audio output) that accumulate because:

  • Swift ARC doesn't eagerly release autoreleased ObjC/C++ objects inside async loops
  • MLX holds a GPU/CPU memory pool that grows unless explicitly cleared

Changes:

  1. Wrap each synchronous synthesizer.synthesize() call in autoreleasepool { }. Important: only the synchronous synthesis call goes inside the pool, not await expressions (which are illegal inside autoreleasepool). Both the main synthesis call in playbackLoop() and the prefetch Task.detached closure each get their own autoreleasepool.
  2. After each synthesis call, invoke MLX.Memory.clearCache() to release the MLX memory pool. The Synthesizer module exposes a clearCache() method that wraps this call, so AudioEngine does not need a direct MLX dependency.
  3. Cache Book.sentences — currently a computed property that re-segments the entire book on every access. Change to a stored property computed once at init.

2. EPUB Parser — Preserving Structure and Formatting

Currently EPUBParser strips all HTML to plain text and collapses whitespace. For a proper reading experience, output NSAttributedString alongside the plain text.

Key design decision — single coordinate space: The attributedText.string must be character-for-character identical to text. Visual formatting (paragraph spacing, heading gaps) is achieved exclusively through NSParagraphStyle attributes (e.g., paragraphSpacingBefore), NOT by inserting extra \n characters. This eliminates any need for offset mapping — a character offset in text is the same offset in attributedText.

Changes to BookParser module:

  • Chapter stores both:
    • text: String — whitespace-normalized plain text (same as today, used for TTS and sentence segmentation)
    • attributedText: NSAttributedString — formatted version of the same text with style attributes
  • Both have identical .string content. The attributedText is built by taking the already-normalized text and applying NSAttributedString attributes based on the source HTML structure:
    • Ranges corresponding to <p> content → body font + NSParagraphStyle with paragraphSpacingBefore
    • Ranges corresponding to <b>, <strong> → bold font trait
    • Ranges corresponding to <i>, <em> → italic font trait
    • Ranges corresponding to <h1><h6> → bold + larger font size + paragraph style
    • All other text → body font
  • Chapter becomes @unchecked Sendable. Safety justification: String is value-typed; NSAttributedString (not NSMutableAttributedString) is immutable. The parser must construct via NSMutableAttributedString and then copy to NSAttributedString via NSAttributedString(attributedString:) before storing.

Fonts and platform coupling:

  • iOS: use UIFont.preferredFont(forTextStyle:) for dynamic type support.
  • macOS: NSFont.preferredFont(forTextStyle:) exists since macOS 13 but returns fixed-size fonts (no dynamic type on macOS). Accept this behavior for consistency.
  • NSAttributedString with font attributes requires UIKit or AppKit. The BookParser module will gain import UIKit / import AppKit via #if canImport(UIKit) / #if canImport(AppKit). This is an intentional trade-off: it couples BookParser to Apple platforms, but Vorleser is Apple-platform-only (iOS + macOS) and will not target Linux or other non-Apple platforms. The alternative — constructing attributed strings in the view layer — would push too much formatting logic out of the parser and duplicate it across platforms.

PlainTextParser restructuring: Currently splits on \n\n and creates a separate Chapter per paragraph. This is wrong for proper book display — hundreds of tiny chapters with chapter-break spacing between each paragraph. Fix: treat the entire file as a single chapter. If the text contains clear structural markers (e.g., lines matching "Chapter N" or similar), split on those. Otherwise, one chapter.

3. Reading UI — BookTextView (per platform)

Replace ReadingTextView (iOS) and MacReadingTextView (macOS) with a single BookTextView per platform.

Responsibilities:

  • Display the full book as one NSAttributedString (all chapters' attributedText concatenated end-to-end — NO separator characters inserted; chapter-break spacing is achieved solely via NSParagraphStyle.paragraphSpacingBefore on the first paragraph of each subsequent chapter).
  • Sentence highlighting via temporary text attributes (yellow background on active sentence range).
  • Tap/click → character offset callback for tap-to-play. Since offsets are 1:1 between text and attributedText, the tapped character index maps directly to the plain-text offset used by AudioEngine.
  • Programmatic scrolling to a character offset (chapter jumps, auto-follow during playback).

Performance: For large books (500K+ characters), building one attributed string is a one-time cost at book load. TextKit handles large strings efficiently via layout-on-demand (only visible text is laid out). The full attributed string is built on a background thread during loadBook() with a loading indicator shown until ready.

Two reading modes:

  1. Scroll mode: The text view uses standard scrolling (UITextView / NSScrollView). The full book is one tall scrollable column.

  2. Book (paged) mode: Uses a single text view with viewport-based pagination — one NSTextLayoutManager, one NSTextContainer, one NSTextContentStorage. Instead of creating separate text views per page, pagination works by controlling which portion of the text is visible:

    Page-break computation: A layout pass enumerates NSTextLayoutFragments from the start of the text, accumulating their heights. When accumulated height exceeds the page height, a page break is recorded at that fragment's text range. This produces an array of Range<String.Index> page boundaries. The layout pass runs once at load and is recomputed when the container size changes (rotation / window resize).

    Navigation:

    • iOS: PagedBookView is a UIViewControllerRepresentable wrapping a UIPageViewController. The data source provides lightweight page views on demand — each page view is a non-scrollable UITextView that displays a slice of the full attributed string (using attributedSubstring(from:) for the page's text range). Pages are recycled by UIPageViewController automatically. Only 3 pages exist at any time (current + neighbors).
    • macOS: PagedBookView is an NSViewRepresentable hosting an NSTextView. Arrow keys and swipe gestures (via NSPanGestureRecognizer) change the current page index, updating the displayed text range.

    Highlighting across pages: The highlight is applied to the full attributed string before slicing into pages, so whatever page is currently displayed shows the correct highlight if it falls within that page's range.

    Pages reflow dynamically on rotation (iOS) / window resize (macOS) — the page boundary array is recomputed and the current page adjusted to keep the same text position visible.

Relationship between BookTextView and PagedBookView: These are sibling views, not nested. The parent ReaderView / MacReaderView conditionally shows one or the other based on reading mode. BookTextView handles scroll mode. PagedBookView handles paged mode. Both receive the same NSAttributedString and highlight range from ReaderViewModel.

Chapter navigation:

  • ReaderViewModel maintains a chapter offset table mapping each chapter index to its character offset within the full book text.
  • "Jump to chapter" → scroll mode scrolls to that offset; paged mode finds which page contains that offset and navigates there.

Auto-follow during playback:

  • Observe engine.currentPosition → look up the active sentence range (1:1 offset, no mapping needed) → apply highlight attribute → scroll/page to keep the active sentence visible.

4. ReaderViewModel — Shared Logic

Extract duplicated logic from ReaderView (iOS) and MacReaderView (macOS) into a shared ReaderViewModel (@Observable class) in VorleserKit.

Owns:

  • Book loading and parsing
  • Synthesizer initialization
  • Reference to AudioEngine
  • Current chapter index, reading mode (scroll / paged)
  • Chapter offset table: maps chapter index → character offset in the full text
  • Highlight range computation (trivial since offsets are 1:1)

Does NOT own: The full book NSAttributedString. This is constructed by the platform-specific view layer because chapter concatenation with spacing requires NSParagraphStyle construction, which can differ subtly between platforms. ReaderViewModel provides the [Chapter] array and chapter offset table; the view layer concatenates attributedText from each chapter.

Platform views become thin shells:

  • Build the full NSAttributedString from viewModel.book.chapters
  • Toolbar: chapter picker, mode toggle (scroll / book)
  • Conditionally show BookTextView (scroll) or PagedBookView (paged)
  • PlaybackControls

5. Playback Controls and Mode Switching

Playback controls unchanged (play/pause, skip forward/back at sentence granularity). Auto-follow keeps the active sentence visible regardless of mode.

Mode toggle:

  • Segmented control or toolbar button: scroll ↔ book mode.
  • Switching modes preserves reading position (same character offset stays visible).
  • Mode preference persisted on StoredBook (new optional readingMode: String property with default "scroll" — lightweight SwiftData schema addition, no migration needed).

Position persistence: Existing lastPosition (character offset) continues to work — offsets are identical across plain text, attributed text, and display modes.

Files Changed

VorleserKit package:

  • Package.swift — no new external dependencies (MLX cache clear exposed through Synthesizer)
  • Synthesizer.swift — add clearCache() method wrapping MLX.Memory.clearCache()
  • AudioEngine.swift — autoreleasepool around synthesis calls, call synthesizer.clearCache() after each
  • Book.swift — cache sentences as stored property computed at init
  • Chapter.swift — add attributedText: NSAttributedString, mark @unchecked Sendable
  • EPUBParser.swift — build NSAttributedString from HTML DOM (same .string as plain text)
  • PlainTextParser.swift — restructure to single chapter, build NSAttributedString
  • BookParser.swift — pass through attributed text from parsers
  • StoredBook.swift — add optional readingMode property
  • New: ReaderViewModel.swift — shared reader logic (no platform-specific font code)

iOS app (Vorleser-iOS/):

  • New: BookTextView.swiftUIViewRepresentable wrapping UITextView (scroll mode)
  • New: PagedBookView.swiftUIViewControllerRepresentable wrapping UIPageViewController (paged mode)
  • ReaderView.swift — rewrite as thin shell, conditionally shows BookTextView or PagedBookView
  • Remove: ReadingTextView.swift

macOS app (Vorleser-macOS/):

  • New: BookTextView.swiftNSViewRepresentable wrapping NSTextView (scroll mode)
  • New: PagedBookView.swiftNSViewRepresentable with pagination (paged mode)
  • MacReaderView.swift — rewrite as thin shell, conditionally shows BookTextView or PagedBookView
  • Remove: MacReadingTextView.swift

XcodeGen:

  • project.yml — no changes needed (new files are in existing source directories, XcodeGen picks them up automatically)

Testing

  • Memory fix: Profile with Instruments (Allocations) on iOS — verify memory stays flat during extended playback.
  • Offset identity: Unit test that chapter.attributedText.string == chapter.text for both EPUB and plain text parsing.
  • Chapter offset table: Unit test that chapter offsets correctly index into the full book text.
  • Pagination: UI test that page count changes on rotation / window resize.
  • Mode switching: Verify reading position is preserved when toggling scroll ↔ paged.
  • PlainTextParser: Test that a plain text file produces a single chapter (or correctly detects chapter markers).
  • SwiftData migration: Test on a device/simulator with existing stored books to verify the new optional readingMode property is added without data loss.

Not In Scope

  • Voice selection UI
  • Bookmarks
  • Table of contents sidebar
  • Search within book
  • Night mode / theme customization