diff --git a/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md b/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md index aa6ecb2..d8bbeae 100644 --- a/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md +++ b/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md @@ -17,71 +17,96 @@ Two issues with the current Vorleser app: **Root cause:** Each `synthesizer.synthesize(text:)` call creates MLX tensors (model activations, attention matrices, audio output) that accumulate because: - Swift ARC doesn't eagerly release autoreleased ObjC/C++ objects inside async loops -- MLX holds a GPU memory pool that grows unless explicitly drained +- MLX holds a GPU/CPU memory pool that grows unless explicitly cleared **Changes:** -1. Wrap each `synthesizer.synthesize()` call in `AudioEngine.playbackLoop()` — including the prefetch task — in `autoreleasepool { }` to force release of ObjC-bridged temporaries. -2. After each synthesis call, invoke `MLX.GPU.drain()` (or equivalent MLXUtilsLibrary cleanup API) to release the GPU memory pool. + +1. Wrap each synchronous `synthesizer.synthesize()` call in `autoreleasepool { }`. Important: only the synchronous synthesis call goes inside the pool, not `await` expressions (which are illegal inside `autoreleasepool`). Both the main synthesis call in `playbackLoop()` and the prefetch `Task.detached` closure each get their own `autoreleasepool`. +2. After each synthesis call, invoke `MLX.Memory.clearCache()` to release the MLX memory pool. The `Synthesizer` module exposes a `clearCache()` method that wraps this call, so `AudioEngine` does not need a direct `MLX` dependency. 3. Cache `Book.sentences` — currently a computed property that re-segments the entire book on every access. Change to a stored property computed once at init. ### 2. EPUB Parser — Preserving Structure and Formatting -Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` instead. +Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` alongside the plain text. + +**Key design decision — single coordinate space:** The `attributedText.string` must be character-for-character identical to `text`. Visual formatting (paragraph spacing, heading gaps) is achieved exclusively through `NSParagraphStyle` attributes (e.g., `paragraphSpacingBefore`), NOT by inserting extra `\n` characters. This eliminates any need for offset mapping — a character offset in `text` is the same offset in `attributedText`. **Changes to BookParser module:** -- `Chapter` stores `attributedText: NSAttributedString` alongside the plain `text: String` property (derived via `.string`). -- `EPUBParser` walks the SwiftSoup DOM and builds an `NSAttributedString`: - - `

` → paragraph with spacing - - `
` → line break - - ``, `` → bold trait - - ``, `` → italic trait - - `

`–`
` → bold + larger font size - - Everything else → body font -- Use dynamic type (`UIFont.preferredFont` / `NSFont.preferredFont`) as the base, respecting system font size settings. -- `PlainTextParser` similarly produces `NSAttributedString` with paragraph breaks on `\n\n`. -**Offset compatibility:** `SentenceSegmenter` continues to operate on plain `String`. Character offsets remain valid because `NSAttributedString.string` matches the plain text used for segmentation. +- `Chapter` stores both: + - `text: String` — whitespace-normalized plain text (same as today, used for TTS and sentence segmentation) + - `attributedText: NSAttributedString` — formatted version of the same text with style attributes +- Both have identical `.string` content. The `attributedText` is built by taking the already-normalized `text` and applying `NSAttributedString` attributes based on the source HTML structure: + - Ranges corresponding to `

` content → body font + `NSParagraphStyle` with `paragraphSpacingBefore` + - Ranges corresponding to ``, `` → bold font trait + - Ranges corresponding to ``, `` → italic font trait + - Ranges corresponding to `

`–`
` → bold + larger font size + paragraph style + - All other text → body font +- `Chapter` becomes `@unchecked Sendable`. Safety justification: `String` is value-typed; `NSAttributedString` (not `NSMutableAttributedString`) is immutable. The parser must construct via `NSMutableAttributedString` and then copy to `NSAttributedString` via `NSAttributedString(attributedString:)` before storing. + +**Fonts and platform coupling:** +- iOS: use `UIFont.preferredFont(forTextStyle:)` for dynamic type support. +- macOS: `NSFont.preferredFont(forTextStyle:)` exists since macOS 13 but returns fixed-size fonts (no dynamic type on macOS). Accept this behavior for consistency. +- `NSAttributedString` with font attributes requires UIKit or AppKit. The `BookParser` module will gain `import UIKit` / `import AppKit` via `#if canImport(UIKit)` / `#if canImport(AppKit)`. This is an intentional trade-off: it couples `BookParser` to Apple platforms, but Vorleser is Apple-platform-only (iOS + macOS) and will not target Linux or other non-Apple platforms. The alternative — constructing attributed strings in the view layer — would push too much formatting logic out of the parser and duplicate it across platforms. + +**PlainTextParser restructuring:** Currently splits on `\n\n` and creates a separate `Chapter` per paragraph. This is wrong for proper book display — hundreds of tiny chapters with chapter-break spacing between each paragraph. Fix: treat the entire file as a single chapter. If the text contains clear structural markers (e.g., lines matching "Chapter N" or similar), split on those. Otherwise, one chapter. ### 3. Reading UI — BookTextView (per platform) Replace `ReadingTextView` (iOS) and `MacReadingTextView` (macOS) with a single `BookTextView` per platform. **Responsibilities:** -- Display the **full book** as one `NSAttributedString` (all chapters concatenated with chapter break spacing). +- Display the **full book** as one `NSAttributedString` (all chapters' `attributedText` concatenated end-to-end — NO separator characters inserted; chapter-break spacing is achieved solely via `NSParagraphStyle.paragraphSpacingBefore` on the first paragraph of each subsequent chapter). - Sentence highlighting via temporary text attributes (yellow background on active sentence range). -- Tap/click → character offset callback for tap-to-play. +- Tap/click → character offset callback for tap-to-play. Since offsets are 1:1 between `text` and `attributedText`, the tapped character index maps directly to the plain-text offset used by `AudioEngine`. - Programmatic scrolling to a character offset (chapter jumps, auto-follow during playback). +**Performance:** For large books (500K+ characters), building one attributed string is a one-time cost at book load. TextKit handles large strings efficiently via layout-on-demand (only visible text is laid out). The full attributed string is built on a background thread during `loadBook()` with a loading indicator shown until ready. + **Two reading modes:** 1. **Scroll mode:** The text view uses standard scrolling (`UITextView` / `NSScrollView`). The full book is one tall scrollable column. -2. **Book (paged) mode:** TextKit 2 pagination — the `NSTextLayoutManager` lays text into page-sized `NSTextContainer`s. Navigation via `UIPageViewController` (iOS) or horizontal swipe/arrow keys (macOS). Pages reflow dynamically on rotation / window resize. +2. **Book (paged) mode:** Uses a single text view with viewport-based pagination — one `NSTextLayoutManager`, one `NSTextContainer`, one `NSTextContentStorage`. Instead of creating separate text views per page, pagination works by controlling which portion of the text is visible: + + **Page-break computation:** A layout pass enumerates `NSTextLayoutFragment`s from the start of the text, accumulating their heights. When accumulated height exceeds the page height, a page break is recorded at that fragment's text range. This produces an array of `Range` page boundaries. The layout pass runs once at load and is recomputed when the container size changes (rotation / window resize). + + **Navigation:** + - **iOS:** `PagedBookView` is a `UIViewControllerRepresentable` wrapping a `UIPageViewController`. The data source provides lightweight page views on demand — each page view is a non-scrollable `UITextView` that displays a slice of the full attributed string (using `attributedSubstring(from:)` for the page's text range). Pages are recycled by `UIPageViewController` automatically. Only 3 pages exist at any time (current + neighbors). + - **macOS:** `PagedBookView` is an `NSViewRepresentable` hosting an `NSTextView`. Arrow keys and swipe gestures (via `NSPanGestureRecognizer`) change the current page index, updating the displayed text range. + + **Highlighting across pages:** The highlight is applied to the full attributed string before slicing into pages, so whatever page is currently displayed shows the correct highlight if it falls within that page's range. + + Pages reflow dynamically on rotation (iOS) / window resize (macOS) — the page boundary array is recomputed and the current page adjusted to keep the same text position visible. + +**Relationship between BookTextView and PagedBookView:** These are sibling views, not nested. The parent `ReaderView` / `MacReaderView` conditionally shows one or the other based on reading mode. `BookTextView` handles scroll mode. `PagedBookView` handles paged mode. Both receive the same `NSAttributedString` and highlight range from `ReaderViewModel`. **Chapter navigation:** -- Each `Chapter` knows its character offset within the full book text. -- "Jump to chapter" computes the offset → scroll mode scrolls to it; paged mode finds which page contains that offset and navigates there. +- `ReaderViewModel` maintains a chapter offset table mapping each chapter index to its character offset within the full book text. +- "Jump to chapter" → scroll mode scrolls to that offset; paged mode finds which page contains that offset and navigates there. **Auto-follow during playback:** -- Observe `engine.currentPosition` → compute highlighted sentence range → apply highlight attribute → scroll/page to keep the active sentence visible. +- Observe `engine.currentPosition` → look up the active sentence range (1:1 offset, no mapping needed) → apply highlight attribute → scroll/page to keep the active sentence visible. ### 4. ReaderViewModel — Shared Logic -Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) into a shared `ReaderViewModel` (`@Observable` class). +Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) into a shared `ReaderViewModel` (`@Observable` class) in VorleserKit. **Owns:** - Book loading and parsing - Synthesizer initialization - Reference to `AudioEngine` -- Current chapter index, reading mode (scroll / paged), full attributed string -- Chapter offset table for jump navigation -- Highlight range computation +- Current chapter index, reading mode (scroll / paged) +- Chapter offset table: maps chapter index → character offset in the full text +- Highlight range computation (trivial since offsets are 1:1) + +**Does NOT own:** The full book `NSAttributedString`. This is constructed by the platform-specific view layer because chapter concatenation with spacing requires `NSParagraphStyle` construction, which can differ subtly between platforms. `ReaderViewModel` provides the `[Chapter]` array and chapter offset table; the view layer concatenates `attributedText` from each chapter. **Platform views become thin shells:** +- Build the full `NSAttributedString` from `viewModel.book.chapters` - Toolbar: chapter picker, mode toggle (scroll / book) -- `BookTextView` (platform-specific wrapper) +- Conditionally show `BookTextView` (scroll) or `PagedBookView` (paged) - `PlaybackControls` -- Wire to the view model ### 5. Playback Controls and Mode Switching @@ -90,32 +115,49 @@ Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) int **Mode toggle:** - Segmented control or toolbar button: scroll ↔ book mode. - Switching modes preserves reading position (same character offset stays visible). -- Mode preference persisted on `StoredBook`. +- Mode preference persisted on `StoredBook` (new optional `readingMode: String` property with default `"scroll"` — lightweight SwiftData schema addition, no migration needed). -**Position persistence:** Existing `lastPosition` (character offset) continues to work — offsets are display-mode-independent. +**Position persistence:** Existing `lastPosition` (character offset) continues to work — offsets are identical across plain text, attributed text, and display modes. ## Files Changed **VorleserKit package:** -- `AudioEngine.swift` — autoreleasepool, GPU drain -- `Book.swift` — cache sentences as stored property -- `Chapter.swift` — add `attributedText: NSAttributedString` -- `EPUBParser.swift` — build NSAttributedString from HTML DOM -- `PlainTextParser.swift` — build NSAttributedString from plain text -- New: `ReaderViewModel.swift` — shared reader logic +- `Package.swift` — no new external dependencies (MLX cache clear exposed through Synthesizer) +- `Synthesizer.swift` — add `clearCache()` method wrapping `MLX.Memory.clearCache()` +- `AudioEngine.swift` — autoreleasepool around synthesis calls, call `synthesizer.clearCache()` after each +- `Book.swift` — cache sentences as stored property computed at init +- `Chapter.swift` — add `attributedText: NSAttributedString`, mark `@unchecked Sendable` +- `EPUBParser.swift` — build NSAttributedString from HTML DOM (same `.string` as plain text) +- `PlainTextParser.swift` — restructure to single chapter, build NSAttributedString +- `BookParser.swift` — pass through attributed text from parsers +- `StoredBook.swift` — add optional `readingMode` property +- New: `ReaderViewModel.swift` — shared reader logic (no platform-specific font code) **iOS app (`Vorleser-iOS/`):** -- New: `BookTextView.swift` — UIViewRepresentable wrapping UITextView -- New: paged mode view with UIPageViewController + TextKit 2 -- `ReaderView.swift` — rewrite as thin shell over ReaderViewModel +- New: `BookTextView.swift` — `UIViewRepresentable` wrapping `UITextView` (scroll mode) +- New: `PagedBookView.swift` — `UIViewControllerRepresentable` wrapping `UIPageViewController` (paged mode) +- `ReaderView.swift` — rewrite as thin shell, conditionally shows BookTextView or PagedBookView - Remove: `ReadingTextView.swift` **macOS app (`Vorleser-macOS/`):** -- New: `BookTextView.swift` — NSViewRepresentable wrapping NSTextView -- New: paged mode view with TextKit 2 pagination + swipe/arrows -- `MacReaderView.swift` — rewrite as thin shell over ReaderViewModel +- New: `BookTextView.swift` — `NSViewRepresentable` wrapping `NSTextView` (scroll mode) +- New: `PagedBookView.swift` — `NSViewRepresentable` with pagination (paged mode) +- `MacReaderView.swift` — rewrite as thin shell, conditionally shows BookTextView or PagedBookView - Remove: `MacReadingTextView.swift` +**XcodeGen:** +- `project.yml` — no changes needed (new files are in existing source directories, XcodeGen picks them up automatically) + +## Testing + +- **Memory fix:** Profile with Instruments (Allocations) on iOS — verify memory stays flat during extended playback. +- **Offset identity:** Unit test that `chapter.attributedText.string == chapter.text` for both EPUB and plain text parsing. +- **Chapter offset table:** Unit test that chapter offsets correctly index into the full book text. +- **Pagination:** UI test that page count changes on rotation / window resize. +- **Mode switching:** Verify reading position is preserved when toggling scroll ↔ paged. +- **PlainTextParser:** Test that a plain text file produces a single chapter (or correctly detects chapter markers). +- **SwiftData migration:** Test on a device/simulator with existing stored books to verify the new optional `readingMode` property is added without data loss. + ## Not In Scope - Voice selection UI