diff --git a/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md b/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md
index aa6ecb2..d8bbeae 100644
--- a/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md
+++ b/docs/superpowers/specs/2026-03-14-memory-fix-and-reader-ui-design.md
@@ -17,71 +17,96 @@ Two issues with the current Vorleser app:
**Root cause:** Each `synthesizer.synthesize(text:)` call creates MLX tensors (model activations, attention matrices, audio output) that accumulate because:
- Swift ARC doesn't eagerly release autoreleased ObjC/C++ objects inside async loops
-- MLX holds a GPU memory pool that grows unless explicitly drained
+- MLX holds a GPU/CPU memory pool that grows unless explicitly cleared
**Changes:**
-1. Wrap each `synthesizer.synthesize()` call in `AudioEngine.playbackLoop()` (including the prefetch task) in `autoreleasepool { }` to force release of ObjC-bridged temporaries.
-2. After each synthesis call, invoke `MLX.GPU.drain()` (or equivalent MLXUtilsLibrary cleanup API) to release the GPU memory pool.
+
+1. Wrap each synchronous `synthesizer.synthesize()` call in `autoreleasepool { }`. Important: only the synchronous synthesis call goes inside the pool, not `await` expressions (which are illegal inside `autoreleasepool`). Both the main synthesis call in `playbackLoop()` and the prefetch `Task.detached` closure each get their own `autoreleasepool`.
+2. After each synthesis call, invoke `MLX.Memory.clearCache()` to release the MLX memory pool. The `Synthesizer` module exposes a `clearCache()` method that wraps this call, so `AudioEngine` does not need a direct `MLX` dependency.
3. Cache `Book.sentences`, currently a computed property that re-segments the entire book on every access. Change to a stored property computed once at init.
### 2. EPUB Parser: Preserving Structure and Formatting
-Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` instead.
+Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` alongside the plain text.
+
+**Key design decision (single coordinate space):** The `attributedText.string` must be character-for-character identical to `text`. Visual formatting (paragraph spacing, heading gaps) is achieved exclusively through `NSParagraphStyle` attributes (e.g., `paragraphSpacingBefore`), NOT by inserting extra `\n` characters. This eliminates any need for offset mapping: a character offset in `text` is the same offset in `attributedText`.
**Changes to BookParser module:**
-- `Chapter` stores `attributedText: NSAttributedString` alongside the plain `text: String` property (derived via `.string`).
-- `EPUBParser` walks the SwiftSoup DOM and builds an `NSAttributedString`:
- - `<p>` → paragraph with spacing
+ - `<p>` content → body font + `NSParagraphStyle` with `paragraphSpacingBefore`
+ - Ranges corresponding to `<b>`, `<strong>` → bold font trait
+ - Ranges corresponding to `<i>`, `<em>` → italic font trait
+ - Ranges corresponding to `<br>` → line break
- - `<b>`, `<strong>` → bold trait
- - `<i>`, `<em>` → italic trait
- - `<h1>`–`<h6>` → bold + larger font size
- - Everything else → body font
-- Use dynamic type (`UIFont.preferredFont` / `NSFont.preferredFont`) as the base, respecting system font size settings.
-- `PlainTextParser` similarly produces `NSAttributedString` with paragraph breaks on `\n\n`.
-**Offset compatibility:** `SentenceSegmenter` continues to operate on plain `String`. Character offsets remain valid because `NSAttributedString.string` matches the plain text used for segmentation.
+- `Chapter` stores both:
+ - `text: String` — whitespace-normalized plain text (same as today, used for TTS and sentence segmentation)
+ - `attributedText: NSAttributedString` — formatted version of the same text with style attributes
+- Both have identical `.string` content. The `attributedText` is built by taking the already-normalized `text` and applying `NSAttributedString` attributes based on the source HTML structure:
+ - Ranges corresponding to `<h1>`–`<h6>` → bold + larger font size + paragraph style
+ - All other text → body font
+- `Chapter` becomes `@unchecked Sendable`. Safety justification: `String` is value-typed; `NSAttributedString` (not `NSMutableAttributedString`) is immutable. The parser must construct via `NSMutableAttributedString` and then copy to `NSAttributedString` via `NSAttributedString(attributedString:)` before storing.
+
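The storage rule above (build with `NSMutableAttributedString`, store an immutable copy, keep `.string` identical to `text`) can be sketched as follows. This is a minimal illustration, not the actual `Chapter` definition: the `title` field, the initializer shape, and the `vorleser.heading` marker attribute are assumptions made for the example. A real parser would set `.font` and `.paragraphStyle` instead of a marker.

```swift
import Foundation

/// Hypothetical shape of `Chapter`; only `text` and `attributedText` come from the spec.
struct Chapter: @unchecked Sendable {
    let title: String
    let text: String
    let attributedText: NSAttributedString

    /// Stores an immutable copy of the string the parser built.
    init(title: String, text: String, styled: NSMutableAttributedString) {
        // Single coordinate space: the styled string must not add or drop characters.
        precondition(styled.string == text, "attributedText.string must equal text")
        self.title = title
        self.text = text
        // The copy detaches the stored value from the parser's mutable buffer.
        self.attributedText = NSAttributedString(attributedString: styled)
    }
}

let text = "Title\nBody text."
let styled = NSMutableAttributedString(string: text)
// Hypothetical marker key standing in for real font and paragraph-style attributes.
let headingKey = NSAttributedString.Key(rawValue: "vorleser.heading")
styled.addAttribute(headingKey, value: true, range: NSRange(location: 0, length: 5))
let chapter = Chapter(title: "Title", text: text, styled: styled)

// Later mutation of the parser's buffer does not affect the stored chapter.
styled.append(NSAttributedString(string: " (draft)"))
```

Because attributes are applied to ranges of the already-normalized `text`, no offset translation table is needed in this sketch.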
+**Fonts and platform coupling:**
+- iOS: use `UIFont.preferredFont(forTextStyle:)` for dynamic type support.
+- macOS: `NSFont.preferredFont(forTextStyle:options:)` is available from macOS 11 but returns fixed-size fonts (macOS has no Dynamic Type). Accept this behavior for consistency.
+- `NSAttributedString` with font attributes requires UIKit or AppKit. The `BookParser` module will gain `import UIKit` / `import AppKit` via `#if canImport(UIKit)` / `#if canImport(AppKit)`. This is an intentional trade-off: it couples `BookParser` to Apple platforms, but Vorleser is Apple-platform-only (iOS + macOS) and will not target Linux or other non-Apple platforms. The alternative — constructing attributed strings in the view layer — would push too much formatting logic out of the parser and duplicate it across platforms.
+
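The conditional imports described above might look roughly like this. The `PlatformFont` alias and the `bodyFont()` helper are hypothetical names introduced for illustration; the sketch assumes the macOS deployment target is high enough for text styles.

```swift
import Foundation

#if canImport(UIKit)
import UIKit
typealias PlatformFont = UIFont   // hypothetical alias, not part of the spec
#elseif canImport(AppKit)
import AppKit
typealias PlatformFont = NSFont   // hypothetical alias, not part of the spec
#endif

#if canImport(UIKit) || canImport(AppKit)
/// Body font for the current platform.
/// On iOS this follows Dynamic Type; on macOS it is a fixed-size font.
@available(macOS 11.0, *)
func bodyFont() -> PlatformFont {
    PlatformFont.preferredFont(forTextStyle: .body)
}
#endif
```

Checking `canImport(UIKit)` first keeps Mac Catalyst builds on the UIKit path.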
+**PlainTextParser restructuring:** `PlainTextParser` currently splits on `\n\n` and creates a separate `Chapter` per paragraph. That breaks book display: the result is hundreds of tiny chapters, each separated by chapter-break spacing. Fix: treat the entire file as a single chapter by default. If the text contains clear structural markers (e.g., lines matching "Chapter N" or a similar pattern), split at those lines instead.
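One way the marker-based split could work is sketched below. The function name `splitIntoChapters` and the exact marker rule (a line consisting of "Chapter" plus a number) are assumptions; the spec only says "Chapter N" or similar. The sketch assumes LF line endings and keeps the concatenation of all chapters identical to the input, so offsets stay in one coordinate space.

```swift
/// Splits plain text into chapters at lines that look like "Chapter N".
/// Hypothetical helper; returns a single chapter when no marker is found.
func splitIntoChapters(_ text: String) -> [String] {
    // A marker is a line of exactly two words: "Chapter" plus a number.
    func isMarker(_ line: Substring) -> Bool {
        let words = line.split(separator: " ")
        guard words.count == 2, words[0].lowercased() == "chapter" else { return false }
        return words[1].allSatisfy { $0.isNumber }
    }

    var chapters: [String] = []
    var current: [Substring] = []
    for line in text.split(separator: "\n", omittingEmptySubsequences: false) {
        // Only start a new chapter if the pending one has visible content.
        let hasContent = current.contains { chunk in chunk.contains { !$0.isWhitespace } }
        if isMarker(line) && hasContent {
            // Keep the newline with the preceding chapter so no character is lost.
            chapters.append(current.joined(separator: "\n") + "\n")
            current.removeAll()
        }
        current.append(line)
    }
    chapters.append(current.joined(separator: "\n"))
    return chapters
}

let sample = "Preface text\nChapter 1\nFirst.\nChapter 2\nSecond."
let parts = splitIntoChapters(sample)
```

Text without markers, including text with `\n\n` paragraph breaks, comes back as one chapter.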
### 3. Reading UI — BookTextView (per platform)
Replace `ReadingTextView` (iOS) and `MacReadingTextView` (macOS) with a single `BookTextView` per platform.
**Responsibilities:**
-- Display the **full book** as one `NSAttributedString` (all chapters concatenated with chapter break spacing).
+- Display the **full book** as one `NSAttributedString` (all chapters' `attributedText` concatenated end-to-end — NO separator characters inserted; chapter-break spacing is achieved solely via `NSParagraphStyle.paragraphSpacingBefore` on the first paragraph of each subsequent chapter).
- Sentence highlighting via temporary text attributes (yellow background on active sentence range).
-- Tap/click → character offset callback for tap-to-play.
+- Tap/click → character offset callback for tap-to-play. Since offsets are 1:1 between `text` and `attributedText`, the tapped character index maps directly to the plain-text offset used by `AudioEngine`.
- Programmatic scrolling to a character offset (chapter jumps, auto-follow during playback).
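The end-to-end concatenation and the offset lookups it enables can be sketched as below. The names `buildBook` and `chapterIndex` are hypothetical, and the sketch assumes offsets are counted in UTF-16 units, which is how `NSAttributedString` indexes; the spec does not state the unit `AudioEngine` uses. Chapter-break spacing via `NSParagraphStyle` is omitted here.

```swift
import Foundation

/// Concatenates chapter strings with no separators and records each chapter's start offset.
func buildBook(from chapters: [NSAttributedString]) -> (book: NSAttributedString, chapterStarts: [Int]) {
    let book = NSMutableAttributedString()
    var starts: [Int] = []
    for chapter in chapters {
        starts.append(book.length)   // offset where this chapter begins
        book.append(chapter)
    }
    return (NSAttributedString(attributedString: book), starts)
}

/// Maps a tapped or scrolled-to offset back to the index of its chapter.
func chapterIndex(forOffset offset: Int, chapterStarts: [Int]) -> Int? {
    chapterStarts.lastIndex(where: { $0 <= offset })
}

let (book, starts) = buildBook(from: [
    NSAttributedString(string: "Ab\n"),
    NSAttributedString(string: "Cde"),
])
let tappedChapter = chapterIndex(forOffset: 4, chapterStarts: starts)
```

The same `chapterStarts` array can drive chapter jumps: scrolling to chapter `i` means scrolling to offset `chapterStarts[i]`.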
+**Performance:** For large books (500K+ characters), building one attributed string is a one-time cost at book load. TextKit handles large strings efficiently via layout-on-demand (only visible text is laid out). The full attributed string is built on a background thread during `loadBook()` with a loading indicator shown until ready.
+
**Two reading modes:**
1. **Scroll mode:** The text view uses standard scrolling (`UITextView` / `NSScrollView`). The full book is one tall scrollable column.
-2. **Book (paged) mode:** TextKit 2 pagination — the `NSTextLayoutManager` lays text into page-sized `NSTextContainer`s. Navigation via `UIPageViewController` (iOS) or horizontal swipe/arrow keys (macOS). Pages reflow dynamically on rotation / window resize.
+2. **Book (paged) mode:** Uses a single text view with viewport-based pagination — one `NSTextLayoutManager`, one `NSTextContainer`, one `NSTextContentStorage`. Instead of creating separate text views per page, pagination works by controlling which portion of the text is visible:
+
+ **Page-break computation:** A layout pass enumerates `NSTextLayoutFragment`s from the start of the text, accumulating their heights. When accumulated height exceeds the page height, a page break is recorded at that fragment's text range. This produces an array of `Range