refine design spec: fix reviewer issues (UIKit coupling, pagination, offsets)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-03-14 05:03:59 +01:00
parent 59f4d54cfe
commit cf78e0d189

View File

@@ -17,71 +17,96 @@ Two issues with the current Vorleser app:
**Root cause:** Each `synthesizer.synthesize(text:)` call creates MLX tensors (model activations, attention matrices, audio output) that accumulate because:
- Swift ARC doesn't eagerly release autoreleased ObjC/C++ objects inside async loops
- MLX holds a GPU memory pool that grows unless explicitly drained
- MLX holds a GPU/CPU memory pool that grows unless explicitly cleared
**Changes:**
1. Wrap each `synthesizer.synthesize()` call in `AudioEngine.playbackLoop()` — including the prefetch task — in `autoreleasepool { }` to force release of ObjC-bridged temporaries.
2. After each synthesis call, invoke `MLX.GPU.drain()` (or equivalent MLXUtilsLibrary cleanup API) to release the GPU memory pool.
1. Wrap each synchronous `synthesizer.synthesize()` call in `autoreleasepool { }`. Important: only the synchronous synthesis call goes inside the pool, not `await` expressions (which are illegal inside `autoreleasepool`). Both the main synthesis call in `playbackLoop()` and the prefetch `Task.detached` closure each get their own `autoreleasepool`.
2. After each synthesis call, invoke `MLX.Memory.clearCache()` to release the MLX memory pool. The `Synthesizer` module exposes a `clearCache()` method that wraps this call, so `AudioEngine` does not need a direct `MLX` dependency.
3. Cache `Book.sentences` — currently a computed property that re-segments the entire book on every access. Change to a stored property computed once at init.
### 2. EPUB Parser — Preserving Structure and Formatting
Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` instead.
Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` alongside the plain text.
**Key design decision — single coordinate space:** The `attributedText.string` must be character-for-character identical to `text`. Visual formatting (paragraph spacing, heading gaps) is achieved exclusively through `NSParagraphStyle` attributes (e.g., `paragraphSpacingBefore`), NOT by inserting extra `\n` characters. This eliminates any need for offset mapping — a character offset in `text` is the same offset in `attributedText`.
**Changes to BookParser module:**
- `Chapter` stores `attributedText: NSAttributedString` alongside the plain `text: String` property (derived via `.string`).
- `EPUBParser` walks the SwiftSoup DOM and builds an `NSAttributedString`:
- `<p>` → paragraph with spacing
- `<br>` → line break
- `<b>`, `<strong>` → bold trait
- `<i>`, `<em>` → italic trait
- `<h1>``<h6>` → bold + larger font size
- Everything else → body font
- Use dynamic type (`UIFont.preferredFont` / `NSFont.preferredFont`) as the base, respecting system font size settings.
- `PlainTextParser` similarly produces `NSAttributedString` with paragraph breaks on `\n\n`.
**Offset compatibility:** `SentenceSegmenter` continues to operate on plain `String`. Character offsets remain valid because `NSAttributedString.string` matches the plain text used for segmentation.
- `Chapter` stores both:
- `text: String` — whitespace-normalized plain text (same as today, used for TTS and sentence segmentation)
- `attributedText: NSAttributedString` — formatted version of the same text with style attributes
- Both have identical `.string` content. The `attributedText` is built by taking the already-normalized `text` and applying `NSAttributedString` attributes based on the source HTML structure:
- Ranges corresponding to `<p>` content → body font + `NSParagraphStyle` with `paragraphSpacingBefore`
- Ranges corresponding to `<b>`, `<strong>` → bold font trait
- Ranges corresponding to `<i>`, `<em>` → italic font trait
- Ranges corresponding to `<h1>``<h6>` → bold + larger font size + paragraph style
- All other text → body font
- `Chapter` becomes `@unchecked Sendable`. Safety justification: `String` is value-typed; `NSAttributedString` (not `NSMutableAttributedString`) is immutable. The parser must construct via `NSMutableAttributedString` and then copy to `NSAttributedString` via `NSAttributedString(attributedString:)` before storing.
**Fonts and platform coupling:**
- iOS: use `UIFont.preferredFont(forTextStyle:)` for dynamic type support.
- macOS: `NSFont.preferredFont(forTextStyle:)` exists since macOS 13 but returns fixed-size fonts (no dynamic type on macOS). Accept this behavior for consistency.
- `NSAttributedString` with font attributes requires UIKit or AppKit. The `BookParser` module will gain `import UIKit` / `import AppKit` via `#if canImport(UIKit)` / `#if canImport(AppKit)`. This is an intentional trade-off: it couples `BookParser` to Apple platforms, but Vorleser is Apple-platform-only (iOS + macOS) and will not target Linux or other non-Apple platforms. The alternative — constructing attributed strings in the view layer — would push too much formatting logic out of the parser and duplicate it across platforms.
**PlainTextParser restructuring:** Currently splits on `\n\n` and creates a separate `Chapter` per paragraph. This is wrong for proper book display — hundreds of tiny chapters with chapter-break spacing between each paragraph. Fix: treat the entire file as a single chapter. If the text contains clear structural markers (e.g., lines matching "Chapter N" or similar), split on those. Otherwise, one chapter.
### 3. Reading UI — BookTextView (per platform)
Replace `ReadingTextView` (iOS) and `MacReadingTextView` (macOS) with a single `BookTextView` per platform.
**Responsibilities:**
- Display the **full book** as one `NSAttributedString` (all chapters concatenated with chapter break spacing).
- Display the **full book** as one `NSAttributedString` (all chapters' `attributedText` concatenated end-to-end — NO separator characters inserted; chapter-break spacing is achieved solely via `NSParagraphStyle.paragraphSpacingBefore` on the first paragraph of each subsequent chapter).
- Sentence highlighting via temporary text attributes (yellow background on active sentence range).
- Tap/click → character offset callback for tap-to-play.
- Tap/click → character offset callback for tap-to-play. Since offsets are 1:1 between `text` and `attributedText`, the tapped character index maps directly to the plain-text offset used by `AudioEngine`.
- Programmatic scrolling to a character offset (chapter jumps, auto-follow during playback).
**Performance:** For large books (500K+ characters), building one attributed string is a one-time cost at book load. TextKit handles large strings efficiently via layout-on-demand (only visible text is laid out). The full attributed string is built on a background thread during `loadBook()` with a loading indicator shown until ready.
**Two reading modes:**
1. **Scroll mode:** The text view uses standard scrolling (`UITextView` / `NSScrollView`). The full book is one tall scrollable column.
2. **Book (paged) mode:** TextKit 2 pagination — the `NSTextLayoutManager` lays text into page-sized `NSTextContainer`s. Navigation via `UIPageViewController` (iOS) or horizontal swipe/arrow keys (macOS). Pages reflow dynamically on rotation / window resize.
2. **Book (paged) mode:** Uses a single text view with viewport-based pagination — one `NSTextLayoutManager`, one `NSTextContainer`, one `NSTextContentStorage`. Instead of creating separate text views per page, pagination works by controlling which portion of the text is visible:
**Page-break computation:** A layout pass enumerates `NSTextLayoutFragment`s from the start of the text, accumulating their heights. When accumulated height exceeds the page height, a page break is recorded at that fragment's text range. This produces an array of `Range<String.Index>` page boundaries. The layout pass runs once at load and is recomputed when the container size changes (rotation / window resize).
**Navigation:**
- **iOS:** `PagedBookView` is a `UIViewControllerRepresentable` wrapping a `UIPageViewController`. The data source provides lightweight page views on demand — each page view is a non-scrollable `UITextView` that displays a slice of the full attributed string (using `attributedSubstring(from:)` for the page's text range). Pages are recycled by `UIPageViewController` automatically. Only 3 pages exist at any time (current + neighbors).
- **macOS:** `PagedBookView` is an `NSViewRepresentable` hosting an `NSTextView`. Arrow keys and swipe gestures (via `NSPanGestureRecognizer`) change the current page index, updating the displayed text range.
**Highlighting across pages:** The highlight is applied to the full attributed string before slicing into pages, so whatever page is currently displayed shows the correct highlight if it falls within that page's range.
Pages reflow dynamically on rotation (iOS) / window resize (macOS) — the page boundary array is recomputed and the current page adjusted to keep the same text position visible.
**Relationship between BookTextView and PagedBookView:** These are sibling views, not nested. The parent `ReaderView` / `MacReaderView` conditionally shows one or the other based on reading mode. `BookTextView` handles scroll mode. `PagedBookView` handles paged mode. Both receive the same `NSAttributedString` and highlight range from `ReaderViewModel`.
**Chapter navigation:**
- Each `Chapter` knows its character offset within the full book text.
- "Jump to chapter" computes the offset → scroll mode scrolls to it; paged mode finds which page contains that offset and navigates there.
- `ReaderViewModel` maintains a chapter offset table mapping each chapter index to its character offset within the full book text.
- "Jump to chapter" → scroll mode scrolls to that offset; paged mode finds which page contains that offset and navigates there.
**Auto-follow during playback:**
- Observe `engine.currentPosition`compute highlighted sentence range → apply highlight attribute → scroll/page to keep the active sentence visible.
- Observe `engine.currentPosition`look up the active sentence range (1:1 offset, no mapping needed) → apply highlight attribute → scroll/page to keep the active sentence visible.
### 4. ReaderViewModel — Shared Logic
Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) into a shared `ReaderViewModel` (`@Observable` class).
Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) into a shared `ReaderViewModel` (`@Observable` class) in VorleserKit.
**Owns:**
- Book loading and parsing
- Synthesizer initialization
- Reference to `AudioEngine`
- Current chapter index, reading mode (scroll / paged), full attributed string
- Chapter offset table for jump navigation
- Highlight range computation
- Current chapter index, reading mode (scroll / paged)
- Chapter offset table: maps chapter index → character offset in the full text
- Highlight range computation (trivial since offsets are 1:1)
**Does NOT own:** The full book `NSAttributedString`. This is constructed by the platform-specific view layer because chapter concatenation with spacing requires `NSParagraphStyle` construction, which can differ subtly between platforms. `ReaderViewModel` provides the `[Chapter]` array and chapter offset table; the view layer concatenates `attributedText` from each chapter.
**Platform views become thin shells:**
- Build the full `NSAttributedString` from `viewModel.book.chapters`
- Toolbar: chapter picker, mode toggle (scroll / book)
- `BookTextView` (platform-specific wrapper)
- Conditionally show `BookTextView` (scroll) or `PagedBookView` (paged)
- `PlaybackControls`
- Wire to the view model
### 5. Playback Controls and Mode Switching
@@ -90,32 +115,49 @@ Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) int
**Mode toggle:**
- Segmented control or toolbar button: scroll ↔ book mode.
- Switching modes preserves reading position (same character offset stays visible).
- Mode preference persisted on `StoredBook`.
- Mode preference persisted on `StoredBook` (new optional `readingMode: String` property with default `"scroll"` — lightweight SwiftData schema addition, no migration needed).
**Position persistence:** Existing `lastPosition` (character offset) continues to work — offsets are display-mode-independent.
**Position persistence:** Existing `lastPosition` (character offset) continues to work — offsets are identical across plain text, attributed text, and display modes.
## Files Changed
**VorleserKit package:**
- `AudioEngine.swift`autoreleasepool, GPU drain
- `Book.swift` — cache sentences as stored property
- `Chapter.swift` — add `attributedText: NSAttributedString`
- `EPUBParser.swift`build NSAttributedString from HTML DOM
- `PlainTextParser.swift` — build NSAttributedString from plain text
- New: `ReaderViewModel.swift` — shared reader logic
- `Package.swift`no new external dependencies (MLX cache clear exposed through Synthesizer)
- `Synthesizer.swift` — add `clearCache()` method wrapping `MLX.Memory.clearCache()`
- `AudioEngine.swift` — autoreleasepool around synthesis calls, call `synthesizer.clearCache()` after each
- `Book.swift`cache sentences as stored property computed at init
- `Chapter.swift` — add `attributedText: NSAttributedString`, mark `@unchecked Sendable`
- `EPUBParser.swift` — build NSAttributedString from HTML DOM (same `.string` as plain text)
- `PlainTextParser.swift` — restructure to single chapter, build NSAttributedString
- `BookParser.swift` — pass through attributed text from parsers
- `StoredBook.swift` — add optional `readingMode` property
- New: `ReaderViewModel.swift` — shared reader logic (no platform-specific font code)
**iOS app (`Vorleser-iOS/`):**
- New: `BookTextView.swift` — UIViewRepresentable wrapping UITextView
- New: paged mode view with UIPageViewController + TextKit 2
- `ReaderView.swift` — rewrite as thin shell over ReaderViewModel
- New: `BookTextView.swift``UIViewRepresentable` wrapping `UITextView` (scroll mode)
- New: `PagedBookView.swift``UIViewControllerRepresentable` wrapping `UIPageViewController` (paged mode)
- `ReaderView.swift` — rewrite as thin shell, conditionally shows BookTextView or PagedBookView
- Remove: `ReadingTextView.swift`
**macOS app (`Vorleser-macOS/`):**
- New: `BookTextView.swift` — NSViewRepresentable wrapping NSTextView
- New: paged mode view with TextKit 2 pagination + swipe/arrows
- `MacReaderView.swift` — rewrite as thin shell over ReaderViewModel
- New: `BookTextView.swift``NSViewRepresentable` wrapping `NSTextView` (scroll mode)
- New: `PagedBookView.swift``NSViewRepresentable` with pagination (paged mode)
- `MacReaderView.swift` — rewrite as thin shell, conditionally shows BookTextView or PagedBookView
- Remove: `MacReadingTextView.swift`
**XcodeGen:**
- `project.yml` — no changes needed (new files are in existing source directories, XcodeGen picks them up automatically)
## Testing
- **Memory fix:** Profile with Instruments (Allocations) on iOS — verify memory stays flat during extended playback.
- **Offset identity:** Unit test that `chapter.attributedText.string == chapter.text` for both EPUB and plain text parsing.
- **Chapter offset table:** Unit test that chapter offsets correctly index into the full book text.
- **Pagination:** UI test that page count changes on rotation / window resize.
- **Mode switching:** Verify reading position is preserved when toggling scroll ↔ paged.
- **PlainTextParser:** Test that a plain text file produces a single chapter (or correctly detects chapter markers).
- **SwiftData migration:** Test on a device/simulator with existing stored books to verify the new optional `readingMode` property is added without data loss.
## Not In Scope
- Voice selection UI