refine design spec: fix reviewer issues (UIKit coupling, pagination, offsets)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -17,71 +17,96 @@ Two issues with the current Vorleser app:
|
||||
|
||||
**Root cause:** Each `synthesizer.synthesize(text:)` call creates MLX tensors (model activations, attention matrices, audio output) that accumulate because:
|
||||
- Swift ARC doesn't eagerly release autoreleased ObjC/C++ objects inside async loops
|
||||
- MLX holds a GPU memory pool that grows unless explicitly drained
|
||||
- MLX holds a GPU/CPU memory pool that grows unless explicitly cleared
|
||||
|
||||
**Changes:**
|
||||
1. Wrap each `synthesizer.synthesize()` call in `AudioEngine.playbackLoop()` — including the prefetch task — in `autoreleasepool { }` to force release of ObjC-bridged temporaries.
|
||||
2. After each synthesis call, invoke `MLX.GPU.drain()` (or equivalent MLXUtilsLibrary cleanup API) to release the GPU memory pool.
|
||||
|
||||
1. Wrap each synchronous `synthesizer.synthesize()` call in `autoreleasepool { }`. Important: only the synchronous synthesis call goes inside the pool, not `await` expressions (which are illegal inside `autoreleasepool`). Both the main synthesis call in `playbackLoop()` and the prefetch `Task.detached` closure each get their own `autoreleasepool`.
|
||||
2. After each synthesis call, invoke `MLX.Memory.clearCache()` to release the MLX memory pool. The `Synthesizer` module exposes a `clearCache()` method that wraps this call, so `AudioEngine` does not need a direct `MLX` dependency.
|
||||
3. Cache `Book.sentences` — currently a computed property that re-segments the entire book on every access. Change to a stored property computed once at init.
|
||||
|
||||
### 2. EPUB Parser — Preserving Structure and Formatting
|
||||
|
||||
Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` instead.
|
||||
Currently `EPUBParser` strips all HTML to plain text and collapses whitespace. For a proper reading experience, output `NSAttributedString` alongside the plain text.
|
||||
|
||||
**Key design decision — single coordinate space:** The `attributedText.string` must be character-for-character identical to `text`. Visual formatting (paragraph spacing, heading gaps) is achieved exclusively through `NSParagraphStyle` attributes (e.g., `paragraphSpacingBefore`), NOT by inserting extra `\n` characters. This eliminates any need for offset mapping — a character offset in `text` is the same offset in `attributedText`.
|
||||
|
||||
**Changes to BookParser module:**
|
||||
- `Chapter` stores `attributedText: NSAttributedString` alongside the plain `text: String` property (derived via `.string`).
|
||||
- `EPUBParser` walks the SwiftSoup DOM and builds an `NSAttributedString`:
|
||||
- `<p>` → paragraph with spacing
|
||||
- `<br>` → line break
|
||||
- `<b>`, `<strong>` → bold trait
|
||||
- `<i>`, `<em>` → italic trait
|
||||
- `<h1>`–`<h6>` → bold + larger font size
|
||||
- Everything else → body font
|
||||
- Use dynamic type (`UIFont.preferredFont` / `NSFont.preferredFont`) as the base, respecting system font size settings.
|
||||
- `PlainTextParser` similarly produces `NSAttributedString` with paragraph breaks on `\n\n`.
|
||||
|
||||
**Offset compatibility:** `SentenceSegmenter` continues to operate on plain `String`. Character offsets remain valid because `NSAttributedString.string` matches the plain text used for segmentation.
|
||||
- `Chapter` stores both:
|
||||
- `text: String` — whitespace-normalized plain text (same as today, used for TTS and sentence segmentation)
|
||||
- `attributedText: NSAttributedString` — formatted version of the same text with style attributes
|
||||
- Both have identical `.string` content. The `attributedText` is built by taking the already-normalized `text` and applying `NSAttributedString` attributes based on the source HTML structure:
|
||||
- Ranges corresponding to `<p>` content → body font + `NSParagraphStyle` with `paragraphSpacingBefore`
|
||||
- Ranges corresponding to `<b>`, `<strong>` → bold font trait
|
||||
- Ranges corresponding to `<i>`, `<em>` → italic font trait
|
||||
- Ranges corresponding to `<h1>`–`<h6>` → bold + larger font size + paragraph style
|
||||
- All other text → body font
|
||||
- `Chapter` becomes `@unchecked Sendable`. Safety justification: `String` is value-typed; `NSAttributedString` (not `NSMutableAttributedString`) is immutable. The parser must construct via `NSMutableAttributedString` and then copy to `NSAttributedString` via `NSAttributedString(attributedString:)` before storing.
|
||||
|
||||
**Fonts and platform coupling:**
|
||||
- iOS: use `UIFont.preferredFont(forTextStyle:)` for dynamic type support.
|
||||
- macOS: `NSFont.preferredFont(forTextStyle:)` exists since macOS 13 but returns fixed-size fonts (no dynamic type on macOS). Accept this behavior for consistency.
|
||||
- `NSAttributedString` with font attributes requires UIKit or AppKit. The `BookParser` module will gain `import UIKit` / `import AppKit` via `#if canImport(UIKit)` / `#if canImport(AppKit)`. This is an intentional trade-off: it couples `BookParser` to Apple platforms, but Vorleser is Apple-platform-only (iOS + macOS) and will not target Linux or other non-Apple platforms. The alternative — constructing attributed strings in the view layer — would push too much formatting logic out of the parser and duplicate it across platforms.
|
||||
|
||||
**PlainTextParser restructuring:** Currently splits on `\n\n` and creates a separate `Chapter` per paragraph. This is wrong for proper book display — hundreds of tiny chapters with chapter-break spacing between each paragraph. Fix: treat the entire file as a single chapter. If the text contains clear structural markers (e.g., lines matching "Chapter N" or similar), split on those. Otherwise, one chapter.
|
||||
|
||||
### 3. Reading UI — BookTextView (per platform)
|
||||
|
||||
Replace `ReadingTextView` (iOS) and `MacReadingTextView` (macOS) with a single `BookTextView` per platform.
|
||||
|
||||
**Responsibilities:**
|
||||
- Display the **full book** as one `NSAttributedString` (all chapters concatenated with chapter break spacing).
|
||||
- Display the **full book** as one `NSAttributedString` (all chapters' `attributedText` concatenated end-to-end — NO separator characters inserted; chapter-break spacing is achieved solely via `NSParagraphStyle.paragraphSpacingBefore` on the first paragraph of each subsequent chapter).
|
||||
- Sentence highlighting via temporary text attributes (yellow background on active sentence range).
|
||||
- Tap/click → character offset callback for tap-to-play.
|
||||
- Tap/click → character offset callback for tap-to-play. Since offsets are 1:1 between `text` and `attributedText`, the tapped character index maps directly to the plain-text offset used by `AudioEngine`.
|
||||
- Programmatic scrolling to a character offset (chapter jumps, auto-follow during playback).
|
||||
|
||||
**Performance:** For large books (500K+ characters), building one attributed string is a one-time cost at book load. TextKit handles large strings efficiently via layout-on-demand (only visible text is laid out). The full attributed string is built on a background thread during `loadBook()` with a loading indicator shown until ready.
|
||||
|
||||
**Two reading modes:**
|
||||
|
||||
1. **Scroll mode:** The text view uses standard scrolling (`UITextView` / `NSScrollView`). The full book is one tall scrollable column.
|
||||
|
||||
2. **Book (paged) mode:** TextKit 2 pagination — the `NSTextLayoutManager` lays text into page-sized `NSTextContainer`s. Navigation via `UIPageViewController` (iOS) or horizontal swipe/arrow keys (macOS). Pages reflow dynamically on rotation / window resize.
|
||||
2. **Book (paged) mode:** Uses a single text view with viewport-based pagination — one `NSTextLayoutManager`, one `NSTextContainer`, one `NSTextContentStorage`. Instead of creating separate text views per page, pagination works by controlling which portion of the text is visible:
|
||||
|
||||
**Page-break computation:** A layout pass enumerates `NSTextLayoutFragment`s from the start of the text, accumulating their heights. When accumulated height exceeds the page height, a page break is recorded at that fragment's text range. This produces an array of `Range<String.Index>` page boundaries. The layout pass runs once at load and is recomputed when the container size changes (rotation / window resize).
|
||||
|
||||
**Navigation:**
|
||||
- **iOS:** `PagedBookView` is a `UIViewControllerRepresentable` wrapping a `UIPageViewController`. The data source provides lightweight page views on demand — each page view is a non-scrollable `UITextView` that displays a slice of the full attributed string (using `attributedSubstring(from:)` for the page's text range). Pages are recycled by `UIPageViewController` automatically. Only 3 pages exist at any time (current + neighbors).
|
||||
- **macOS:** `PagedBookView` is an `NSViewRepresentable` hosting an `NSTextView`. Arrow keys and swipe gestures (via `NSPanGestureRecognizer`) change the current page index, updating the displayed text range.
|
||||
|
||||
**Highlighting across pages:** The highlight is applied to the full attributed string before slicing into pages, so whatever page is currently displayed shows the correct highlight if it falls within that page's range.
|
||||
|
||||
Pages reflow dynamically on rotation (iOS) / window resize (macOS) — the page boundary array is recomputed and the current page adjusted to keep the same text position visible.
|
||||
|
||||
**Relationship between BookTextView and PagedBookView:** These are sibling views, not nested. The parent `ReaderView` / `MacReaderView` conditionally shows one or the other based on reading mode. `BookTextView` handles scroll mode. `PagedBookView` handles paged mode. Both receive the same `NSAttributedString` and highlight range from `ReaderViewModel`.
|
||||
|
||||
**Chapter navigation:**
|
||||
- Each `Chapter` knows its character offset within the full book text.
|
||||
- "Jump to chapter" computes the offset → scroll mode scrolls to it; paged mode finds which page contains that offset and navigates there.
|
||||
- `ReaderViewModel` maintains a chapter offset table mapping each chapter index to its character offset within the full book text.
|
||||
- "Jump to chapter" → scroll mode scrolls to that offset; paged mode finds which page contains that offset and navigates there.
|
||||
|
||||
**Auto-follow during playback:**
|
||||
- Observe `engine.currentPosition` → compute highlighted sentence range → apply highlight attribute → scroll/page to keep the active sentence visible.
|
||||
- Observe `engine.currentPosition` → look up the active sentence range (1:1 offset, no mapping needed) → apply highlight attribute → scroll/page to keep the active sentence visible.
|
||||
|
||||
### 4. ReaderViewModel — Shared Logic
|
||||
|
||||
Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) into a shared `ReaderViewModel` (`@Observable` class).
|
||||
Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) into a shared `ReaderViewModel` (`@Observable` class) in VorleserKit.
|
||||
|
||||
**Owns:**
|
||||
- Book loading and parsing
|
||||
- Synthesizer initialization
|
||||
- Reference to `AudioEngine`
|
||||
- Current chapter index, reading mode (scroll / paged), full attributed string
|
||||
- Chapter offset table for jump navigation
|
||||
- Highlight range computation
|
||||
- Current chapter index, reading mode (scroll / paged)
|
||||
- Chapter offset table: maps chapter index → character offset in the full text
|
||||
- Highlight range computation (trivial since offsets are 1:1)
|
||||
|
||||
**Does NOT own:** The full book `NSAttributedString`. This is constructed by the platform-specific view layer because chapter concatenation with spacing requires `NSParagraphStyle` construction, which can differ subtly between platforms. `ReaderViewModel` provides the `[Chapter]` array and chapter offset table; the view layer concatenates `attributedText` from each chapter.
|
||||
|
||||
**Platform views become thin shells:**
|
||||
- Build the full `NSAttributedString` from `viewModel.book.chapters`
|
||||
- Toolbar: chapter picker, mode toggle (scroll / book)
|
||||
- `BookTextView` (platform-specific wrapper)
|
||||
- Conditionally show `BookTextView` (scroll) or `PagedBookView` (paged)
|
||||
- `PlaybackControls`
|
||||
- Wire to the view model
|
||||
|
||||
### 5. Playback Controls and Mode Switching
|
||||
|
||||
@@ -90,32 +115,49 @@ Extract duplicated logic from `ReaderView` (iOS) and `MacReaderView` (macOS) int
|
||||
**Mode toggle:**
|
||||
- Segmented control or toolbar button: scroll ↔ book mode.
|
||||
- Switching modes preserves reading position (same character offset stays visible).
|
||||
- Mode preference persisted on `StoredBook`.
|
||||
- Mode preference persisted on `StoredBook` (new optional `readingMode: String` property with default `"scroll"` — lightweight SwiftData schema addition, no migration needed).
|
||||
|
||||
**Position persistence:** Existing `lastPosition` (character offset) continues to work — offsets are display-mode-independent.
|
||||
**Position persistence:** Existing `lastPosition` (character offset) continues to work — offsets are identical across plain text, attributed text, and display modes.
|
||||
|
||||
## Files Changed
|
||||
|
||||
**VorleserKit package:**
|
||||
- `AudioEngine.swift` — autoreleasepool, GPU drain
|
||||
- `Book.swift` — cache sentences as stored property
|
||||
- `Chapter.swift` — add `attributedText: NSAttributedString`
|
||||
- `EPUBParser.swift` — build NSAttributedString from HTML DOM
|
||||
- `PlainTextParser.swift` — build NSAttributedString from plain text
|
||||
- New: `ReaderViewModel.swift` — shared reader logic
|
||||
- `Package.swift` — no new external dependencies (MLX cache clear exposed through Synthesizer)
|
||||
- `Synthesizer.swift` — add `clearCache()` method wrapping `MLX.Memory.clearCache()`
|
||||
- `AudioEngine.swift` — autoreleasepool around synthesis calls, call `synthesizer.clearCache()` after each
|
||||
- `Book.swift` — cache sentences as stored property computed at init
|
||||
- `Chapter.swift` — add `attributedText: NSAttributedString`, mark `@unchecked Sendable`
|
||||
- `EPUBParser.swift` — build NSAttributedString from HTML DOM (same `.string` as plain text)
|
||||
- `PlainTextParser.swift` — restructure to single chapter, build NSAttributedString
|
||||
- `BookParser.swift` — pass through attributed text from parsers
|
||||
- `StoredBook.swift` — add optional `readingMode` property
|
||||
- New: `ReaderViewModel.swift` — shared reader logic (no platform-specific font code)
|
||||
|
||||
**iOS app (`Vorleser-iOS/`):**
|
||||
- New: `BookTextView.swift` — UIViewRepresentable wrapping UITextView
|
||||
- New: paged mode view with UIPageViewController + TextKit 2
|
||||
- `ReaderView.swift` — rewrite as thin shell over ReaderViewModel
|
||||
- New: `BookTextView.swift` — `UIViewRepresentable` wrapping `UITextView` (scroll mode)
|
||||
- New: `PagedBookView.swift` — `UIViewControllerRepresentable` wrapping `UIPageViewController` (paged mode)
|
||||
- `ReaderView.swift` — rewrite as thin shell, conditionally shows BookTextView or PagedBookView
|
||||
- Remove: `ReadingTextView.swift`
|
||||
|
||||
**macOS app (`Vorleser-macOS/`):**
|
||||
- New: `BookTextView.swift` — NSViewRepresentable wrapping NSTextView
|
||||
- New: paged mode view with TextKit 2 pagination + swipe/arrows
|
||||
- `MacReaderView.swift` — rewrite as thin shell over ReaderViewModel
|
||||
- New: `BookTextView.swift` — `NSViewRepresentable` wrapping `NSTextView` (scroll mode)
|
||||
- New: `PagedBookView.swift` — `NSViewRepresentable` with pagination (paged mode)
|
||||
- `MacReaderView.swift` — rewrite as thin shell, conditionally shows BookTextView or PagedBookView
|
||||
- Remove: `MacReadingTextView.swift`
|
||||
|
||||
**XcodeGen:**
|
||||
- `project.yml` — no changes needed (new files are in existing source directories, XcodeGen picks them up automatically)
|
||||
|
||||
## Testing
|
||||
|
||||
- **Memory fix:** Profile with Instruments (Allocations) on iOS — verify memory stays flat during extended playback.
|
||||
- **Offset identity:** Unit test that `chapter.attributedText.string == chapter.text` for both EPUB and plain text parsing.
|
||||
- **Chapter offset table:** Unit test that chapter offsets correctly index into the full book text.
|
||||
- **Pagination:** UI test that page count changes on rotation / window resize.
|
||||
- **Mode switching:** Verify reading position is preserved when toggling scroll ↔ paged.
|
||||
- **PlainTextParser:** Test that a plain text file produces a single chapter (or correctly detects chapter markers).
|
||||
- **SwiftData migration:** Test on a device/simulator with existing stored books to verify the new optional `readingMode` property is added without data loss.
|
||||
|
||||
## Not In Scope
|
||||
|
||||
- Voice selection UI
|
||||
|
||||
Reference in New Issue
Block a user