ffmpeg now writes -metadata:s:a:i language=<iso3> on every kept audio track so
files end up with canonical 3-letter tags (en → eng, ger → deu, null → und).
analyzer passes stream.profile (not title) to transcodeTarget so lossless
dts-hd ma in mkv correctly targets flac. is_noop also checks og-is-default and
canonical-language so pipeline-would-change-it cases stop showing as done.
normalizeLanguage gains 2→3 mapping, and mapStream no longer normalizes at
ingest so the raw jellyfin tag survives for the canonical check.
per-item scan work runs in a single db.transaction for large sqlite speedups,
extracted into server/services/rescan.ts so execute.ts can reuse it.
on successful job, execute calls jellyfin /Items/{id}/Refresh, waits for
DateLastRefreshed to change, refetches the item, and upserts it through the
same pipeline; plan flips to done iff the fresh streams satisfy is_noop.
schema wiped + rewritten to carry jellyfin_raw, external_raw, profile,
bit_depth, date_last_refreshed, runtime_ticks, original_title, last_executed_at
— so future scans aren't required to stay correct. user must drop data/*.db.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two regressions from the radarr/sonarr fix:
1. ERR_INVALID_URL spam — when radarr_enabled='1' but radarr_url is empty
or not http(s), every per-item fetch threw TypeError. We caught it but
still ate the cost (and the log noise) on every movie. New isUsable()
check on each service: enabled-in-config but URL doesn't parse →
warn ONCE and skip arr lookups for the whole scan.
2. Per-item HTTP storm — for movies not in Radarr's library we used to
hit /api/v3/movie (the WHOLE library) again per item, then two
metadata-lookup calls. With 2000 items that's thousands of extra
round-trips and the scan crawled. Now: pre-load the Radarr/Sonarr
library once into Map<tmdbId,..>+Map<imdbId,..>+Map<tvdbId,..>,
per-item lookups are O(1) memory hits, and only the genuinely-missing
items make a single lookup-endpoint HTTP call.
The startup line now reports the library size:
External language sources: radarr=enabled (https://..., 1287 movies in library), sonarr=...
so you can immediately see whether the cache loaded.
The real reason 8 Mile landed as Turkish: Radarr WAS being called, but the
call path had three silent failure modes that all looked identical from
outside.
1. try { … } catch { return null } swallowed every error. No log when
Radarr was unreachable, when the API key was wrong, when HTTP returned
404/500, or when JSON parsing failed. A miss and a crash looked the
same: null, fall back to Jellyfin's dub guess.
2. /api/v3/movie?tmdbId=X only queries Radarr's LIBRARY. If the movie is
on disk + in Jellyfin but not actively managed in Radarr, returns [].
We then gave up and used the Jellyfin guess.
3. iso6391To6392 fell back to normalizeLanguage(name.slice(0, 3)) for any
unknown language name — pretending 'Mandarin' → 'man' and 'Flemish' →
'fle' are valid ISO 639-2 codes.
Fixes:
- Both services: fetchJson helper logs HTTP errors with context and the
url (api key redacted), plus catches+logs thrown errors.
- Added a metadata-lookup fallback: /api/v3/movie/lookup/tmdb and
/lookup/imdb for Radarr, /api/v3/series/lookup?term=tvdb:X for Sonarr.
These hit TMDB/TVDB via the arr service for titles not in its library.
- Expanded NAME_TO_639_2: Mandarin/Cantonese → zho, Flemish → nld,
Farsi → fas, plus common European langs that were missing.
- Unknown name → return null (log a warning) instead of a made-up 3-char
code. scan.ts then marks needs_review.
- scan.ts: per-item warn when Radarr/Sonarr miss; per-scan summary line
showing hits/misses/no-provider-id tallies.
Run a scan — the logs will now tell you whether Radarr was called, what
it answered, and why it fell back if it did.
Two bugs compounded:
1. extractOriginalLanguage() in jellyfin.ts picked the FIRST audio stream's
language and called it 'original'. Files sourced from non-English regions
often have a local dub as track 0, so 8 Mile with a Turkish dub first
got labelled Turkish.
2. scan.ts promoted any single-source answer to confidence='high' — even
the pure Jellyfin guess, as long as no second source (Radarr/Sonarr)
contradicted it. Jellyfin's dub-magnet guess should never be green.
Fixes:
- extractOriginalLanguage now prefers the IsDefault audio track and skips
tracks whose title shouts 'dub' / 'commentary' / 'director'. Still a
heuristic, but much less wrong. Fallback to the first track when every
candidate looks like a dub so we have *something* to flag.
- scan.ts: high confidence requires an authoritative source (Radarr/Sonarr)
with no conflict. A Jellyfin-only answer is always low confidence AND
gets needs_review=1 so it surfaces in the pipeline for manual override.
- Data migration (idempotent): downgrade existing plans backed only by the
Jellyfin heuristic to low confidence and mark needs_review=1, so users
don't have to rescan to benefit.
- New server/services/__tests__/jellyfin.test.ts covers the default-track
preference and dub-skip behavior.
Bug: every approve path (buildCommand used by review approve/approve-all/
series approve-all/season approve-all/retry/detail preview) was building
an ffmpeg command that -map'd only the 'keep' streams and dropped all
subtitles. For a file like Wuthering Heights with 37 embedded subs, the
run would delete every sub into the void — user expected extraction to
sidecar files per the pipeline contract.
buildPipelineCommand already did the right thing (extract every subtitle
with -map 0:s:N -c:s copy 'basename.lang.srt', then remux kept streams)
but it was only reached by tests. buildCommand now delegates to it — one
call site, subtitle extraction always runs, predictExtractedFiles records
the sidecar paths after job success (same logic, same basePath).
Added a regression test: buildCommand on a 2-subtitle file contains
-map 0:s:0, -map 0:s:1 and the expected 'basename.en.srt'/'.de.srt' paths.
Subtitle extraction lives only in the pipeline now; a file is 'done' when it
matches the desired end state — no embedded subs AND audio matches the
language config. The separate Extract page was redundant.
- delete src/routes/review/subtitles/extract.tsx + SubtitleExtractPage
- delete /api/subtitles/extract-all + /:id/extract endpoints
- delete buildExtractOnlyCommand + unused buildExtractionOutputs from ffmpeg.ts
- detail page: drop Extract button + extractCommand textarea, replace with
'will be extracted via pipeline' note when embedded subs present
- pipeline endpoint: doneCount = is_noop OR status='done' (a file in the
desired state, however it got there); UI label 'N files in desired state'
- nav: drop the now-defunct 'Extract subs' link, default activeOptions.exact
to false so detail subpages (e.g. /review/audio/123) highlight their
parent ('Audio') in the menu — was the cause of the broken-feeling menu
- execute: actually call isInScheduleWindow/waitForWindow/sleepBetweenJobs in runSequential (they were dead code); emit queue_status SSE events (running/paused/sleeping/idle) so the pipeline's existing QueueStatus listener lights up
- review: POST /:id/retry resets an errored plan to approved, wipes old done/error jobs, rebuilds command from current decisions, queues fresh job
- scan: dev-mode DELETE now also wipes jobs + subtitle_files (previously orphaned after every dev reset)
- biome: migrate config to 2.4 schema, autoformat 68 files (strings + indentation), relax opinionated a11y/hooks-deps/index-key rules that don't fit this codebase
- routeTree.gen.ts regenerated after /nodes removal
- analyzer: rewrite checkAudioOrderChanged to compare actual output order, unify assignTargetOrder with a shared sortKeptStreams util in ffmpeg builder
- review: recompute is_noop via full audio removed/reordered/transcode/subs check on toggle, preserve custom_title across rescan by matching (type,lang,stream_index,title), batch pipeline transcode-reasons query to avoid N+1
- validate: add lib/validate.ts with parseId + isOneOf helpers; replace bare Number(c.req.param('id')) with 400 on invalid ids across review/subtitles
- scan: atomic CAS on scan_running config to prevent concurrent scans
- subtitles: path-traversal guard — only unlink sidecars within the media item's directory; log-and-orphan DB entries pointing outside
- schedule: include end minute in window (<= vs <)
- db: add indexes on review_plans(status,is_noop), stream_decisions(plan_id), media_items(series_jellyfin_id,series_name,type), media_streams(item_id,type), subtitle_files(item_id), jobs(status,item_id)
- remove nodes table, ssh service, nodes api, NodesPage route
- execute.ts: local-only spawn, atomic CAS job claim via UPDATE status
- wrap job done + subtitle_files insert + review_plans status in db transaction
- stream ffmpeg output per line with 500ms throttled flush
- bump version to 2026.04.13
- server-side filter + LIMIT 200 + totalCounts on GET /api/execute
- shared FilterTabs component with status-colored active tabs
- execute page: filter tabs, SSE live count updates, module-level cache
- replace inline tab pills in AudioListPage, SubtitleListPage with FilterTabs
- fix buildExtractOnlyCommand: skip -map 0:a when no audio streams exist
- bump version
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rewrite from monolithic hono jsx to react 19 spa with tanstack router
+ hono json api backend. add scan, review, execute, nodes, and setup
pages. multi-stage dockerfile (node for vite build, bun for runtime).
previously, server/ and src/shared/lib/ were silently excluded by
global gitignore patterns (/server/ from emacs, lib/ from python).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>