# `PhoenixKit.Migrations.Postgres.V111`
[🔗](https://github.com/BeamLabEU/phoenix_kit/blob/v1.7.164/lib/phoenix_kit/migrations/postgres/v111.ex#L1)

V111: PDF library tables for the catalogue module.

Backs the "PDFs" subtab in `phoenix_kit_catalogue`, layered on top
of core's `phoenix_kit_files` for binary storage / dedup / soft-delete
/ multi-bucket redundancy. Catalogue owns only the per-page text
index and the user-facing per-upload row.

## Tables

- `phoenix_kit_cat_pdfs` — thin per-upload row. One row per
  "user uploaded this name". `file_uuid` FK → `phoenix_kit_files.uuid`
  `ON DELETE RESTRICT` (catalogue manages the lifecycle; core
  prune can't remove a file referenced by a live catalogue row).
  Two uploads of identical content (different filenames) → two
  `phoenix_kit_cat_pdfs` rows, one shared `phoenix_kit_files` row,
  one shared extraction.
  Soft-delete uses the `status` sentinel `"active"` / `"trashed"`
  (workspace convention) plus a `trashed_at` timestamp, so the UI can
  show how long a row has been in the trash.

- `phoenix_kit_cat_pdf_extractions` — keyed by `file_uuid` PK
  (one row per unique PDF content). Holds the worker's state
  machine (`pending → extracting → extracted | scanned_no_text |
  failed`), `page_count`, `extracted_at`, `error_message`.
  Cascades on the file row's hard delete.

- `phoenix_kit_cat_pdf_page_contents` — content-addressed dedup
  cache. Keyed by `content_hash` (SHA-256 hex of the page's
  normalized text). Same page text across multiple PDFs (boilerplate,
  legal disclaimers, cross-referenced product entries) is stored
  once. The GIN trigram index on `text` lives here, so the search
  index doesn't grow with duplication.

- `phoenix_kit_cat_pdf_pages` — per-page join. Composite PK
  `(file_uuid, page_number)`. References both the file (cascade on
  file delete) and the page-content cache (restrict; orphaned content
  rows are GC'd by a catalogue-side helper, not by FK cascade, so
  the cache doesn't churn during normal upload/delete cycles).
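
The content-addressed cache and the per-page join can be illustrated with a small in-memory sketch. Everything here is hypothetical: the module name, the function names, and especially the normalization step (the documentation only says the text is "normalized", so trimming and collapsing whitespace is an assumption). Two plain maps stand in for `phoenix_kit_cat_pdf_page_contents` and `phoenix_kit_cat_pdf_pages`.

```elixir
defmodule PageContentSketch do
  @moduledoc false

  # Assumed normalization: trim the ends and collapse whitespace runs.
  # The real rules used by the catalogue module are not documented here.
  def normalize(text) when is_binary(text) do
    text
    |> String.trim()
    |> String.replace(~r/\s+/u, " ")
  end

  # SHA-256 of the normalized text, rendered as lowercase hex (64 chars).
  def content_hash(text) do
    digest = :crypto.hash(:sha256, normalize(text))
    Base.encode16(digest, case: :lower)
  end

  # `cache` stands in for phoenix_kit_cat_pdf_page_contents (hash => text).
  # `pages` stands in for phoenix_kit_cat_pdf_pages
  # ({file_uuid, page_number} => hash).
  def put_page({cache, pages}, file_uuid, page_number, text) do
    hash = content_hash(text)
    # put_new keeps the existing entry, so identical text is stored once.
    cache = Map.put_new(cache, hash, normalize(text))
    pages = Map.put(pages, {file_uuid, page_number}, hash)
    {cache, pages}
  end
end

# The same boilerplate page appearing in two different files:
state = {%{}, %{}}
state = PageContentSketch.put_page(state, "file-a", 1, "Terms  and conditions\n")
state = PageContentSketch.put_page(state, "file-b", 7, "Terms and conditions")
{cache, pages} = state

IO.inspect(map_size(cache)) # prints 1: one shared content row
IO.inspect(map_size(pages)) # prints 2: one join row per (file, page)
```

Because only the join grows with each upload, an index on the cached text sees each distinct page once, however many PDFs repeat it.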

The migration also enables the `pg_trgm` extension, which the trigram index requires.
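
As a rough illustration of what enabling the extension and indexing `text` could look like, here is a minimal Ecto migration sketch. It is not the V111 source: the module name and index name are hypothetical, schema prefix handling is left out, and it assumes `ecto_sql` is available and that `phoenix_kit_cat_pdf_page_contents` already exists.

```elixir
defmodule MyApp.Repo.Migrations.TrigramIndexSketch do
  use Ecto.Migration

  # Hypothetical index name; the name V111 actually uses is not documented here.
  @index "phoenix_kit_cat_pdf_page_contents_text_trgm_idx"

  def up do
    # The extension must exist before an index can use gin_trgm_ops.
    execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")

    execute("""
    CREATE INDEX IF NOT EXISTS #{@index}
    ON phoenix_kit_cat_pdf_page_contents
    USING gin ("text" gin_trgm_ops)
    """)
  end

  def down do
    execute("DROP INDEX IF EXISTS #{@index}")
    # The extension is deliberately left installed in this sketch,
    # since other tables in the database may depend on it.
  end
end
```

With an index of this kind in place, PostgreSQL can use it for `LIKE` / `ILIKE` substring matches and for `pg_trgm` similarity operators on the cached page text.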

# `down`

# `up`

---

*Consult [api-reference.md](api-reference.md) for complete listing*
