Data Modelling & Media Management Architecture

This page documents the architectural decisions and working principles used for structuring data, relationships, and digital media in this MediaWiki installation.

It reflects conclusions reached during the design phase and serves as a reference for future development and maintenance.

Purpose

The goal of this architecture is to:

Support complex research data with many relationships
Separate internal research work from public presentation
Manage digital media (files, derivatives, metadata) in a controlled and scalable way
Avoid schema drift and ad-hoc solutions
Remain compatible with MediaWiki core and extensions

Core Technologies

The system is based on the following components:

Cargo – authoritative data storage (database schema)
Page Schemas – page structure and data-entry UI
Forms – controlled data input
Namespaces – separation of concerns and access control
File namespace – physical file storage (MediaWiki core)

Each component has a clearly defined responsibility.

Authority and Responsibility

Component	Responsibility
Cargo tables	Define and own the database schema
Page Schemas	Define page types and map fields to Cargo
Forms	Control how editors enter data
Templates	Store data via Cargo
Database (MySQL)	Implementation detail only

Important: Database tables must not be created or modified directly in MySQL. The Cargo table pages are the single source of truth for schema definition.

Workflow Principle

The general workflow is:

Design the model (DBML / diagrams)
Create or update Cargo tables
Activate Cargo tables by saving the Cargo page
Create Page Schemas using the UI
Generate forms
Enter and test real data
Iterate carefully, following the field-change rules listed under Design Principles

Namespaces and Their Roles

Namespaces are used to separate concerns and control access.

Namespace	Purpose	Visibility
(Main)	Public research results and narratives	Public
HO:	Structured Heritage Objects	Club members
DA:	Digital Asset metadata and relationships	Club members
File:	Physical file storage	Public (uploads restricted to club members)
ICT:	Technical and architectural documentation	Club members

Namespaces must not be redefined. The built-in File: namespace remains unchanged.
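Scripts or bots that need to know who can read a given page can express the table as a simple lookup. The sketch below is illustrative only. MediaWiki itself resolves namespaces (including aliases and case rules), and the code assumes that titles arrive in canonical form, such as `HO:Spinning wheel`. It treats File: pages as publicly readable, following the Upload Permissions section. The function names are hypothetical.

```python
# Read visibility per namespace, transcribed from the namespace table.
# "" stands for (Main), which has no prefix. File: is readable by everyone
# (see Upload Permissions); only uploading is restricted.
READ_VISIBILITY = {
    "": "Public",
    "HO": "Club members",
    "DA": "Club members",
    "File": "Public",
    "ICT": "Club members",
}


def namespace_of(title: str) -> str:
    """Return the namespace prefix of a canonical page title, or "" for (Main)."""
    prefix, sep, _rest = title.partition(":")
    # A colon only marks a namespace if the prefix is one we know;
    # otherwise it is just part of an ordinary (Main) title.
    if sep and prefix in READ_VISIBILITY:
        return prefix
    return ""


def visibility_of(title: str) -> str:
    """Who can read the page, according to its namespace."""
    return READ_VISIBILITY[namespace_of(title)]
```

For example, `visibility_of("HO:Spinning wheel")` gives `"Club members"`. A (Main) title that merely contains a colon, such as `"Note: on mills"`, is still treated as public.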

File Management Strategy

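MediaWiki’s file model is page-centric: each uploaded file has exactly one File: page. This is poorly suited to workflows that involve derivatives, multiple versions, and rich metadata. To address this, a clear separation is enforced: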

File: pages store physical files only
DA: pages store semantic metadata about digital assets

Digital Assets (DA) act as the central abstraction layer.

Internal vs Public Files

Files fall into two categories:

Internal files:
- High-resolution masters
- OCR outputs
- Working derivatives
Public files:
- Curated, approved derivatives
- Downscaled or watermarked versions

Internal and public files are managed by convention, not by redefining namespaces.
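This page does not spell out the convention itself. As one possible illustration, the sketch below assumes a hypothetical rule: approved public derivatives carry a `-public` suffix before the file extension. The suffix, the `classify` function, and the `FileClass` enum are illustrative names and are not part of any existing setup. Anything not explicitly marked public is treated as internal, so a mistake errs on the side of not publishing.

```python
from enum import Enum


class FileClass(Enum):
    INTERNAL = "internal"
    PUBLIC = "public"


# HYPOTHETICAL naming convention, for illustration only:
# approved public derivatives end in "-public" before the extension,
# e.g. "File:Mill_1905-public.jpg". Masters, OCR output and working
# derivatives carry no marker.
PUBLIC_SUFFIX = "-public"


def classify(file_title: str) -> FileClass:
    """Classify a file by name; anything not explicitly marked public is internal."""
    name = file_title.removeprefix("File:")  # requires Python 3.9+
    stem, dot, _ext = name.rpartition(".")
    if not dot:  # no extension: the whole name is the stem
        stem = name
    return FileClass.PUBLIC if stem.endswith(PUBLIC_SUFFIX) else FileClass.INTERNAL
```

Defaulting to `INTERNAL` matches the rule that public pages must never depend on internal-only files. An unmarked or misnamed file is held back rather than exposed.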

Upload Permissions

Uploading files is restricted to club members
Viewing files remains public (to support public pages)
Editors must ensure that only approved files are embedded on public pages

Public pages must never depend on internal-only files.
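Editors enforce this rule rather than permissions, so a periodic check of public pages can help. The following is a minimal sketch. It assumes you already have a page's wikitext and a set of approved file names, normalised with underscores. It finds `[[File:...]]` embeds and reports any that are not approved. The function names are hypothetical, and the regular expression is deliberately simple: it ignores the `Image:` alias, `<gallery>` tags, and files embedded through templates.

```python
import re

# Matches embeds such as [[File:Mill 1905.jpg]] or [[File:Mill 1905.jpg|thumb|Caption]].
# Deliberately simple: [[:File:...]] (a plain link, not an embed) is not matched,
# and the Image: alias, <gallery> tags and template-based embeds are ignored.
FILE_EMBED = re.compile(r"\[\[\s*File\s*:\s*([^|\]]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE)


def embedded_files(wikitext: str) -> list[str]:
    """File names embedded in the wikitext, with spaces normalised to underscores."""
    return [m.group(1).replace(" ", "_") for m in FILE_EMBED.finditer(wikitext)]


def unapproved_embeds(wikitext: str, approved: set[str]) -> list[str]:
    """Embedded files that are not in the approved (public) set."""
    return [name for name in embedded_files(wikitext) if name not in approved]
```

An empty result means every directly embedded file on the page is in the approved set. Any name returned should be replaced by an approved derivative before the page is published.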

Digital Assets (DA)

A Digital Asset represents a conceptual media object and may reference:

One or more File: pages
A parent Digital Asset (for derivatives)
One or more Heritage Objects

DA pages are internal and never directly exposed to the public.
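The references a Digital Asset holds can be pictured as a small data structure. The sketch below is a hypothetical in-memory representation, not the Cargo schema itself, and its field names are illustrative. The `lineage` method follows `parent` links from a derivative back to its master. This is the kind of traversal that the parent reference makes possible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalAsset:
    """Hypothetical in-memory view of one DA: page and its references."""
    title: str                                                 # the DA: page
    files: list[str] = field(default_factory=list)             # one or more File: pages
    parent: Optional[DigitalAsset] = None                      # set only for derivatives
    heritage_objects: list[str] = field(default_factory=list)  # one or more HO: pages

    def lineage(self) -> list[str]:
        """Titles from this asset back to its root master, rejecting cycles."""
        chain: list[str] = []
        seen: set[str] = set()
        node: Optional[DigitalAsset] = self
        while node is not None:
            if node.title in seen:
                raise ValueError(f"cycle in derivative chain at {node.title}")
            seen.add(node.title)
            chain.append(node.title)
            node = node.parent
        return chain
```

For example, suppose a web derivative has the master scan as its `parent`. Its `lineage()` returns the derivative's title followed by the master's. The cycle check guards against a data-entry error in which two assets name each other as parent.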

Handling Many-to-Many Relationships

Relational join tables from the original data model are implemented as:

Cargo subtables
Repeatable sections in Page Schemas

Example use cases:

Object–Person relationships
Object–Set memberships
Provenance chains

This avoids creating a separate wiki page for every relationship (page explosion). It also keeps each relationship stored with the page it describes.
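The sketch below shows how one repeatable section becomes one row in a separate table. It turns the Object–Person sections of an HO: page into rows keyed by the page name, much as Cargo adds a `_pageName` field to every stored row. The class, the `role` field, and the example data are hypothetical. Querying by person then gives the other direction of the many-to-many relationship, without a dedicated page per link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectPersonRow:
    """One row of a hypothetical Object-Person table (one per repeatable section)."""
    page_name: str  # the HO: page the section lives on; plays the role of Cargo's _pageName
    person: str
    role: str       # hypothetical field, e.g. "maker" or "owner"


def rows_from_sections(page_name: str, sections: list[dict[str, str]]) -> list[ObjectPersonRow]:
    """Turn the repeatable Object-Person sections of one HO: page into rows."""
    return [ObjectPersonRow(page_name, s["person"], s["role"]) for s in sections]


def objects_for_person(rows: list[ObjectPersonRow], person: str) -> list[str]:
    """The reverse direction of the relationship: all objects linked to a person."""
    return sorted({row.page_name for row in rows if row.person == person})
```

Each HO: page contributes only its own rows. The join therefore stays contextual, yet it can still be queried across all objects.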

Page Schemas vs Cargo Tables

Cargo tables may be created in two ways:

Manually (Cargo-first)
Via Page Schemas UI (UI-first)

In both cases:

The Cargo table page must be saved manually to activate the database table
Page Schemas do not execute database changes automatically

For complex, stable models, a Cargo-first approach is preferred.
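Saving the page is a separate, manual step, so a declaration and the database can easily drift apart. The following is a minimal sketch of a consistency check. It assumes you have two sets of names: the tables declared on the Cargo table pages, and the tables that actually exist (for example, as listed on Special:CargoTables). How those sets are collected is left open, and the function names are hypothetical.

```python
def tables_needing_activation(declared: set[str], active: set[str]) -> list[str]:
    """Declared tables that have no database table yet (page not yet saved/activated)."""
    return sorted(declared - active)


def orphaned_tables(declared: set[str], active: set[str]) -> list[str]:
    """Database tables with no remaining declaration: report for review, never DROP by hand."""
    return sorted(active - declared)
```

The check only reports differences and changes nothing, in line with the rule against direct database manipulation. Activation still happens by saving the Cargo table page.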

Design Principles

The following principles guide all future development:

Cargo owns structure, Page Schemas own usability
Internal complexity is allowed; public simplicity is required
Publication is a controlled act, not a permission toggle
Add fields freely; rename or remove fields carefully
Prefer conventions over hacks
Avoid direct database manipulation
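The principle “add fields freely; rename or remove fields carefully” can be checked mechanically before a schema change goes to review. The following is a minimal sketch. It assumes each table's fields are available as a mapping from field name to Cargo type (for example `String`, `Date`, `Text`). The function and class names are hypothetical. A rename shows up as one removal plus one addition, so it is flagged like any removal.

```python
from dataclasses import dataclass


@dataclass
class SchemaChange:
    added: list[str]
    removed: list[str]
    retyped: list[str]

    @property
    def needs_review(self) -> bool:
        # Additions are routine; removals (including renames) and type changes are not.
        return bool(self.removed or self.retyped)


def diff_fields(old: dict[str, str], new: dict[str, str]) -> SchemaChange:
    """Compare two field -> type mappings of one Cargo table."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    retyped = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return SchemaChange(added, removed, retyped)
```

A change with `needs_review` set to true is one that should be documented and reviewed before it is implemented.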

Status

This architecture is considered the current baseline.

Changes must be documented and reviewed before implementation.