
ICT:FinalConfig - Asset v2.1 - private extension

From Costa Sano MediaWiki

Final Configuration for the Asset Entity – v2.1 - server-side handling

(Automatic Identification, Provenance Capture, File Normalization)

Document revision: 2026-02-22 by Mngr.

This document defines the intermediate configuration for the Asset entity (version 2.1). It supersedes all previous drafts and experimental implementations.


0. Runtime Baseline (LOCKED)

The following environment is assumed and MUST NOT change during Asset v2.x implementation:

  • OS: AlmaLinux 10.1
  • PHP: 8.3
  • MediaWiki: 1.45.x running on AlmaLinux 10.1 Virtual Machine 1.
  • DocumentRoot: /var/www/mediawiki
  • MariaDB and Cargo running on AlmaLinux 10.1 Virtual Machine 2
  • The virtual machines run on Windows 11 Hyper-V with 8 virtual cores and 16 GB of RAM; they are to be moved to a Windows Server 2019 Hyper-V environment.
  • Page Forms: 6.x
  • Scribunto (Lua 5.1)
  • Reverse Proxy using IIS on Windows Server 2019
  • Redirects defined in Vhost
  • Site URL https://mwiki.costasano.club
  • VS Code is used to access the wiki files on Virtual Machine 1 and to develop the database model locally on a PC.
  • Enhanced security defined in LocalSystems.php as the wiki is used in a private setting.

Remark: Lua 5.4 is explicitly excluded because it is incompatible with the setup above (Scribunto is built for Lua 5.1).


1. Asset Concept (Normative)

An Asset represents the historical metadata of exactly one digital file.

Rules:

  • One Asset = one File
  • The Asset identifier and the File name are identical (except for the file extension)
  • The Asset/File pair is immutable after creation.
  • Only specific informative fields are editable after first creation.
  • Assets may reference parent Assets (derivatives, OCR, AI output, annexes)
  • Deletion of a pair is only permitted by sysop.

The Asset identifier is the archival reference and MUST NEVER change once created.


2. Identifier Scheme (Normative)

Asset identifiers follow this pattern:

<ChapterCode>-<ContextCode>-<SequenceNumber>

Where:

  • ChapterCode comes from Chapter.Code
  • ContextCode comes from either Place.Code OR Organisation.Code
  • SequenceNumber is a zero-padded 4-digit integer (e.g. 0007)

Example:

CH03-BER-0007

Rules:

  • Sequence numbers are computed automatically
  • Gaps are allowed in the numbering
  • Numbers are never reused
  • Contiguity is NOT guaranteed
  • Counter fields may be added to Chapter, Place, Organisation or Asset, wherever useful
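The identifier rules above can be sketched in PHP as a formatter and a validator. This is a minimal illustration, not the extension's code: the function names `formatAssetId` and `isValidAssetId` are hypothetical, and the assumption that chapter and context codes consist only of letters and digits is ours, not part of this specification.

```php
<?php

/**
 * Builds <ChapterCode>-<ContextCode>-<SequenceNumber> with a zero-padded
 * 4-digit sequence. Hypothetical helper, for illustration only.
 */
function formatAssetId( string $chapterCode, string $contextCode, int $sequence ): string {
    if ( $sequence < 1 || $sequence > 9999 ) {
        throw new InvalidArgumentException( "Sequence $sequence does not fit in 4 digits" );
    }
    return sprintf( '%s-%s-%04d', $chapterCode, $contextCode, $sequence );
}

/** Assumption: codes contain only letters and digits (not specified in this document). */
function isValidAssetId( string $id ): bool {
    return preg_match( '/^[A-Za-z0-9]+-[A-Za-z0-9]+-\d{4}$/', $id ) === 1;
}

echo formatAssetId( 'CH03', 'BER', 7 ), "\n";  // CH03-BER-0007
var_dump( isValidAssetId( 'CH03-BER-0007' ) ); // bool(true)
var_dump( isValidAssetId( 'CH03-BER-7' ) );    // bool(false)
```

Note that the hook in section 7 derives the next number from MAX(Sequence) over the existing rows. If the Asset holding the highest number were deleted, that number would be issued again, which contradicts "Numbers are never reused"; a persistent counter field, as allowed by the last rule, would avoid this.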

3. Draft Asset Strategy (Mandatory)

Page Forms does not allow a page to be created through a form without giving a page name. However, in our workflow the page name is derived from user input in the form. The solution is to start with a draft Asset named after a timestamp, because tests showed that a fixed draft name is not feasible in our setup due to MediaWiki behavior.

3.1 Draft Page Workflow

A draft page exists from the moment it is created; it must be filled in and then migrated (moved) to its final name, and it should not survive that move.

  • All new Assets are created with a timestamp as initial name
  • This initial draft page itself is never queried or displayed in dashboards.
  • The page is "moved" to its final identifier when the user decides to save
  • The file uploaded during the draft phase is also "moved" so that it uses exactly the same automatic code as the Asset it is linked to. The original name is stored in the OriginalFilename field for later reference.
  • Saving the page should not be allowed if the File is missing or the code elements (Chapter and Place or Organisation) have not been chosen.
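The last rule can be expressed as a small server-side check that returns the reasons why a draft may not be saved yet. This is a sketch under stated assumptions: the field values arrive as a name => value array (absent fields are treated as empty), exactly one of Place or Organisation must be chosen as the context code, and the function name `draftSaveErrors` is hypothetical.

```php
<?php

/**
 * Returns the reasons why a draft Asset may not be saved yet.
 * An empty list means the draft is complete. Hypothetical helper.
 */
function draftSaveErrors( array $fields ): array {
    // Absent fields (omitted by Page Forms when empty) count as empty strings
    $get = static fn ( string $name ): string => trim( (string)( $fields[$name] ?? '' ) );
    $errors = [];
    if ( $get( 'File' ) === '' ) {
        $errors[] = 'No file uploaded';
    }
    if ( $get( 'Chapter' ) === '' ) {
        $errors[] = 'Chapter not chosen';
    }
    $hasPlace = $get( 'Place' ) !== '';
    $hasOrg = $get( 'Organisation' ) !== '';
    if ( $hasPlace === $hasOrg ) {
        // Assumption: exactly one context code, Place OR Organisation
        $errors[] = 'Choose either Place or Organisation';
    }
    return $errors;
}

print_r( draftSaveErrors( [ 'Chapter' => 'Chapter:CH03', 'Place' => 'Place:BER', 'File' => 'scan_042.jpg' ] ) );
print_r( draftSaveErrors( [ 'Chapter' => 'Chapter:CH03' ] ) );
```

In the extension such a check would run before the identifier is generated; how a refusal is reported back to the user depends on the hook used and is not covered by this sketch.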

3.2 Implementation strategy

It has been decided to implement the above requirements server-side, in a private extension. Lua and JavaScript are not forbidden if they help the design or are needed to overcome MediaWiki 1.45 limits. AlmaLinux 10.1, PHP 8.3 and MediaWiki 1.45 are all very recent and their documentation is limited, which makes it difficult to design a working setup. A lot of trial and error, including code inspection, is needed to find out how the hooks behave. The design should also aim for a robust solution that lasts at least a decade and survives software updates of all kinds.


4. Cargo Table and Template

4.1 Template:Asset

<noinclude>
Asset data template (v2)

{{#cargo_declare:
 |_table=Assets
 |Code=Page
 |Label=String
 |Chapter=Page
 |Place=Page
 |Organisation=Page
 |Sequence=Integer
 |File=String
 |OriginalFilename=String
 |Parent=Page
 |AssetType=Page
 |Description=Text
 |Notes=Text
}}
</noinclude>

<includeonly>{{#cargo_store:
 |Code={{FULLPAGENAME}}
 |Label={{{Label|}}}
 |Chapter={{{Chapter|}}}
 |Place={{{Place|}}}
 |Organisation={{{Organisation|}}}
 |Sequence={{{Sequence|}}}
 |File={{{File|}}}
 |OriginalFilename={{{OriginalFilename|}}}
 |Parent={{{Parent|}}}
 |AssetType={{{AssetType|}}}
 |Description={{{Description|}}}
 |Notes={{{Notes|}}}
}}</includeonly>

== {{{Label|}}} ==
{{DISPLAYTITLE:{{{Label|}}}}}

{{#if:{{{File|}}}|'''File:''' [[:File:{{{File}}}]]}}

{{#if:{{{OriginalFilename|}}}|'''Original filename:''' {{{OriginalFilename}}}}}

{{#if:{{{Parent|}}}|'''Parent asset:''' [[{{{Parent}}}]]}}

'''Description:'''
{{{Description|}}}

'''Notes:'''
{{{Notes|}}}

4.2 Cargo Setup

After saving the template:

  • Go to Special:CargoTables
  • Manually create the table Assets

5. Page Form

5.1 Form:Asset

<noinclude>
Form for creating and editing Asset pages (v2)
</noinclude>

{{{info
|page name=Asset:<Asset[Label]>
|no summary
|no preview
|no minor edit
|no watch
|no footer
}}}

{{{for template|Asset}}}

{| class="formtable"

! Chapter (*)
| {{{field|Chapter
 |input type=combobox
 |values from namespace=Chapter
 |existing values only
}}}
|-

! Place
| {{{field|Place
 |input type=combobox
 |values from namespace=Place
 |existing values only
 |placeholder=Use Organisation instead
}}}
|-

! Organisation
| {{{field|Organisation
 |input type=combobox
 |values from namespace=Organisation
 |existing values only
 |placeholder=Use Place instead
}}}
|-

! Identifier
| {{{field|Label|readonly}}}
|-

! Sequence
| {{{field|Sequence|readonly}}}
|-

! File
| {{{field|File
 |input type=page
 |namespace=File
 |uploadable=yes
}}}
|-

! Original filename
| {{{field|OriginalFilename|readonly}}}
|-

! Asset type
| {{{field|AssetType
 |input type=combobox
 |values from namespace=AssetType
 |existing values only
}}}
|-

! Parent asset
| {{{field|Parent
 |input type=combobox
 |values from namespace=Asset
 |existing values only
 |placeholder=Top level
}}}
|-

! Description
| {{{field|Description|input type=textarea}}}
|-

! Notes
| {{{field|Notes|input type=textarea}}}

|}

{{{standard input|save}}}

{{{end template}}}

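The "move" of the Asset page itself looks more difficult to realize than the move of the File. Therefore the parameter

 |page name=Asset:<Asset[Label]>

has been introduced in the "info" section of Form:Asset, but so far without much effect. A likely reason is that Page Forms applies a "page name" formula only when the form is opened without a target page, whereas the dashboard's formlink already supplies a timestamp as target.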
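The page name formula replaces each `<Template[Field]>` tag with the value submitted for that field. Mimicking that substitution shows that, at submission time, a draft could only resolve to `Asset:GENERATING...`, because the real identifier is produced later by the server-side hook. The sketch imitates the idea only and is not Page Forms code; `resolvePageNameFormula` is a hypothetical name, and the nested array mirrors how PHP parses request fields named `Asset[Label]`.

```php
<?php

/** Replaces each <Template[Field]> tag with the submitted value (illustration only). */
function resolvePageNameFormula( string $formula, array $values ): string {
    return preg_replace_callback(
        '/<([^\[\]<>]+)\[([^\[\]<>]+)\]>/',
        // Missing values resolve to an empty string
        static fn ( array $m ): string => (string)( $values[$m[1]][$m[2]] ?? '' ),
        $formula
    );
}

$formula = 'Asset:<Asset[Label]>';
echo resolvePageNameFormula( $formula, [ 'Asset' => [ 'Label' => 'GENERATING...' ] ] ), "\n"; // Asset:GENERATING...
echo resolvePageNameFormula( $formula, [ 'Asset' => [ 'Label' => 'CH03-BER-0007' ] ] ), "\n"; // Asset:CH03-BER-0007
```

The final name only becomes meaningful once the Label has been generated, so any formula-based naming has to happen after the identifier exists.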


6. Dashboard

6.1 Dashboard:Asset

= 🗂️ Asset Dashboard =

{| class="wikitable sortable"
! Code !! Chapter !! Place !! Organisation !! File !! Type
{{#cargo_query:
 tables=Assets
 |fields=_pageName,_pageTitle,Chapter,Place,Organisation,File,AssetType
 |where=_pageName!='Asset:DRAFT'
 |order by=_pageTitle
 |format=template
 |template=AssetRow
 |named args=yes
 |cache=no
}}
|}
{{#formlink:
 form=Asset

 |target=Asset:{{#time: YmdHis }}
 |link text=➕ New Asset
 |link type=button

 |query string=Asset[Label]=GENERATING...&Asset[File]=
 |returnto=Dashboard:Asset
}}

Adding a new Asset now uses "formlink" instead of "forminput". A timestamp is used as the page name, and the placeholder Label "GENERATING..." marks the page as a draft. This strategy is important because we cannot ask the user to enter a "fake" name. A timestamp was chosen because MediaWiki "remembers" previously used page names somewhat too well: a welcome feature in a document environment, but not in a heritage data collection and collaboration system. With a timestamp name, each draft page is different. Once the system is fully functional, it should be checked whether these draft names need to be cleaned up to avoid clutter in the system.
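The naming scheme and a way to spot leftover drafts for a later cleanup can be sketched as below. The helpers `draftAssetName` and `isDraftAssetName` are hypothetical; the sketch assumes draft pages are named exactly `Asset:` followed by the 14-digit YmdHis timestamp that the formlink produces. A timestamp has one-second resolution, so two users starting a new Asset in the same second would receive the same draft name; in a small club setting this is presumably rare, but worth knowing.

```php
<?php

/** Draft page name using the same YmdHis pattern as {{#time: YmdHis }} (hypothetical helper). */
function draftAssetName( DateTimeInterface $when ): string {
    return 'Asset:' . $when->format( 'YmdHis' );
}

/** True for draft pages named Asset:<14 digits>, i.e. candidates for cleanup. */
function isDraftAssetName( string $pageName ): bool {
    return preg_match( '/^Asset:\d{14}$/', $pageName ) === 1;
}

$name = draftAssetName( new DateTimeImmutable( '2026-02-22 14:30:05' ) );
echo $name, "\n";                                      // Asset:20260222143005
var_dump( isDraftAssetName( $name ) );                 // bool(true)
var_dump( isDraftAssetName( 'Asset:CH03-BER-0007' ) ); // bool(false)
```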

6.2 Template:AssetRow

<includeonly>
|-
| {{#formlink:
   form=Asset
   |target={{{_pageName}}}
   |link text={{{_pageTitle}}}
   |returnto=Dashboard:Asset
 }}
| {{{Chapter}}}
| {{{Place}}}
| {{{Organisation}}}
| {{{File}}}
| {{{AssetType}}}
</includeonly>

For the time being, AssetRow displays a table of existing Assets ready to be edited. This is a first version for development purposes; as the number of Assets grows, another display/edit solution will need to be developed.


7. Private Extension AssetLifecycle

7.1 LocalSystems.php loading

   wfLoadExtension( 'AssetLifecycle' );

7.2 extension.json

{
    "manifest_version": 2,
    "name": "AssetLifecycle",
    "version": "1.0",
    "author": "Heritage Research Club",
    "description": "Automated labeling for Heritage Assets",
    "AutoloadClasses": {
        "LegacyHooks": "LegacyHooks.php"
    },
    "Hooks": {
        "PageForms::WritePageData": "LegacyHooks::onPFWritePageData",
        "PageForms::SetTargetName": "LegacyHooks::onPFSetTargetName"
    }
}

This file is mandatory: it notifies MediaWiki of the existence of the extension and gives access to the extension code (AutoloadClasses) and to the hook handlers (Hooks). With a robust design in mind, legacy hooks have been chosen for the time being to fulfil the requirements.

7.3 LegacyHooks.php

<?php

use MediaWiki\Context\RequestContext;
use MediaWiki\MediaWikiServices;
use MediaWiki\Title\Title;

class LegacyHooks {

    /** Debug log in the DocumentRoot (see section 10). */
    private const LOG_FILE = '/var/www/mediawiki/hook_debug.log';

    /** Placeholder Label set by the dashboard formlink for a new draft. */
    private const DRAFT_LABEL = 'GENERATING...';

    /**
     * Hook: PageForms::WritePageData
     * Generates Label and Sequence for a draft Asset and renames its uploaded file.
     */
    public static function onPFWritePageData( $formName, $title, &$wikitext ) {
        if ( !is_string( $wikitext ) ) {
            return true;
        }

        // 1. Extract values; Page Forms omits the lines of empty fields
        $chapter = self::getTemplateField( $wikitext, 'Chapter' );
        $place = self::getTemplateField( $wikitext, 'Place' );
        $org = self::getTemplateField( $wikitext, 'Organisation' );
        $oldFile = self::getTemplateField( $wikitext, 'File' );
        $label = self::getTemplateField( $wikitext, 'Label' );

        // Only drafts receive an identifier: a saved Asset is immutable (section 8),
        // so later edits must not renumber it or rename its file again.
        if ( $label !== '' && $label !== self::DRAFT_LABEL ) {
            return true;
        }
        if ( $chapter === '' || ( $place === '' && $org === '' ) ) {
            return true;
        }

        try {
            // 2. Query the Cargo database on Virtual Machine 2 (10.10.10.2)
            global $wgCargoDBserver, $wgCargoDBname, $wgCargoDBuser, $wgCargoDBpassword, $wgCargoDBtype;
            $db = MediaWikiServices::getInstance()->getDatabaseFactory()->create( $wgCargoDBtype, [
                'host' => $wgCargoDBserver, 'user' => $wgCargoDBuser,
                'password' => $wgCargoDBpassword, 'dbname' => $wgCargoDBname,
            ] );
            // Cargo normally stores its tables with a "cargo__" prefix
            // (cargo__Assets); verify the physical table name in MariaDB.
            $row = $db->selectRow( 'Assets', [ 'MAX(Sequence) AS m' ], [ 'Chapter' => $chapter ], __METHOD__ );
            $nextSeq = ( $row && $row->m ) ? (int)$row->m + 1 : 1;
            $db->close();

            // 3. Generate the identifier <ChapterCode>-<ContextCode>-<SequenceNumber>
            $cleanCh = preg_replace( '/^.*?:/', '', $chapter );
            $cleanSub = preg_replace( '/^.*?:/', '', $place !== '' ? $place : $org );
            $newLabel = sprintf( '%s-%s-%04d', $cleanCh, $cleanSub, $nextSeq );

            // 4. Keep the original file name, then rename the file to the identifier
            if ( $oldFile !== '' ) {
                $wikitext = self::setTemplateField( $wikitext, 'OriginalFilename', $oldFile );
                $ext = pathinfo( $oldFile, PATHINFO_EXTENSION );
                $newFileName = $ext !== '' ? "$newLabel.$ext" : $newLabel;
                $wikitext = self::setTemplateField( $wikitext, 'File', $newFileName );
                self::moveUploadedFile( "File:$oldFile", "File:$newFileName" );
            }

            // 5. Label and Sequence (the lines are inserted if Page Forms omitted them)
            $wikitext = self::setTemplateField( $wikitext, 'Label', $newLabel );
            $wikitext = self::setTemplateField( $wikitext, 'Sequence', (string)$nextSeq );

            self::log( "SUCCESS: Updated $newLabel. Original: $oldFile" );
        } catch ( \Exception $e ) {
            self::log( 'ERROR: ' . $e->getMessage() );
        }

        return true;
    }

    /** Returns the trimmed value of a "|Name=value" line, or '' if the line is absent. */
    private static function getTemplateField( string $wikitext, string $name ): string {
        $pattern = '/^\|\s*' . preg_quote( $name, '/' ) . '\s*=(.*)$/m';
        return preg_match( $pattern, $wikitext, $m ) ? trim( $m[1] ) : '';
    }

    /** Replaces a "|Name=..." line or, if absent, inserts it before the closing "}}". */
    private static function setTemplateField( string $wikitext, string $name, string $value ): string {
        $pattern = '/^\|\s*' . preg_quote( $name, '/' ) . '\s*=.*$/m';
        $line = "|$name=$value";
        // Callbacks avoid "$1"-style substitution of characters inside file names
        if ( preg_match( $pattern, $wikitext ) ) {
            return preg_replace_callback( $pattern, static fn ( array $m ): string => $line, $wikitext, 1 );
        }
        return preg_replace_callback( '/^\}\}/m', static fn ( array $m ): string => "$line\n}}", $wikitext, 1 );
    }

    private static function moveUploadedFile( string $oldName, string $newName ): void {
        if ( $oldName === $newName || $oldName === 'File:' ) {
            return;
        }
        $oldTitle = Title::newFromText( $oldName );
        $newTitle = Title::newFromText( $newName );
        if ( !$oldTitle || !$newTitle || !$oldTitle->exists() ) {
            return;
        }
        $user = RequestContext::getMain()->getUser(); // current user, for simplicity in this step
        $movePage = MediaWikiServices::getInstance()->getMovePageFactory()->newMovePage( $oldTitle, $newTitle );
        $status = $movePage->move( $user, 'Heritage Auto-Rename', false ); // false: no redirect
        if ( !$status->isOK() ) {
            self::log( "MOVE FAILED: $oldName -> $newName" );
        }
    }

    /**
     * Hook: PageForms::SetTargetName
     * Forces the target name so that the page is saved under its identifier.
     */
    public static function onPFSetTargetName( $formName, &$target_name, $wikitext = null ) {
        if ( is_string( $wikitext ) ) {
            $label = self::getTemplateField( $wikitext, 'Label' );
            if ( $label !== '' && $label !== self::DRAFT_LABEL ) {
                $target_name = 'Asset:' . $label;
            }
        }
        return true;
    }

    private static function log( string $message ): void {
        file_put_contents( self::LOG_FILE, '[' . date( 'H:i:s' ) . "] $message\n", FILE_APPEND );
    }
}

It was determined experimentally that Page Forms passes the form data to this hook as wikitext, whereas in older versions it was passed as an array. Wikitext is less structured than an array and contains only the fields that have content; empty fields are omitted. This makes it harder to extract the information needed, and it means that a field such as OriginalFilename or Sequence cannot simply be replaced when its line is missing: the line has to be inserted.
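Because empty fields are omitted, a handler should treat a missing line as an empty value. The parsing side can be sketched as follows, assuming the multi-line layout observed in the hook (one `|Field=value` per line, closing `}}` on its own line). `parseTemplateFields` is a hypothetical helper, not a Page Forms API.

```php
<?php

/**
 * Parses "{{Template\n|Field=value\n...}}" into [Field => value].
 * Fields left empty in the form are simply absent. Illustration only.
 */
function parseTemplateFields( string $wikitext, string $template ): array {
    $block = '/\{\{\s*' . preg_quote( $template, '/' ) . '\s*\n(.*?)^\}\}/ms';
    if ( !preg_match( $block, $wikitext, $match ) ) {
        return [];
    }
    $fields = [];
    foreach ( preg_split( '/\R/', $match[1] ) as $line ) {
        if ( preg_match( '/^\|\s*([^=]+?)\s*=(.*)$/', $line, $m ) ) {
            $fields[$m[1]] = trim( $m[2] );
        }
    }
    return $fields;
}

$wikitext = "{{Asset\n|Label=GENERATING...\n|Chapter=Chapter:CH03\n|Place=Place:BER\n|File=scan_042.jpg\n}}\n";
$fields = parseTemplateFields( $wikitext, 'Asset' );
print_r( $fields );
var_dump( array_key_exists( 'OriginalFilename', $fields ) ); // bool(false)
```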

7.4 Extension structure

This is a simple extension, in no way comparable to, for example, Cargo or VisualEditor, so only two files are needed: the mandatory extension.json with its prescribed layout, and the handler written in PHP, like the rest of MediaWiki. The biggest challenge was to verify experimentally which hooks Page Forms provides in the software versions used and how they behave.

The two files are located as follows:

 /var/www/mediawiki/extensions/AssetLifecycle/extension.json
 /var/www/mediawiki/extensions/AssetLifecycle/LegacyHooks.php

8. Immutability Rules

After a successful save, the following fields MUST NOT change:

  • Asset identifier (page name)
  • Label
  • Chapter
  • Place / Organisation
  • File
  • OriginalFilename

Other metadata MAY change.

Deletion of an Asset/File pair is sysop-only.
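The immutability rules can be checked by comparing the stored values with the submitted ones. The sketch below assumes both are available as name => value arrays (where the stored values come from, e.g. the Cargo row or the previous revision, is left open); `immutableFieldViolations` and the constant are hypothetical names. The page name is covered indirectly, since it equals Asset:<Label>. The sysop exception for deletion is not modelled.

```php
<?php

/** Fields that must not change after the first successful save (section 8). */
const IMMUTABLE_ASSET_FIELDS = [ 'Label', 'Chapter', 'Place', 'Organisation', 'File', 'OriginalFilename' ];

/** Lists the immutable fields whose value differs between stored and submitted versions. */
function immutableFieldViolations( array $stored, array $submitted ): array {
    $violations = [];
    foreach ( IMMUTABLE_ASSET_FIELDS as $name ) {
        // Missing fields count as empty, as Page Forms omits empty lines
        $old = trim( (string)( $stored[$name] ?? '' ) );
        $new = trim( (string)( $submitted[$name] ?? '' ) );
        if ( $old !== $new ) {
            $violations[] = $name;
        }
    }
    return $violations;
}

$stored = [ 'Label' => 'CH03-BER-0007', 'Chapter' => 'Chapter:CH03', 'Place' => 'Place:BER',
    'File' => 'CH03-BER-0007.jpg', 'OriginalFilename' => 'scan_042.jpg', 'Notes' => '' ];
// Editing Notes is allowed: no violations are reported
print_r( immutableFieldViolations( $stored, [ 'Notes' => 'Checked against the paper original' ] + $stored ) );
// Changing Place is not: "Place" is reported
print_r( immutableFieldViolations( $stored, [ 'Place' => 'Place:ROM' ] + $stored ) );
```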


9. Status

Asset v2.1 is partially functional:

  • File upload and move: OK
  • Asset move: NOK
  • Automatic numbering combination: OK for Chapter-Place-Number (Organisation not tested yet)
  • Automatic counter increment: NOK
  • OriginalFilename filled in: NOK
  • Identifier filled in: OK

Further work and testing needs to be done.

10. Appendix - testing available hooks in a simple way

LocalSystems.php can be used as a test platform by adding PHP snippets at the bottom of the file. Here is an example that tests some hooks and reports to a log file. The log file in this example is called hook_debug.log and is located in the MediaWiki DocumentRoot to avoid extra complexity: AlmaLinux 10.1 is very strict about file access, and the DocumentRoot is of course allowed territory.

The snippet assumes that the log file exists before it is executed. The snippet runs together with LocalSystems.php whenever any page of the MediaWiki site is opened.

Snippet

/** 
 * Heritage Project Hook Test
 * Log location: /var/www/mediawiki/hook_debug.log
 */
$logFile = __DIR__ . '/hook_debug.log';

// Generic logger function
$mwTestLogger = function ( $hookName ) use ( $logFile ) {
    $time = date('Y-m-d H:i:s');
    $entry = "[$time] SUCCESS: Hook '$hookName' is active.\n";
    file_put_contents( $logFile, $entry, FILE_APPEND );
    return true;
};

// 1. Test Core Hook (Guaranteed to fire on any page save)
$wgHooks['PageSaveComplete'][] = function() use ($mwTestLogger) { 
    return $mwTestLogger('PageSaveComplete'); 
};

// 2. Test PageForms Hook: Triggered before data is written
$wgHooks['PageForms::WritePageData'][] = function() use ($mwTestLogger) { 
    return $mwTestLogger('PageForms::WritePageData'); 
};

// 3. Test PageForms Hook: Triggered to determine the target page name
$wgHooks['PageForms::SetTargetName'][] = function() use ($mwTestLogger) { 
    return $mwTestLogger('PageForms::SetTargetName'); 
};

// 4. Test PageForms Hook: Triggered when rendering the form
$wgHooks['PageForms::FormPrinterSetup'][] = function() use ($mwTestLogger) { 
    return $mwTestLogger('PageForms::FormPrinterSetup'); 
};

Running this snippet produced entries in the log file showing that all these hooks fire in our system.
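Because the snippet relies on a pre-existing, writable log file, a failed write is easy to miss (file_put_contents only emits a warning and returns false). A slightly more defensive logger can be sketched as follows; `appendDebugLog` is a hypothetical name, and its behaviour (refusing to create the file, reporting problems via error_log, locking during the write) is our choice rather than part of the original setup.

```php
<?php

/**
 * Appends one timestamped line to a debug log. Returns false, and reports via
 * error_log(), when the file is missing or not writable. Hypothetical helper.
 */
function appendDebugLog( string $logFile, string $message ): bool {
    if ( !is_file( $logFile ) || !is_writable( $logFile ) ) {
        error_log( "Debug log missing or not writable: $logFile" );
        return false;
    }
    $entry = '[' . date( 'Y-m-d H:i:s' ) . "] $message\n";
    // LOCK_EX keeps entries from concurrent requests from interleaving
    return file_put_contents( $logFile, $entry, FILE_APPEND | LOCK_EX ) !== false;
}

$log = tempnam( sys_get_temp_dir(), 'hook' );
var_dump( appendDebugLog( $log, "Hook 'PageSaveComplete' is active." ) ); // bool(true)
echo file_get_contents( $log );
```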

The second example is a so-called inspector, placed in LegacyHooks.php of our private extension, to detect how parameters are passed.

<?php

// The inspector only uses core PHP functions, so no imports are needed.
class LegacyHooks {

    public static function onPFWritePageData() {
        $logFile = '/var/www/mediawiki/hook_debug.log';
        $time = date('H:i:s');
        $args = func_get_args();
        
        file_put_contents($logFile, "[$time] --- NEW SAVE ATTEMPT ---\n", FILE_APPEND);

        foreach ( $args as $index => $arg ) {
            $type = gettype($arg);
            if ( $type === 'object' ) {
                $class = get_class($arg);
                file_put_contents($logFile, "[$time] Arg $index is Object: $class\n", FILE_APPEND);
                // List properties to find the data container
                $vars = implode(', ', array_keys(get_object_vars($arg)));
                file_put_contents($logFile, "[$time]   Properties: $vars\n", FILE_APPEND);
            } else {
                file_put_contents($logFile, "[$time] Arg $index is $type: " . print_r($arg, true) . "\n", FILE_APPEND);
            }
        }
        return true;
    }
}

Using this code, the log revealed that the data parameter is passed in wikitext format instead of as an array, a significant change from the older approach.