ICT:Drupal internals

From Costa Sano MediaWiki

Drupal Internals for Embedded-System Thinkers

This document provides a behavioral understanding of Drupal 11 by mapping its internal mechanisms to concepts from embedded systems and real-time software engineering. It is designed as a reference for system-level thinking, not as a click-through tutorial.

1. Embedded Systems ↔ Drupal 11: Behavioral Comparison Table

The following table maps familiar embedded concepts to their Drupal equivalents, focusing on *behavior* rather than API details.

| Embedded Systems Concept | Drupal 11 Equivalent | Behavioral Insight |
| --- | --- | --- |
| Bootloader | Drupal Kernel bootstrap | Drupal “boots” fresh on every request; no persistent runtime. |
| Firmware | Drupal core + modules | Static codebase; behavior emerges per request. |
| Tasks / Threads | PHP-FPM worker processes | Parallelism exists, but workers do not share memory. |
| Shared memory | Database + cache backend | Only persistent state is external (DB, Redis, etc.). |
| Mutex / Semaphore | Drupal Lock API | Mutual exclusion only when explicitly requested. |
| Atomic operations | Database transactions / SELECT ... FOR UPDATE | Not automatic; must be implemented manually. |
| ISR (Interrupt Service Routine) | Event subscribers | Events fire during the request lifecycle. |
| Polling loop / scheduler | Symfony kernel event pipeline | Deterministic event order, similar to a processing pipeline. |
| Device drivers | Plugins | Pluggable, discoverable components with defined interfaces. |
| Hardware abstraction layer (HAL) | Service container | Dependency injection, lazy loading, orchestration of services. |
| EEPROM / Flash | Configuration system (YAML + config entities) | Persistent, versionable, environment-specific settings. |
| Sensor data | Entities + fields | Structured, typed data with storage abstraction. |
| Real-time constraints | None | Drupal has no hard deadlines; latency is best-effort. |
| Deterministic scheduling | None | Request handling is non-deterministic across workers. |
| Multi-core concurrency | Multi-process concurrency | Parallelism exists, but without shared memory or implicit locking. |
| Race conditions | Possible during simultaneous writes | Must be handled explicitly if needed. |

2. Drupal Request Lifecycle (Behavioral Model)

Drupal behaves like a system that reboots for every request. Understanding this lifecycle is essential for designing predictable, maintainable modules.

2.1 Request Arrival

  • A new HTTP request arrives.
  • A PHP-FPM worker process is assigned to handle it.

Embedded analogy: An interrupt wakes a CPU core.

2.2 Drupal Kernel Boot

Drupal initializes:

  • Autoloader
  • Service container
  • Module discovery
  • Event subscribers
  • Routing system

Embedded analogy: Bootloader + HAL initialization.

Relevant directories:

  • core/lib/Drupal/Core/DrupalKernel.php
  • core/lib/Drupal/Core/DependencyInjection/

2.3 Routing Resolution

Drupal matches the URL to:

  • a route
  • a controller or form handler
  • access checks

Embedded analogy: ISR vector table lookup.
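
The lookup described above can be sketched as a small table-driven dispatcher. This is a toy model in Python, not Drupal's routing API: the Route class, the ROUTES table, and the handle function are hypothetical names chosen for illustration, and only exact path matches are supported.

```python
# Toy model only: names and structure are illustrative, not Drupal's routing API.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    path: str
    controller: Callable[[], str]
    permission: str


# Hypothetical route table, comparable to an interrupt vector table.
ROUTES = {
    "/archive": Route("/archive", lambda: "archive listing", "access content"),
    "/admin/archive": Route("/admin/archive", lambda: "archive settings", "administer site"),
}


def handle(path: str, permissions: set) -> tuple:
    """Resolve a path to a route, run the access check, then call the controller."""
    route: Optional[Route] = ROUTES.get(path)
    if route is None:
        return 404, "not found"
    if route.permission not in permissions:
        return 403, "access denied"
    return 200, route.controller()
```

The access check runs before the controller, so a denied request never reaches module code.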

Relevant directories:

  • core/lib/Drupal/Core/Routing/
  • core/modules/system/system.routing.yml

2.4 Controller or Form Execution

Your module code runs:

  • services are instantiated
  • entities are loaded
  • business logic executes

Embedded analogy: Main control loop execution.

2.5 Rendering Pipeline

Drupal builds the response:

  • render arrays bubble upward
  • cache metadata is collected
  • HTML is generated

Embedded analogy: Graphics pipeline assembling a frame.

Relevant directories:

  • core/lib/Drupal/Core/Render/
  • core/lib/Drupal/Core/Theme/

2.6 Response Sent & Process Ends

  • Response is returned to the browser.
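  • The request's PHP state is torn down; under PHP-FPM the worker process returns to the pool for the next request.
  • All request-scoped memory is released.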

Embedded analogy: CPU core returns from interrupt; registers cleared.
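
The whole lifecycle can be condensed into a toy model that shows which state survives a request. This is a sketch under simplifying assumptions: PERSISTENT_DB is a hypothetical stand-in for the external database, and handle_request stands in for boot, execution, and teardown.

```python
# Toy model only: a "request" builds fresh state, uses it, and discards it.
PERSISTENT_DB = {"hits": 0}  # stands in for the external database


def handle_request() -> dict:
    # Everything created here lives only for the duration of one request.
    container = {"booted": True, "request_local_counter": 0}
    container["request_local_counter"] += 1

    # Only writes to external storage survive the request.
    PERSISTENT_DB["hits"] += 1
    return {
        "local": container["request_local_counter"],
        "persistent": PERSISTENT_DB["hits"],
    }


first = handle_request()
second = handle_request()
```

The request-local counter reads 1 in both requests, while the externally stored counter advances from 1 to 2.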

3. Drupal Subsystems Explained (Mental Models for System Engineers)

3.1 Service Container (HAL + Dependency Graph)

Drupal’s service container is a dependency injection system.

Embedded analogy: A Hardware Abstraction Layer (HAL) that:

  • exposes well-defined interfaces
  • hides implementation details
  • provides lazy-loaded components
  • centralizes system services

Behavior:

  • Services are instantiated per request.
  • No shared memory between requests.
  • Dependencies are resolved deterministically.

Relevant directories:

  • core/lib/Drupal/Core/DependencyInjection/
  • core/lib/Drupal/Component/DependencyInjection/

Design implications:

  • Your modules should depend on services, not global state.
  • Constructor injection improves testability and clarity.
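
A minimal sketch of lazy instantiation and constructor injection, written in Python for brevity. The Container, Logger, and ArchiveImporter classes and the service IDs are hypothetical; Drupal's real container is configured through services.yml files and is considerably richer.

```python
class Container:
    """Minimal lazy service container (illustrative, not Drupal's implementation)."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, service_id, factory):
        # The factory receives the container so it can pull in dependencies.
        self._factories[service_id] = factory

    def get(self, service_id):
        # Instantiate on first use, then reuse for the rest of the "request".
        if service_id not in self._instances:
            self._instances[service_id] = self._factories[service_id](self)
        return self._instances[service_id]

    def is_instantiated(self, service_id):
        return service_id in self._instances


class Logger:
    def __init__(self):
        self.lines = []

    def log(self, message):
        self.lines.append(message)


class ArchiveImporter:
    # Constructor injection: the dependency is passed in, not looked up globally.
    def __init__(self, logger):
        self.logger = logger

    def run(self):
        self.logger.log("import finished")
        return len(self.logger.lines)


container = Container()
container.register("logger", lambda c: Logger())
container.register("archive_importer", lambda c: ArchiveImporter(c.get("logger")))
```

Nothing is constructed at registration time; the logger comes into existence only when the importer is first requested, which is the lazy-loading behavior the HAL analogy refers to.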

Successor-friendly notes:

  • Document custom services in my_module.services.yml.

---

3.2 Entity API (Typed Data + Storage Abstraction)

Entities represent structured data with fields.

Embedded analogy: Typed sensor data structures with:

  • schema definitions
  • type safety
  • storage abstraction
  • metadata

Behavior:

  • Entities are loaded from storage on each request, not kept in memory between requests.
  • Field values are typed and validated.
  • Storage handlers abstract database operations.

Relevant directories:

  • core/lib/Drupal/Core/Entity/
  • core/lib/Drupal/Core/Field/

Design implications:

  • Use entities for structured, reusable data.
  • Avoid direct SQL unless absolutely necessary.

Successor-friendly notes:

  • Keep entity definitions simple and well-documented.

---

3.3 Plugin System (Device Drivers)

Plugins are discoverable, swappable components.

Embedded analogy: Device drivers with:

  • defined interfaces
  • runtime discovery
  • interchangeable implementations

Behavior:

  • Plugins are discovered mainly via PHP attributes (annotations remain supported for older code); some plugin types use YAML files instead.
  • Plugin managers control instantiation.
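
The discovery and instantiation roles can be modeled with a small registry. This is a Python sketch, not Drupal's plugin API: the decorator stands in for Drupal's class-metadata discovery, and the "exporter" plugin type with its csv and tsv plugins is a hypothetical example.

```python
class PluginManager:
    """Toy plugin manager: registers classes by ID and controls instantiation."""

    def __init__(self):
        self._definitions = {}

    def plugin(self, plugin_id, label):
        # Decorator standing in for metadata-based plugin discovery.
        def decorator(cls):
            self._definitions[plugin_id] = {"class": cls, "label": label}
            return cls
        return decorator

    def get_definitions(self):
        return {pid: d["label"] for pid, d in self._definitions.items()}

    def create_instance(self, plugin_id, **configuration):
        return self._definitions[plugin_id]["class"](**configuration)


exporters = PluginManager()  # hypothetical "exporter" plugin type


@exporters.plugin("csv", label="CSV exporter")
class CsvExporter:
    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def export(self, row):
        return self.delimiter.join(str(v) for v in row)


@exporters.plugin("tsv", label="TSV exporter")
class TsvExporter(CsvExporter):
    def __init__(self):
        super().__init__(delimiter="\t")
```

Calling code asks the manager for a plugin by ID and never names the implementing class, which is what makes implementations interchangeable.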

Relevant directories:

  • core/lib/Drupal/Core/Plugin/
  • core/lib/Drupal/Component/Plugin/

Design implications:

  • Use plugins when you need extensibility.
  • Create custom plugin types for flexible architectures.

Successor-friendly notes:

  • Document plugin IDs and expected behavior.

---

3.4 Caching Layers (Memory Hierarchy)

Drupal uses layered caching to improve performance.

Embedded analogy: L1/L2 cache → RAM → Flash hierarchy.

Layers:

  • Render cache
  • Dynamic page cache
  • Entity cache
  • Routing cache
  • Config cache

Behavior:

  • Cache metadata bubbles up during rendering.
  • Cache invalidation is tag-driven: saving an entity or config object invalidates every cached item tagged with it.
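
One invalidation mechanism Drupal relies on is cache tags. The sketch below models the idea in Python; the TaggedCache class is hypothetical and far simpler than a real cache backend, while tag names such as node:2 and node_list follow Drupal's naming convention.

```python
class TaggedCache:
    """Toy cache bin with tag-based invalidation (not Drupal's cache backend)."""

    def __init__(self):
        self._items = {}

    def set(self, cid, data, tags=()):
        self._items[cid] = (data, frozenset(tags))

    def get(self, cid):
        item = self._items.get(cid)
        return item[0] if item else None

    def invalidate_tags(self, tags):
        tags = set(tags)
        # Drop every item that carries at least one invalidated tag.
        self._items = {
            cid: item for cid, item in self._items.items() if not (item[1] & tags)
        }


cache = TaggedCache()
cache.set("page:/archive", "<ul>...</ul>", tags=["node:1", "node:2", "node_list"])
cache.set("page:/about", "<p>About</p>", tags=["node:7"])

# Saving node 2 would invalidate its tag; every dependent item disappears.
cache.invalidate_tags(["node:2"])
```

After the invalidation the archive page is gone from the cache and the unrelated page is untouched, without either page knowing about the other.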

Relevant directories:

  • core/lib/Drupal/Core/Cache/
  • core/modules/dynamic_page_cache/

Design implications:

  • Always return cache metadata in render arrays.
  • Avoid disabling cache unless absolutely necessary.

Successor-friendly notes:

  • Document cache contexts, tags, and max-age.

---

3.5 Concurrency Model (Multi-Process Scheduling)

Drupal uses multi-process concurrency.

Embedded analogy: Multiple CPU cores running identical firmware, sharing only external memory.

Behavior:

  • Each request is handled in isolation by one PHP process (a PHP-FPM worker).
  • No shared memory.
  • No inherent locking.
  • Race conditions possible during simultaneous writes.

Relevant directories:

  • core/lib/Drupal/Core/Lock/
  • core/lib/Drupal/Core/Database/

Design implications:

  • Use the Lock API only when necessary.
  • Remember that a plain SELECT does not lock rows; wrap read-modify-write sequences in a transaction or a lock.
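
A read-modify-write sequence protected by a named lock can be sketched as follows. This is a Python model under stated assumptions: threads stand in for PHP-FPM workers, the shared_row dictionary stands in for a database row, and named_locks stands in for a lock backend. None of these names come from Drupal.

```python
import threading

shared_row = {"counter": 0}  # stands in for a database row
named_locks = {"archive_counter": threading.Lock()}  # stands in for a lock backend


def increment_with_lock(times):
    for _ in range(times):
        # Acquire, read-modify-write, release: the critical section is explicit.
        with named_locks["archive_counter"]:
            current = shared_row["counter"]
            shared_row["counter"] = current + 1


workers = [threading.Thread(target=increment_with_lock, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

With the lock in place the final counter is exactly 4000. Removing the with statement turns the read and the write into separate unprotected steps, which is the lost-update race described above.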

Successor-friendly notes:

  • Document any locking strategy clearly.

---

3.6 Render Pipeline (Graphics Pipeline)

Drupal’s rendering system is multi-stage.

Embedded analogy: A GPU pipeline assembling a frame:

  • input → transformation → composition → output

Stages:

  • Build render arrays
  • Bubble cache metadata
  • Apply theming
  • Generate HTML
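
The build and bubble stages can be modeled with nested dictionaries. This is a toy Python renderer, not Drupal's: the keys imitate the #-prefixed style of render arrays, -1 is used to mean "cache permanently", and the page structure is a hypothetical example.

```python
def render(element):
    """Render a nested element and bubble cache tags and max-age to the caller."""
    cache = element.get("#cache", {})
    tags = set(cache.get("tags", []))
    max_age = cache.get("max-age", -1)  # -1 means "cache permanently" here

    html = element.get("#markup", "")
    for key, child in element.items():
        if key.startswith("#"):
            continue  # properties, not children
        child_html, child_tags, child_max_age = render(child)
        html += child_html
        tags |= child_tags
        # The most restrictive max-age wins.
        if child_max_age != -1:
            max_age = child_max_age if max_age == -1 else min(max_age, child_max_age)
    return html, tags, max_age


page = {
    "#markup": "<h1>Archive</h1>",
    "#cache": {"tags": ["node_list"]},
    "item": {"#markup": "<p>Chapter 1</p>", "#cache": {"tags": ["node:1"], "max-age": 3600}},
    "clock": {"#markup": "<p>12:00</p>", "#cache": {"max-age": 60}},
}

html, tags, max_age = render(page)
```

The page ends up with the union of all tags and the shortest max-age of any child, so one short-lived element limits how long the whole page may be cached.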

Relevant directories:

  • core/lib/Drupal/Core/Render/
  • core/lib/Drupal/Core/Theme/

Design implications:

  • Always return structured render arrays.
  • Avoid building HTML strings by hand in PHP; use render arrays and Twig templates.

Successor-friendly notes:

  • Document render array structure.

---

3.7 Configuration System (Persistent Flash Storage)

Drupal keeps its active configuration in a config store (SQL by default) and exports it as YAML files; complex settings are modeled as config entities.

Embedded analogy: EEPROM/Flash:

  • persistent
  • versionable
  • environment-specific

Behavior:

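  • Config read through the standard API is immutable; changes go through an explicit editable object or the admin UI.
  • Moving config between environments requires explicit export/import.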
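
The export/import cycle can be sketched as two stores that are synchronized only on request. This is a toy Python model, assuming the usual setup in which a site's active configuration is exported to and imported from a sync directory. The dictionaries and function names are hypothetical, and JSON replaces YAML to stay within the standard library.

```python
import json

# "active" stands in for the site's live configuration,
# "sync" for the exported files under version control.
active = {"system.site": {"name": "Archive", "slogan": ""}}
sync = {}


def export_config():
    for name, data in active.items():
        sync[name] = json.dumps(data, sort_keys=True)


def import_config():
    changed = []
    for name, text in sync.items():
        data = json.loads(text)
        if active.get(name) != data:
            active[name] = data
            changed.append(name)
    return changed


export_config()
# Simulate an edit that arrives through version control from another environment.
sync["system.site"] = json.dumps({"name": "Archive", "slogan": "Family history"})
changed = import_config()
```

The edit made to the exported copy has no effect until import_config runs, and a second import reports no changes.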

Relevant directories:

  • core/lib/Drupal/Core/Config/
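  • config/install/ (inside a module: default configuration installed with it)
  • config/schema/ (inside a module: schema for its configuration)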

Design implications:

  • Use config for settings, not content.
  • Keep config schemas clean and typed.

Successor-friendly notes:

  • Document config keys and their meaning.

---

3.8 Event System (ISR Dispatching)

Drupal uses Symfony’s event dispatcher.

Embedded analogy: Interrupt Service Routines:

  • triggered by system events
  • processed in priority order
  • allow modules to react to lifecycle stages

Behavior:

  • Events replace some of the old hooks; many hooks still exist alongside them.
  • They provide structured extension points.
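
Priority-ordered dispatch can be sketched in a few lines. This is a Python model, not the Symfony API: the EventDispatcher class, the event name, and the priority values are illustrative. The one behavior borrowed from Symfony is that listeners with a higher priority run first.

```python
class EventDispatcher:
    """Toy dispatcher: higher priority runs first, as in Symfony's dispatcher."""

    def __init__(self):
        self._listeners = {}

    def add_listener(self, event_name, listener, priority=0):
        self._listeners.setdefault(event_name, []).append((priority, listener))

    def dispatch(self, event_name, event):
        # sorted() is stable, so equal priorities keep registration order.
        for _, listener in sorted(
            self._listeners.get(event_name, []), key=lambda pair: -pair[0]
        ):
            listener(event)
        return event


dispatcher = EventDispatcher()
dispatcher.add_listener("request", lambda e: e["log"].append("routing"), priority=32)
dispatcher.add_listener("request", lambda e: e["log"].append("custom"), priority=0)
dispatcher.add_listener("request", lambda e: e["log"].append("auth"), priority=300)

event = dispatcher.dispatch("request", {"log": []})
```

Registration order does not matter here: the listeners run as auth, routing, custom, which is why documenting priorities helps successors reason about execution order.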

Relevant directories:

  • core/lib/Drupal/Core/EventSubscriber/
  • core/lib/Drupal/Component/EventDispatcher/

Design implications:

  • Use events for cross-cutting concerns.
  • Keep subscribers small and focused.

Successor-friendly notes:

  • Document event priorities.

4. Why This Matters for System Design

Understanding Drupal’s behavior allows you to:

  • predict concurrency scenarios
  • understand why services are not shared
  • reason about caching and state
  • design successor-friendly modules
  • avoid treating Drupal as a black box

Drupal is a request-driven, stateless, multi-process system. Once this model is internalized, its behavior becomes predictable and architecturally coherent.

5. Summary

Drupal is not a monolithic, continuously running program. It is a stateless, request-driven framework built on:

  • independent PHP processes
  • a service container
  • an event-driven kernel
  • a pluggable architecture
  • externalized state (database + config)

This document provides a conceptual foundation for understanding Drupal’s internal behavior through the lens of embedded systems engineering.

6. How Drupal Stores Data: Entity–Field–Storage Architecture

Drupal’s storage model is one of its most misunderstood components. At first glance, the database appears sparse, fragmented, and filled with many small tables. This is intentional and reflects Drupal’s highly flexible, field-based content architecture.

This section explains how Drupal stores data internally, why the database looks the way it does, and how this architecture maps to embedded-system concepts.

6.1 Overview of the Storage Model

Drupal uses a three-layer architecture for storing structured data:

  1. **Entity** – the conceptual object (e.g., node, user, taxonomy term)
  2. **Field** – individual typed values attached to the entity
  3. **Storage** – how each field is persisted in the database

Embedded analogy: An entity is like a typed struct, and each field is a member of that struct. However, instead of storing the whole struct in one memory block, Drupal stores each member in a separate region.

This allows:

  • per-field translation
  • per-field revisioning
  • pluggable field types
  • flexible content modeling
  • typed data validation

6.2 Why the Database Looks “Empty”

When inspecting Drupal’s database, you may notice:

  • very few “entity tables”
  • many small tables named like node__field_xyz
  • metadata tables for files, images, and references

This is because Drupal does **not** store an entity as a single row. Base fields (such as title, status, and author) share the entity's data table, while each configurable field is stored in its own dedicated table.

Example:

A node with a title, a body, and field_image is stored as:

  • node – base table (IDs, bundle, UUID)
  • node_field_data – base field values, including the title
  • node__body – body values
  • node__field_image – image references

Drupal assembles the entity at runtime by loading and merging these tables.

Embedded analogy: A struct whose members are stored in separate memory regions because each member has different:

  • size
  • type
  • encoding
  • update frequency
  • translation requirements

6.3 Field Tables Explained

Each field table follows a consistent pattern:

entitytype__fieldname

Columns typically include:

  • bundle
  • entity_id
  • revision_id
  • langcode
  • delta (for multi-value fields)
  • fieldname_value (or multiple typed columns)

Behavior:

  • Multi-value fields use delta to store multiple rows.
  • Translations create additional rows per language.
  • Revisions create additional rows in a parallel revision table (entitytype_revision__fieldname).

This explains why the schema grows by adding tables and rows rather than by widening a single table with more columns.
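
The assembly of an entity from several tables can be demonstrated with an in-memory SQLite database. This is a sketch with a deliberately simplified schema: real Drupal tables carry more columns (bundle, revision_id, and others), the title is kept in the shared data table as a base field, and field_place plus the load_node function are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node_field_data (nid INTEGER, langcode TEXT, title TEXT);
CREATE TABLE node__body (entity_id INTEGER, langcode TEXT, delta INTEGER, body_value TEXT);
CREATE TABLE node__field_place (entity_id INTEGER, langcode TEXT, delta INTEGER, field_place_value TEXT);

INSERT INTO node_field_data VALUES (1, 'en', 'Chapter 1');
INSERT INTO node__body VALUES (1, 'en', 0, 'Body text');
INSERT INTO node__field_place VALUES (1, 'en', 0, 'Lisbon');
INSERT INTO node__field_place VALUES (1, 'en', 1, 'Porto');
""")


def load_node(nid, langcode="en"):
    """Assemble one entity from its data table and its per-field tables."""
    (title,) = db.execute(
        "SELECT title FROM node_field_data WHERE nid = ? AND langcode = ?",
        (nid, langcode),
    ).fetchone()
    node = {"nid": nid, "title": title}
    # Field names come from a fixed tuple, so building the SQL string is safe here.
    for field in ("body", "field_place"):
        rows = db.execute(
            f"SELECT {field}_value FROM node__{field} "
            "WHERE entity_id = ? AND langcode = ? ORDER BY delta",
            (nid, langcode),
        ).fetchall()
        node[field] = [value for (value,) in rows]
    return node


node = load_node(1)
```

Loading one node costs one query per field table in this model, which gives a feel for why entity loading is comparatively expensive and why caching matters.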

6.4 File Storage: Why So Many “Blobs”?

Drupal stores uploaded files on disk, not in the database. However, the database stores:

  • file metadata
  • usage tracking
  • references from entities
  • alternative text and titles (for images)

Relevant tables:

  • file_managed
  • file_usage
  • node__field_image

Behavior: The database acts as a **metadata index**, not a binary store.

Embedded analogy: Flash memory holds the binary data; the database holds the directory entries and metadata.

6.5 Directory Structure for Storage Components

Relevant directories in Drupal core:

  • core/lib/Drupal/Core/Entity/ – entity definitions, base classes, storage handlers
  • core/lib/Drupal/Core/Field/ – field types, widgets, formatters
  • core/modules/system/src/Entity/ – configuration entities owned by the System module (e.g., Menu)
  • core/modules/file/src/ – managed file entities and usage tracking

These directories contain the logic that:

  • defines entity types
  • defines field types
  • loads and saves field values
  • assembles entities from multiple tables

6.6 Why Drupal Chooses This Architecture

Drupal’s storage model is optimized for:

  • multilingual content
  • revision history
  • flexible content types
  • pluggable field types
  • typed data validation
  • swappable storage handlers per entity type

A single-table-per-entity approach would make these features considerably harder to implement, especially multi-value and translatable fields.

Behavioral insight: Drupal prioritizes flexibility and extensibility over classical relational efficiency.

6.7 Design Implications for Your Project

Understanding this architecture helps you:

  • predict how data is stored and loaded
  • design efficient queries
  • avoid unnecessary SQL
  • structure your content types cleanly
  • understand why entity loading is expensive
  • appreciate the need for caching layers

For your archive project, this means:

  • fields are cheap to add
  • content types are flexible
  • storage is predictable
  • performance is manageable with caching

6.8 Successor-Friendly Notes

To help future maintainers:

  • Document your content types and fields clearly.
  • Avoid unnecessary custom SQL queries.
  • Use the Entity API whenever possible.
  • Keep field names meaningful and stable.
  • Explain the entity–field–storage model in your project documentation.

7. How Drupal Stores Content, Files, and Configuration

Drupal uses a layered storage architecture that separates content, configuration, files, and metadata across different subsystems. This design can feel “hidden” at first, especially when coming from systems like DNN or MediaWiki/Cargo, where each module or template creates its own SQL tables. This section explains where Drupal actually stores data and why the database appears minimal.

7.1 Overview of Drupal’s Storage Layers

Drupal distributes its data across four main storage layers:

  1. **Content storage (SQL tables)**
  2. **File storage (disk filesystem)**
  3. **Configuration storage (active store in SQL, exported as YAML files)**
  4. **Metadata and cache storage (SQL or Redis)**

Each layer serves a different purpose and has different performance and flexibility characteristics.

Embedded analogy: Drupal behaves like a system where:

  • SQL = structured NVRAM
  • Filesystem = binary flash storage
  • YAML config = firmware configuration blocks
  • Cache bins = fast volatile memory

7.2 Content Storage (SQL Tables)

Content entities (nodes, media, taxonomy terms, and paragraphs from the contributed Paragraphs module) store their field values in SQL.

Behavior:

  • All content types share the same base tables:
    • node
    • node_revision
    • node_field_data
    • node_field_revision
  • Each configurable field creates its own table:
    • node__field_chapter
    • node__field_place
    • node__field_asset
    • node__field_organisation

Important: Field tables are created as soon as the field storage is defined, but they stay empty until the first content item is saved. Until then, the database appears “empty.”

Relevant directories:

  • core/lib/Drupal/Core/Entity/
  • core/lib/Drupal/Core/Field/

7.3 File Storage (Disk Filesystem)

Binary files (images, PDFs, scans, documents) are stored on disk, not in SQL.

Default location: sites/default/files/

SQL stores only metadata:

  • file_managed – file path, size, MIME type
  • file_usage – which entities reference the file

Embedded analogy: The filesystem holds the binary payload; SQL holds the directory entries.

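7.4 Configuration Storage (Active Store + YAML Files)

Configuration entities live in the active configuration store, which by default is the config table in SQL. YAML files are the export and deployment format, and modules ship their default configuration as YAML.

Examples:

  • content type definitions
  • field definitions
  • views
  • vocabularies
  • menus
  • image styles

Locations:

  • Active configuration: SQL (config table) by default
  • Exported configuration: the sync directory set by $settings['config_sync_directory'] in settings.php (often config/sync/)
  • Module defaults: config/install/ inside each module

Behavior:

  • Configuration can be changed at runtime through the admin UI or the configuration API; plain reads return immutable objects.
  • Moving changes between environments requires an explicit export and import.
  • Exported YAML files act like firmware configuration blocks under version control.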

Relevant directories:

  • core/lib/Drupal/Core/Config/

7.5 Metadata and Cache Storage

Drupal stores computed data and metadata in cache bins.

Examples:

  • render cache
  • dynamic page cache
  • routing cache
  • entity cache

These may live in:

  • SQL tables (e.g., cache_render)
  • Redis (if configured)

Behavior:

  • Cache is volatile and rebuildable.
  • Cache metadata bubbles up during rendering.

7.6 Why Drupal’s Database Looks “Small”

Compared to systems like DNN or MediaWiki/Cargo, Drupal’s database appears minimal because:

  • Content types do not create their own tables.
  • All content types share the same base entity tables.
  • Field tables hold no rows until content exists.
  • Configuration lives in one shared config table and is exported as YAML, rather than in per-feature tables.
  • Files are stored on disk, not in SQL.
  • Cache is stored in shared bins, not per-module tables.

Architectural insight: Drupal prioritizes flexibility, multilingual support, and field-level revisioning over a compact one-table-per-type schema.

7.7 Summary of Where Data Lives

| Data Type | Storage Location | Notes |
| --- | --- | --- |
| Content field values | SQL (e.g., node__field_xyz) | Table exists once the field is defined; rows appear when content is saved. |
| Content metadata | SQL (node_field_data) | Shared across all content types. |
| Files (binary) | Disk (sites/default/files) | Actual file content. |
| File metadata | SQL (file_managed) | Path, size, MIME type. |
| Configuration | SQL (config table), exported as YAML (config/sync) | Content types, fields, views, vocabularies. |
| Cache | SQL or Redis | Rebuildable; not permanent. |

7.8 Why This Matters for System Design

Understanding Drupal’s storage architecture helps you:

  • locate where your CHAPTER, PLACE, ORGANISATION, ASSET data lives
  • predict how content is assembled from multiple tables
  • understand why the DB looks sparse until content exists
  • know where to look on disk for files and configuration
  • design successor-friendly content models
  • avoid treating Drupal as a black box

Embedded analogy: Drupal is a hybrid system where:

  • SQL stores structured, typed data
  • Disk stores binary assets
  • The config store (exported as YAML) holds configuration
  • Cache stores computed results

This layered approach is what gives Drupal its flexibility and power.