So far we have migrated three entity types: content types, taxonomy vocabularies, and paragraphs. It is very common for fields to be attached to those, and to other entities, to collect and display data. Field migrations can be tricky. For one, they are a multi-step process that requires, at a minimum, four different migrations. Additionally, errors are common because field-related configuration used in Drupal 7 is not always available in Drupal 10. In this article, we take a pause from executing migrations to understand how fields work in Drupal. The information presented today will prove useful for custom migrations, especially when they include content model changes.

Understanding Drupal fields

Drupal fields are used to provide structure to the information the CMS stores. They save discrete data, which can be used for displaying, filtering, and sorting purposes. Fields are attached to entities like nodes, users, taxonomy terms, blocks, etc. For entities that can have bundles, each bundle can have a different set of fields attached to it. The node entity, for example, almost always has a different set of fields attached to each content type. Read this article for more information on fields from a site building perspective.

Fields can be of multiple types: number, text, file, date, telephone, reference to other Drupal entities, and more. The field type determines the underlying database table structure used to store the field data. For example, Drupal offers Date and Timestamp fields to store date and time information. The Date field uses a string representation in ISO 8601 format while the Timestamp field stores an integer corresponding to the UNIX timestamp. Each field can be of only one type. For example, a field can be of type Date or Timestamp, but not both.
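To make the difference between the two storage formats concrete, here is a minimal sketch in plain PHP, outside of Drupal. The sample value is made up, and the sketch assumes values are expressed in UTC and that the string uses the Y-m-d\TH:i:s layout; check the field type you are working with before relying on either assumption.

```php
<?php

// The same moment in the two representations described above.
// The sample value is made up for illustration, and UTC is an assumption.
$utc = new DateTimeZone('UTC');

// Date field style: an ISO 8601 string.
$iso_value = '2024-05-06T09:00:00';

// Timestamp field style: seconds since the UNIX epoch, stored as an integer.
$timestamp_value = (new DateTimeImmutable($iso_value, $utc))->getTimestamp();

// And back again, from integer to string.
$round_trip = (new DateTimeImmutable('@' . $timestamp_value))
  ->setTimezone($utc)
  ->format('Y-m-d\TH:i:s');

var_dump($timestamp_value);
var_dump($round_trip === $iso_value);
```

Comparing or sorting the integer form is a simple numeric operation, which is one reason a project might prefer Timestamp fields for date calculations.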

Under the hood, a field type consists of two components: storage settings and instance settings. To provide some context, let's review the settings of a Date field attached to the Event content type. Below is a screenshot of the configuration form:

Configuration form

In the screenshot there are two storage settings in the Field Storage section: Date type and Allowed number of values. The former determines if we want to save Date and time values or Date only values, without a time component. The latter is internally named field cardinality and indicates how many values to store for this field. The cardinality can be a fixed integer value or unlimited if we want to allow an arbitrary number of values. In our example, we only deal with single date events so a cardinality of one suffices. If we were to allow for multi-day events, increasing the cardinality of this field would not be the best approach. Instead, use a Date range field, which allows specifying start and end dates as part of a single value. Back to field storage settings: some apply globally to every field type while others depend on the field type in question. For example, cardinality is a global field storage setting that can be set independently of the field type. On the other hand, the date type storage setting is only available for Date and Date range fields.
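The cardinality rule can be sketched in a few lines of plain PHP. This is an illustration, not Drupal's own validation code. The helper name is hypothetical, and the use of -1 as the "unlimited" marker is an assumption that mirrors the value of FieldStorageDefinitionInterface::CARDINALITY_UNLIMITED in Drupal core.

```php
<?php

// Assumption: -1 marks "unlimited", mirroring the value Drupal core uses for
// FieldStorageDefinitionInterface::CARDINALITY_UNLIMITED.
const CARDINALITY_UNLIMITED = -1;

/**
 * Checks whether a list of field values fits the configured cardinality.
 *
 * Hypothetical helper written for this article.
 */
function fits_cardinality(array $values, int $cardinality): bool {
  if ($cardinality === CARDINALITY_UNLIMITED) {
    return TRUE;
  }
  return count($values) <= $cardinality;
}

// A single date event fits a cardinality of one.
var_dump(fits_cardinality(['2024-05-06'], 1));

// Two values do not fit a cardinality of one, but fit an unlimited field.
var_dump(fits_cardinality(['2024-05-06', '2024-05-07'], 1));
var_dump(fits_cardinality(['2024-05-06', '2024-05-07'], CARDINALITY_UNLIMITED));
```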

The rest of the settings in the screenshot above are instance settings: Label, Help text, Required field, Set default value, and Default value. Similar to storage settings, instance settings can be global or specific to field types. All the settings in this Date field example are global because the field type itself does not provide any field-specific setting. Other field types do. For example, the Image field provides settings to restrict the allowed file extensions, specify maximum and minimum image dimensions, enforce a maximum upload size, and enable and optionally require the title and alt HTML attributes.

Both storage and instance settings provide validation. On top of this, each field type has some intrinsic validation logic. For example, a required Date field not only checks that a value is specified, but also that the value is a valid date. Entering February 31, 2024 would yield a validation error. Drupal APIs enforce data validation at different layers. Explaining them is beyond the scope of this article, but for reference you can research the Field API, Entity API, Form API, Typed Data API, and validation constraints.
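The February 31 example is easy to reproduce with PHP's built-in checkdate() function. The wrapper below is a hypothetical helper used only for illustration; Drupal's own validation goes through the APIs listed above rather than through this function.

```php
<?php

/**
 * Returns TRUE when the year, month, and day form a real calendar date.
 *
 * Hypothetical helper written for this article.
 */
function is_valid_calendar_date(int $year, int $month, int $day): bool {
  // checkdate() takes its arguments in month, day, year order.
  return checkdate($month, $day, $year);
}

// February 31, 2024 does not exist.
var_dump(is_valid_calendar_date(2024, 2, 31)); // bool(false)

// February 29, 2024 does, because 2024 is a leap year.
var_dump(is_valid_calendar_date(2024, 2, 29)); // bool(true)
```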

Drupal 10 allows sharing fields across bundles of the same entity. Storage settings apply to all usages of the field across bundles of the same entity. Instance settings can be different for each bundle where the field is used. Consider an Image field attached to multiple content types. One storage setting for images is Upload destination which determines if files uploaded via this field are public or private. You cannot have the same field, identified by the same machine name, as public in one content type and as private in another. In fact, if no content has been added for that field, you will be allowed to change the field storage value in any of the content types. But remember that the change will affect all the content types where the field is used. If there is already data for the field, Drupal might lock some storage settings to maintain data integrity and prevent data loss. For example, if an option of a List (text) field has already been used, Drupal will not let you remove that option nor change its machine name.

On top of field types, Drupal also leverages field widgets for capturing data and field formatters for displaying data. The field widgets are used in the edit interface for an entity while the field formatters are used in the view interface for an entity. Going back to content types and nodes, field widgets are configured as part of the Manage form display section of the content type. You see them in action when editing an entity. As for field formatters, they are configured as part of the Manage display section of the content type, and you see them in action when viewing an entity. The available widgets and formatters will vary per field type. For instance, out of the box a List (text) field can choose between two field widgets: a select list or check boxes/radio buttons. The same field type can choose between two field formatters: key to display the machine name of the option or default to show the name/label of the option.

As is often the case, Drupal can be extended via modules. New modules can add new field types, alter storage or instance settings, provide extra widgets and formatters for existing fields, and more. Remember that these are provided by modules, and those modules need to be enabled for the functionality to be available. Drupal core ships with the Telephone and Datetime range modules, but they are disabled by default. Unless you enable them, the field-related functionality they provide will not be available.

Field-related migrations

We could probably write a whole series on Drupal fields alone. For now, let's shift our focus back to migrations. You can certainly go a long way without understanding how Drupal fields work. If your Drupal 7 project only uses core fields or contributed fields with an automated upgrade path, the migrate API will take care of everything for you. But that is not the case with most real-life projects. Investing time learning about the different aspects of fields is very valuable, especially in custom migrations. If you need to make content model changes (like changing from one entity to another, changing from one field type to another, or combining/splitting fields), it will be necessary to understand how fields work.

As we begin, it’s important to note that migrating fields requires at least four migrations:

  • upgrade_d7_field migrates field storage settings.
  • upgrade_d7_field_instance migrates field instance settings.
  • upgrade_d7_field_instance_widget_settings migrates field widget settings.
  • upgrade_d7_field_formatter_settings migrates field formatter settings.

Each of those migrations can specify one or more migration dependencies. For example, it is common for the upgrade_d7_field_instance migration to depend on upgrade_d7_node_type because field instances are associated with bundles of the node entity, that is, content types. upgrade_d7_field_instance also requires the field storage settings to be present. For this reason, it declares a dependency on upgrade_d7_field. Both upgrade_d7_field_instance_widget_settings and upgrade_d7_field_formatter_settings depend on field instance settings, so they declare a dependency on upgrade_d7_field_instance. In the case of upgrade_d7_field_formatter_settings, formatter settings are tied to view modes so there is an extra dependency on the upgrade_d7_view_modes migration. Other dependencies might be specified.
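One way to reason about these dependencies is to resolve them into an execution order. The sketch below does that in plain PHP with a depth-first traversal. It only encodes the relationships named in the paragraph above, and the execution_order function is a hypothetical helper, not part of the migrate API, which resolves dependencies on its own.

```php
<?php

// Dependencies described above, keyed by migration ID. Only the relationships
// named in the text are listed; a real project will likely have more.
$dependencies = [
  'upgrade_d7_node_type' => [],
  'upgrade_d7_view_modes' => [],
  'upgrade_d7_field' => [],
  'upgrade_d7_field_instance' => ['upgrade_d7_node_type', 'upgrade_d7_field'],
  'upgrade_d7_field_instance_widget_settings' => ['upgrade_d7_field_instance'],
  'upgrade_d7_field_formatter_settings' => ['upgrade_d7_field_instance', 'upgrade_d7_view_modes'],
];

/**
 * Orders migration IDs so that every dependency comes before its dependents.
 *
 * Hypothetical helper written for this article.
 */
function execution_order(array $dependencies): array {
  $order = [];
  $state = [];
  $visit = function (string $id) use (&$visit, &$order, &$state, $dependencies): void {
    if (($state[$id] ?? NULL) === 'done') {
      return;
    }
    if (($state[$id] ?? NULL) === 'visiting') {
      throw new RuntimeException("Circular dependency detected at $id");
    }
    $state[$id] = 'visiting';
    // Visit everything this migration requires before adding it to the list.
    foreach ($dependencies[$id] ?? [] as $required) {
      $visit($required);
    }
    $state[$id] = 'done';
    $order[] = $id;
  };
  foreach (array_keys($dependencies) as $id) {
    $visit($id);
  }
  return $order;
}

$order = execution_order($dependencies);
print_r($order);
```

In the resulting list, upgrade_d7_field and upgrade_d7_node_type appear before upgrade_d7_field_instance, which in turn appears before the widget and formatter settings migrations.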

Field migrations are very dynamic and depend on the entities that are used in your project. If using paragraphs, the field migrations will likely depend on the migration that created the paragraph types, upgrade_d7_field_collection_type in our example. We are going to cover which migrations need to be executed, and in which order, as we progress through the examples.

Other modules can introduce field-related configuration that also needs to be migrated. For example, if using the field group module in Drupal 7, you can have extra migrations for each entity/bundle combination that uses field groups. Also remember that the automated upgrade path requires modules to be enabled both in Drupal 7 and Drupal 10. This also applies to modules that provide field-related functionality. In fact, some upgrade paths might require two or more modules in Drupal 10 to properly migrate Drupal 7 data. For instance, the date module in Drupal 7 provides field types that can store both start and end dates. In Drupal 10, you have the Datetime module if you only need to store the start date and the Datetime range module if you need to store start and end dates. If you have a Drupal 7 field that accepts end dates, but the Datetime range module is not enabled in Drupal 10, you will get an error indicating that such a field cannot be migrated unless the extra module is enabled. These errors are logged to the migrate message tables. They can be accessed using the drush migrate:messages command or the migration messages admin interface.

Custom field migrations

A custom field migration is necessary when there are content model changes or when your Drupal 10 implementation uses a different field configuration than the one used in Drupal 7. Consider the following examples:

  • You want to convert embedded images in rich text fields in Drupal 7 into media entities in Drupal 10.
  • You want to extract content from rich text fields in Drupal 7 and save it into paragraphs or layout builder in Drupal 10.
  • You have multiple single value link fields in Drupal 7 that should be consolidated into one social link field in Drupal 10.
  • You have multiple single value file fields in Drupal 7 that should be consolidated into one multi-value file field in Drupal 10.
  • A field type, widget, or formatter used in Drupal 7 is not available in Drupal 10. A new configuration is implemented and you need to map Drupal 7 settings into Drupal 10 settings.
  • You want to convert nodes in Drupal 7 to users in Drupal 10. An email field attached to the node in D7 should be mapped to the mail and init properties of the user entity in D10.
  • You need to provide an upgrade path for a module that does not offer it out of the box. For example, at the time of publishing this article the Radioactivity module does not provide an upgrade path from Drupal 7 to Drupal 10.
  • You have a date field using the ISO 8601 representation to store the value in Drupal 7 that should be converted to a UNIX timestamp field in Drupal 10. One reason for such change could be to improve performance when doing date and time calculations or comparisons.

Broadly speaking, these transformations can be grouped into converting from one field/entity type to another, combining/splitting fields, and extracting data to save it in a structured way. There are two primary ways of performing these custom field migrations.

The first approach consists of implementing a MigrateField plugin to provide the mapping between Drupal 7 and Drupal 10 settings. This can dynamically alter the four field-related migrations we listed in the previous section. It can also alter the process plugin chain applied to individual fields based on their types.

The second approach involves manually performing the necessary data transformation as part of the process pipeline. This often means creating a custom process plugin to transform data extracted from Drupal 7 into the format the Drupal 10 field expects. For this approach, it is also possible to implement hook_migrate_prepare_row to transform data as it is fetched by the source plugin.
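As a rough illustration of the second approach, the sketch below consolidates several single-value link fields into one multi-value structure, one of the scenarios listed earlier. It is plain PHP, not an actual process plugin. The consolidate_links helper, the field names, and the shape of the row are all assumptions made for this example. In a real migration, similar logic would live in a custom process plugin or in hook_migrate_prepare_row.

```php
<?php

/**
 * Merges several single-value link fields into one multi-value structure.
 *
 * Hypothetical helper: the field names passed in and the shape of $row are
 * assumptions made for this sketch, not something the migrate API provides.
 */
function consolidate_links(array $row, array $source_fields): array {
  $values = [];
  foreach ($source_fields as $field_name) {
    // Each source field arrives as a list of deltas, even when single value.
    foreach ($row[$field_name] ?? [] as $item) {
      // Skip fields that were left empty in the source.
      if (empty($item['url'])) {
        continue;
      }
      // Assumption: the source column is named "url" and the destination
      // column is named "uri".
      $values[] = [
        'uri' => $item['url'],
        'title' => $item['title'] ?? NULL,
      ];
    }
  }
  return $values;
}

// Made-up source row with three single-value link fields, one of them empty.
$row = [
  'field_facebook' => [['url' => 'https://facebook.com/example', 'title' => 'Facebook']],
  'field_twitter' => [],
  'field_linkedin' => [['url' => 'https://linkedin.com/in/example', 'title' => NULL]],
];

$social_links = consolidate_links($row, ['field_facebook', 'field_twitter', 'field_linkedin']);
print_r($social_links);
```

The result is a list with one entry per non-empty source field, which is the nested shape a multi-value destination field expects.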

Database table structure for Drupal fields

A recurrent theme in today's article is data transformation. Now, we will take a look at how Drupal stores field data. Along the way, we will highlight some technical details on how fields are implemented.

Let's start with things that are common between Drupal 7 and Drupal 10 fields. Each field has a unique machine name. When a field is attached to an entity bundle, it creates two tables in the database. One table stores the current value for the latest revision of the content entity. The other table stores the past values used by previous revisions of the content entity. In both cases, the table names include the machine name of the field they store data for. In both cases, there is a column in the table to store delta values. Every field can potentially have an unlimited cardinality. When a field allows for multiple values to be stored, this delta column keeps track of the order in which values were added to the field.

In Drupal 7, fields can be shared across multiple entities. The pattern for the database tables' names is field_data_[FIELD_MACHINE_NAME] and field_revision_[FIELD_MACHINE_NAME]. In both cases, the following columns are available: entity_type, bundle, deleted, entity_id, revision_id, language, delta. Notice that language information is also stored at the field level. More columns will be created based on what the field type deems necessary to store. Drupal 7 fields are exposed by hook_field_info and their database schema is determined by hook_field_schema. As an example, consider the image field that ships with Drupal 7 core. In image_field_schema we can see that this field type requires five columns to store its data: fid, alt, title, width, and height. When the Field API creates this field, the column names follow the pattern [FIELD_MACHINE_NAME]_[COLUMN_NAME]. Assuming you are using MySQL, MariaDB, or any of their variants, you can use the EXPLAIN statement to see the structure of a table. Below is the output of running EXPLAIN field_data_field_image; on the SQL server of the example Drupal 7 site.

+--------------------+------------------+------+-----+---------+-------+
| Field              | Type             | Null | Key | Default | Extra |
+--------------------+------------------+------+-----+---------+-------+
| entity_type        | varchar(128)     | NO   | PRI |         |       |
| bundle             | varchar(128)     | NO   | MUL |         |       |
| deleted            | tinyint(4)       | NO   | PRI | 0       |       |
| entity_id          | int(10) unsigned | NO   | PRI | NULL    |       |
| revision_id        | int(10) unsigned | YES  | MUL | NULL    |       |
| language           | varchar(32)      | NO   | PRI |         |       |
| delta              | int(10) unsigned | NO   | PRI | NULL    |       |
| field_image_fid    | int(10) unsigned | YES  | MUL | NULL    |       |
| field_image_alt    | varchar(512)     | YES  |     | NULL    |       |
| field_image_title  | varchar(1024)    | YES  |     | NULL    |       |
| field_image_width  | int(10) unsigned | YES  |     | NULL    |       |
| field_image_height | int(10) unsigned | YES  |     | NULL    |       |
+--------------------+------------------+------+-----+---------+-------+

Note: In the example project, you can execute ddev mysql to access a database client to execute SQL statements. Alternatively, you can use one of the many database GUIs available for DDEV.

In Drupal 10, fields can only be shared across bundles of the same entity. The pattern for the database tables' names is [ENTITY_TYPE]__[FIELD_MACHINE_NAME] and [ENTITY_TYPE]_revision__[FIELD_MACHINE_NAME]. Notice the double underscores before the machine name of the field. In both cases, the following columns are available: bundle, deleted, entity_id, revision_id, langcode, delta. Compared to Drupal 7, the entity_type column is gone because the entity type is now part of the table name, and the language column was renamed to langcode. Like in Drupal 7, more columns will be created based on what the field type deems necessary to store. In Drupal 10, field types are exposed by FieldType plugins and their database schema is determined by the schema method of the class providing the field type. Study the image field that ships with Drupal 10 core. It is provided by the ImageItem class, which in its schema method lists some familiar column names: target_id, alt, title, width, height. Compared to Drupal 7, the fid column was renamed to target_id. Using target_id as the column to store the ID of the entity the field is pointing to is a common pattern in all entity reference fields. As with Drupal 7, these column names translate to [FIELD_MACHINE_NAME]_[COLUMN_NAME] columns when the field is created by the Field API. For reference, below is the output of running EXPLAIN node__field_image; on the SQL server of the example Drupal 10 site.

+-----------------------+------------------+------+-----+---------+-------+
| Field                 | Type             | Null | Key | Default | Extra |
+-----------------------+------------------+------+-----+---------+-------+
| bundle                | varchar(128)     | NO   | MUL |         |       |
| deleted               | tinyint(4)       | NO   | PRI | 0       |       |
| entity_id             | int(10) unsigned | NO   | PRI | NULL    |       |
| revision_id           | int(10) unsigned | NO   | MUL | NULL    |       |
| langcode              | varchar(32)      | NO   | PRI |         |       |
| delta                 | int(10) unsigned | NO   | PRI | NULL    |       |
| field_image_target_id | int(10) unsigned | NO   | MUL | NULL    |       |
| field_image_alt       | varchar(512)     | YES  |     | NULL    |       |
| field_image_title     | varchar(1024)    | YES  |     | NULL    |       |
| field_image_width     | int(10) unsigned | YES  |     | NULL    |       |
| field_image_height    | int(10) unsigned | YES  |     | NULL    |       |
+-----------------------+------------------+------+-----+---------+-------+
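The naming patterns for both versions can be summarized with a few string helpers. This is a sketch in plain PHP with hypothetical function names. It assumes the names are short enough to be used verbatim and ignores any shortening Drupal may apply to very long table names.

```php
<?php

/**
 * Builds the Drupal 7 data and revision table names for a field.
 *
 * Hypothetical helper written for this article.
 */
function d7_field_tables(string $field_name): array {
  return [
    'data' => "field_data_{$field_name}",
    'revision' => "field_revision_{$field_name}",
  ];
}

/**
 * Builds the Drupal 10 data and revision table names for a field.
 *
 * Note the double underscore before the field machine name. Hypothetical
 * helper that ignores any shortening applied to very long names.
 */
function d10_field_tables(string $entity_type, string $field_name): array {
  return [
    'data' => "{$entity_type}__{$field_name}",
    'revision' => "{$entity_type}_revision__{$field_name}",
  ];
}

/**
 * Builds a column name following [FIELD_MACHINE_NAME]_[COLUMN_NAME].
 */
function field_column(string $field_name, string $column): string {
  return "{$field_name}_{$column}";
}

print_r(d7_field_tables('field_image'));
print_r(d10_field_tables('node', 'field_image'));

// The same logical value lives in differently named columns per version.
echo field_column('field_image', 'fid'), PHP_EOL;
echo field_column('field_image', 'target_id'), PHP_EOL;
```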

Technical note: The columns listed in the schema method of the class that implements the field type correspond to the sub-fields covered in article 13. The default sub-field is the column specified in the mainPropertyName method of the same class. In the case of the ImageItem class, the mainPropertyName method returns target_id, making it the default sub-field. For some field types, like double_field, where there is no clear primary sub-field, the mainPropertyName method will return NULL.

To put this into practice, consider the following field mapping for an image field:

process:
  field_image:
    -
      plugin: sub_process
      source: field_image
      process:
        target_id: fid
        alt: alt
        title: title
        width: width
        height: height

Technical note: Image is a core field both in Drupal 7 and Drupal 10. In this case, the transformation is provided by Drupal 10 core via the defineValueProcessPipeline method of the ImageField migrate field plugin.

Because all fields can potentially have multiple values, when the migrate API fetches field data from Drupal 7, it will return a nested array structure. At the top level the nested array is keyed by the deltas and each value includes a full set of columns used by the field type. See the following example of a content entity with two values for the field_image field:


[
  0 => [
    'fid' => '3',
    'alt' => 'DrupalCon Portland 2024 logo',
    'title' => NULL,
    'width' => '800',
    'height' => '400',
  ],
  1 => [
    'fid' => '7',
    'alt' => 'DrupalCon Barcelona 2024 logo',
    'title' => NULL,
    'width' => '800',
    'height' => '400',
  ],
]


When we have nested structures like the above, we can use the sub_process process plugin to iterate over them. In the process configuration of this plugin we specify the mapping from Drupal 7 column names to Drupal 10 column names. Notably, the fid column from Drupal 7 is mapped to target_id in Drupal 10. The rest of the columns are mapped verbatim without the need of name changes.
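To see what this iteration amounts to, here is a simplified stand-in for sub_process written in plain PHP. It is not the plugin's actual implementation, and the map_deltas helper is hypothetical. It only renames keys for every delta, using the same destination-to-source direction as the YAML mapping.

```php
<?php

/**
 * Renames the keys of every delta, similar in spirit to sub_process.
 *
 * Hypothetical helper. $mapping is keyed by destination column with the
 * source column as value, matching the direction of the YAML mapping.
 */
function map_deltas(array $deltas, array $mapping): array {
  $result = [];
  foreach ($deltas as $delta => $item) {
    foreach ($mapping as $destination => $source) {
      $result[$delta][$destination] = $item[$source] ?? NULL;
    }
  }
  return $result;
}

// Values as fetched from Drupal 7, keyed by delta.
$source_values = [
  0 => ['fid' => '3', 'alt' => 'DrupalCon Portland 2024 logo', 'title' => NULL, 'width' => '800', 'height' => '400'],
  1 => ['fid' => '7', 'alt' => 'DrupalCon Barcelona 2024 logo', 'title' => NULL, 'width' => '800', 'height' => '400'],
];

// Destination column => source column.
$mapping = [
  'target_id' => 'fid',
  'alt' => 'alt',
  'title' => 'title',
  'width' => 'width',
  'height' => 'height',
];

$destination_values = map_deltas($source_values, $mapping);
print_r($destination_values);
```

Each delta keeps its position, and only the fid key changes name, to target_id.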

In the case of image fields, Drupal takes care of implementing the necessary transformations. But the same logic applies to custom field migrations. We read data from Drupal 7 and transform it, one way or another, into what Drupal 10 expects. We will cover different examples as we progress through the series.

Drupal 10 field and migrate plugins

To wrap up today's article, I want to highlight that in Drupal 10, field types, widgets, and formatters are all implemented via plugins. This will become evident when we execute the field-related migrations. Very likely you will get errors mentioning that a plugin was not found for a field type, widget, or formatter used in Drupal 7 that is not available in Drupal 10. Based on the enabled modules in your Drupal 10 installation, you can check which field plugins are available using the following Drush commands:



# List of field type plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.field.field_type')->getDefinitions()));"

# Details on a specific field type plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.field.field_type')->getDefinitions()['FIELD_TYPE_ID']);"

# List of field widget plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.field.widget')->getDefinitions()));"

# Details on a specific field widget plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.field.widget')->getDefinitions()['FIELD_WIDGET_ID']);"

# List of field formatter plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.field.formatter')->getDefinitions()));"

# Details on a specific field formatter plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.field.formatter')->getDefinitions()['FIELD_FORMATTER_ID']);"

Migrations, source, process, destination, and field plugins can be inspected in a similar manner:



# List of migration plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migration')->getDefinitions()));"

# Details on a specific migration plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migration')->getDefinitions()['MIGRATION_ID']);"

# List of migrate source plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.source')->getDefinitions()));"

# Details on a specific migrate source plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.source')->getDefinitions()['SOURCE_PLUGIN_ID']);"

# List of migrate process plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.process')->getDefinitions()));"

# Details on a specific migrate process plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.process')->getDefinitions()['PROCESS_PLUGIN_ID']);"

# List of migrate destination plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.destination')->getDefinitions()));"

# Details on a specific migrate destination plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.destination')->getDefinitions()['DESTINATION_PLUGIN_ID']);"

# List of migrate field plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.field')->getDefinitions()));"

# Details on a specific migrate field plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.field')->getDefinitions()['MIGRATE_FIELD_PLUGIN_ID']);"

Technical note: There are more migration plugin types. Drupal core provides migrate ID maps. The Migrate Plus module adds authentication, data_fetcher, and data_parser plugins. The Migrate Tools module includes shared configuration plugins. Other modules can extend the migration system with more migration-related plugin types. The ones listed in the snippet above are the most frequently used when writing custom migrations.

We covered a lot of ground today. This article is arguably the most technical in the series so far. It is intended to act as a reference for upcoming articles covering configuration and content migrations. While we are going to include many examples of custom field migrations, the information presented today should help inform your real-world migrations.

