In the previous article, we learned how to migrate paragraphs and create custom process plugins. Good exercise for the brain. Today, we will do some exercises for the body. Get ready for a strength training session — Drupal style — where we will learn about creating custom source plugins, extending existing ones, and writing media migrations from scratch.

Grab a bottle of water and a towel. Let's go!

Entity ID and high water mark considerations

To warm up, we will do some shoulder rolls. While doing so, let's revisit entity IDs and high water mark considerations. We covered file migrations back in article 24. Much of what was presented there applies today. The Tag1 Team Talks podcast on file and media migrations also contains lots of useful information for today's topic. To avoid repeating ourselves, we’ll summarize what you need to take into account to prevent entity ID conflicts and choose a high water mark.

You need to know which modules were used in the source site to provide media-related functionality. Our example project uses Core's file and image modules plus the YouTube Field module to store references to external videos. Other Drupal 7 projects might use the File Entity module, which makes the file entity fieldable and allows it to have bundles similar to Drupal 10 media entities. Still other projects might use the D7 Media module and its vast ecosystem of related modules to provide a rich media management experience.

We bring this up because, depending on how media was implemented in Drupal 7 and the content model you want to pursue in Drupal 10, you will have to look at different tables to determine the new AUTO_INCREMENT value for the media entity. Our Drupal 7 example project is relatively simple, so checking the file_managed table will suffice. We showed how to do this in article 24.

For brevity, we are going to show how to configure the auto_increment_alter_content_entities setting of the AUTO_INCREMENT Alter module to apply the new AUTO_INCREMENT value. Refer to article 23 for more information on how this module works or how to perform the operation in custom code.

$settings['auto_increment_alter_content_entities'] = [
 'media' => [350], // Alter the tables for the media content entity.
];

Now execute the command provided by the AUTO_INCREMENT Alter module to trigger the alter operation in the Drupal 10 project.

ddev drush auto-increment-alter:content-entities

As for the high water property, it will depend on where your data comes from. Today's example includes two different sources:

  • Field API tables. As we are going to discuss later, Drupal 7 fields use two tables: one for current revision data and another for past revision data. In theory, the revision_id column could be used as the high water mark. That said, the table for the current revision allows NULL values for the revision_id column, making it less than ideal for our purposes. Many times there will be a value, but its presence is not enforced at the database level. For simplicity, we will not define a high water mark in the migrations that read data directly from field API tables.
  • File entity tables. For this, we can use the same high water mark configuration used in the upgrade_d7_file and upgrade_d7_file_private migrations:
source:
 key: migrate
 plugin: tag1_media_image
 high_water_property:
   name: fid
   alias: f
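The concern about NULL values in revision_id can be made concrete with a plain-PHP model of high water filtering. This is a simplified sketch, not Drupal code: filter_above_high_water() is a hypothetical function, and Drupal's SqlBase applies the equivalent condition in SQL. With no stored mark every row is processed; once a mark exists, only rows whose property is strictly greater than it are selected, so a row with a NULL value is never picked up again, just as "NULL > 5" is never true in SQL.

```php
<?php

/**
 * Hypothetical, simplified model of high water mark filtering.
 *
 * With no stored mark every row is processed. Once a mark exists, only rows
 * whose property is strictly greater than it are processed; rows with a NULL
 * value are left out, mirroring SQL where "NULL > 5" is never true.
 *
 * @return array
 *   A two-item array: the rows to process and the new high water mark.
 */
function filter_above_high_water(array $rows, string $property, ?int $highWater): array {
  $selected = [];
  $newHighWater = $highWater;
  foreach ($rows as $row) {
    $value = $row[$property] ?? NULL;
    if ($highWater !== NULL && ($value === NULL || $value <= $highWater)) {
      continue;
    }
    $selected[] = $row;
    if ($value !== NULL) {
      // The mark advances to the largest value seen so far.
      $newHighWater = max($newHighWater ?? $value, $value);
    }
  }
  return [$selected, $newHighWater];
}

// First run: no mark yet, so all three rows are processed.
$rows = [
  ['entity_id' => 10, 'revision_id' => 3],
  ['entity_id' => 11, 'revision_id' => 5],
  ['entity_id' => 12, 'revision_id' => 4],
];
[$firstRun, $mark] = filter_above_high_water($rows, 'revision_id', NULL);

// Second run: the row with revision 8 is picked up, the NULL one is not.
$rows[] = ['entity_id' => 13, 'revision_id' => 8];
$rows[] = ['entity_id' => 14, 'revision_id' => NULL];
[$secondRun, $mark2] = filter_above_high_water($rows, 'revision_id', $mark);
```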

Migrate source plugin

Take some lightweight dumbbells. I’m using 10 pounds, approximately 4.5 kilos. To start, we will do three rounds of bicep curls. In between each round, we will casually talk about migrate source plugins.

Migrate source plugins are responsible for fetching data from one of many supported data repositories. In our example project, all migrations retrieve data from a Drupal 7 database. Other supported sources include JSON and XML files, CSV files, and Excel and LibreOffice spreadsheets. From a technical point of view, they leverage Drupal's Plugin API.

Below are some highlights regarding their implementation:

  • Migrate source plugins are classes in the Drupal\[module]\Plugin\migrate\source namespace that implement the MigrateSourceInterface interface.
  • The SourcePluginBase base class is available for convenience. It implements common methods invoked for source plugins.
  • Many source plugins implement a prepareRow method that can add, edit, or delete data retrieved from the source. It is possible to use the retrieved data to further query the source and fetch extra information. For example, fieldable entities use this method to attach Drupal 7 Field API data to the entity being fetched. The prepareRow method can also be used to instruct that a record should not be processed by returning FALSE.
  • If you need to inject a service, the plugin needs to implement the ContainerFactoryPluginInterface interface and its create method. Examples include SqlBase, d7_node, and spreadsheet. Notice that while d7_node does not explicitly implement the interface in its class declaration, it can still override the methods of the interface implemented by one of its parent classes.
  • For discovery, they still use Doctrine annotations at the time of publishing this article. When this Drupal core issue is committed, they will use PHP attributes for discovery of migrate source plugins.
  • Many source plugins related to Drupal 7 to 10 migrations extend the SqlBase class and implement the query method. This lets you alter the SQL query used to retrieve data from the source database. You can add conditions, join tables, expand the list of fields to fetch, and much more. A common use case is limiting the number of records to retrieve. For example, you can decide only published nodes will be migrated to Drupal 10, leaving any unpublished content behind. This will be presented in the next article.
  • The Migrate Drupal module offers a lot of functionality related to migrating data from Drupal 6 and 7. It even provides a way to introspect the current Drupal 10 installation to fetch content and configuration data via the content_entity and config source plugins respectively.
  • The plugin.manager.migrate.source service is the plugin manager for migrate source plugins. You can use it to get a list of available source plugins and obtain more details about their definition.
# List of migrate source plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.source')->getDefinitions()));"

# Details on a specific migrate source plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.source')->getDefinitions()['SOURCE_PLUGIN_ID']);"
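The prepareRow contract described above can be modeled in plain PHP: each row may be enriched with extra properties, or rejected by returning FALSE. The sketch below is not Drupal code; iterate_source() and the callback are hypothetical stand-ins for the iteration in SourcePluginBase and a plugin's own prepareRow implementation.

```php
<?php

/**
 * Hypothetical model of how a source iterates rows and calls prepareRow().
 *
 * Rows for which the callback returns FALSE are skipped; any other return
 * value keeps the (possibly modified) row.
 */
function iterate_source(array $rows, callable $prepareRow): array {
  $processed = [];
  foreach ($rows as $row) {
    // The callback receives the row by reference so it can add or edit data.
    if ($prepareRow($row) === FALSE) {
      continue;
    }
    $processed[] = $row;
  }
  return $processed;
}

$nodes = [
  ['nid' => 1, 'title' => 'Published', 'status' => 1],
  ['nid' => 2, 'title' => 'Draft', 'status' => 0],
];

// Skip unpublished nodes and add a derived property to the rest.
$result = iterate_source($nodes, function (array &$row) {
  if ($row['status'] !== 1) {
    return FALSE;
  }
  $row['title_upper'] = strtoupper($row['title']);
  return TRUE;
});
```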

Creating custom source plugins from scratch

Put those dumbbells aside for a moment, because we are going to do a bodyweight exercise. Let's do three rounds of push-ups, 45 seconds each. Go at your own pace and feel free to kneel if necessary. When done, come back to learn how to create custom source plugins from scratch.

Per our migration plan, we want to migrate data stored in YouTube fields in Drupal 7 as remote video media entities in Drupal 10. In practice, there is only one field that would undergo this transformation: field_video_recording. The YouTube field module is available for Drupal 10 and provides an automated upgrade path via a field migration plugin. That said, reading field data to create media entities is a great example to learn how to create a custom source plugin and incorporate content model changes as part of the upgrade process.

Back in article 16, we had a deep dive into understanding how Drupal fields work, both from the perspective of PHP code and database tables. I also recommend watching this presentation that covers multiple examples of how to perform content model changes when migrating from Drupal 7.

In Drupal 7, YouTube fields have two sub-fields: input and video_id. The input sub-field stores the URL as submitted by the user. Example values are:

  • https://www.youtube.com/watch?v=HMYpxm-2o4c
  • https://youtu.be/HMYpxm-2o4c

Notice that both URLs link to the same video, but follow different URL patterns. The value of the video_id sub-field is computed by the youtube_get_video_id function, which extracts the video ID from the multiple accepted URL formats listed in the module's README.txt file. For these two URLs, the video_id value would be the same: HMYpxm-2o4c. When migrating data into Drupal 10, we will deduplicate records like these. Namely, we will only migrate unique video ID values.
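To see why deduplicating on video_id works, here is a simplified PHP sketch that extracts the ID from the two URL formats listed above. The extract_youtube_id() function is hypothetical and handles only these two patterns; the Drupal 7 module's youtube_get_video_id() accepts more formats, so this is not a replacement for it.

```php
<?php

/**
 * Hypothetical helper: extract a YouTube video ID from two common URL formats.
 *
 * Handles "https://www.youtube.com/watch?v=ID" and "https://youtu.be/ID".
 * Returns NULL for anything else.
 */
function extract_youtube_id(string $url): ?string {
  $parts = parse_url($url);
  if ($parts === FALSE || !isset($parts['host'])) {
    return NULL;
  }
  $host = strtolower($parts['host']);

  // Short links carry the ID as the path.
  if ($host === 'youtu.be') {
    $id = trim($parts['path'] ?? '', '/');
    return $id !== '' ? $id : NULL;
  }

  // Regular links carry the ID in the "v" query string parameter.
  if (in_array($host, ['youtube.com', 'www.youtube.com'], TRUE) && ($parts['path'] ?? '') === '/watch') {
    parse_str($parts['query'] ?? '', $params);
    return isset($params['v']) && is_string($params['v']) ? $params['v'] : NULL;
  }

  return NULL;
}

$inputs = [
  'https://www.youtube.com/watch?v=HMYpxm-2o4c',
  'https://youtu.be/HMYpxm-2o4c',
];

// Both inputs map to the same ID, so only one unique value remains.
$ids = array_values(array_unique(array_map('extract_youtube_id', $inputs)));
```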

In Drupal 10, a remote video media entity uses a plain text field to store the URL to the video: field_media_oembed_video. Plain text fields have a single value sub-field. Our custom source plugin will fetch the video ID and create a valid YouTube URL that can be assigned to the text field in the remote video media entity.

Create a PHP file named Tag1YouTubeField.php inside the /src/Plugin/migrate/source folder of our tag1_migration custom module. The path relative to the Drupal 10 project root is web/modules/custom/tag1_migration/src/Plugin/migrate/source/Tag1YouTubeField.php. The content of the file should be:

<?php

namespace Drupal\tag1_migration\Plugin\migrate\source;

use Drupal\Component\Plugin\Exception\InvalidPluginDefinitionException;
use Drupal\Core\State\StateInterface;
use Drupal\migrate\Plugin\migrate\source\SqlBase;
use Drupal\migrate\Plugin\MigrationInterface;
use Drupal\migrate\Row;

/**
 * Drupal 7 YouTube Field data.
 *
 * Available configuration keys:
 * - name: The name of the YouTube field.
 * - revisions: (optional) If TRUE, retrieve field revisions. Defaults to FALSE.
 *
 * For additional configuration keys, refer to the parent classes.
 *
 * @see https://udrupal.com/migrate-source-plugins
 *
 * Example:
 *
 * @code
 *   source:
 *     plugin: tag1_youtube_field
 *     name: field_video
 *     revisions: TRUE
 * @endcode
 *
 * @see \Drupal\migrate\Plugin\migrate\source\SqlBase
 * @see \Drupal\migrate_drupal\Plugin\migrate\source\DrupalSqlBase
 * @see \Drupal\migrate_plus\Plugin\migrate\source\Table
 * @see \Drupal\migrate_source_csv\Plugin\migrate\source\CSV
 * @see \Drupal\media_migration\Plugin\migrate\source\VideoEmbedField
 *
 * @MigrateSource(
 *   id = "tag1_youtube_field",
 *   source_module = "youtube"
 * )
 */
class Tag1YouTubeField extends SqlBase {

  /**
   * YouTube field table.
   *
   * @var string
   */
  protected string $tableName;

  /**
   * Column name storing the YouTube video ID.
   *
   * @var string
   */
  protected string $videoIdColumnName;

  /**
   * YouTube URL prefix.
   *
   * @var string
   */
  protected string $urlPrefix = 'https://www.youtube.com/watch?v=';

  /**
   * {@inheritdoc}
   */
  public function __construct(array $configuration, $plugin_id, $plugin_definition, MigrationInterface $migration, StateInterface $state) {
    if (empty($configuration['name'])) {
      throw new \InvalidArgumentException("The tag1_youtube_field source plugin is missing the 'name' configuration.");
    }

    parent::__construct($configuration, $plugin_id, $plugin_definition, $migration, $state);

    // Use the revision table only when revisions are explicitly requested.
    // 'revisions' is optional, so do not read it directly when it is unset.
    // @see \Drupal\migrate_drupal\Plugin\migrate\source\d7\FieldableEntity::getFieldValues()
    $this->tableName = (!empty($configuration['revisions']) ? 'field_revision_' : 'field_data_') . $configuration['name'];

    // Retrieve the 'video_id' sub-field.
    // @see youtube_field_schema() in Drupal 7's youtube.install file.
    // @see youtube_get_video_id() in Drupal 7's youtube.inc file.
    $this->videoIdColumnName = $configuration['name'] . '_video_id';

    if (!$this->getDatabase()->schema()->tableExists($this->tableName)) {
      throw new InvalidPluginDefinitionException($plugin_id, "Source database table '{$this->tableName}' does not exist.");
    }
  }

  /**
   * {@inheritdoc}
   */
  public function query() {
    $query = $this->select($this->tableName, 'yt')
      ->distinct();
    $query->addField('yt', $this->videoIdColumnName, 'video_id');

    return $query;
  }

  /**
   * {@inheritdoc}
   */
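  public function prepareRow(Row $row) {
    // Build a canonical YouTube URL from the deduplicated video ID.
    $video_id = $row->getSourceProperty('video_id');
    $row->setSourceProperty('video_url', $this->urlPrefix . $video_id);
    // Let the parent invoke the prepare row hooks and decide whether to skip.
    return parent::prepareRow($row);
  }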

  /**
   * {@inheritdoc}
   */
  public function fields() {
    return [
      'video_id' => $this->t('YouTube video id.'),
      'video_url' => $this->t('YouTube video URL.'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    $ids['video_id']['type'] = 'string';
    $ids['video_id']['alias'] = 'yt';
    return $ids;
  }

}


Technical note: When Core's Migrate Drupal module is enabled, it reads the value of source_module in the annotation and checks if the listed module is enabled in the source site. Because our custom source plugin reads data from the YouTube Field module, we use its machine name: youtube. If the module specified in source_module is not enabled, any migration using that source plugin will be filtered out and will not appear in the list of available migrations that can be executed. This is done with a combination of the enforce_source_module_tags configuration in migrate_drupal.settings.yml and tags added to the migration_tags section of a migration definition file. By default, Drupal 7 is one of the migration tags to enforce, and it is included in the generated migrations.
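The filtering described in this note can be sketched in a few lines of PHP. This is a hypothetical simplification of the behavior described above, not Migrate Drupal's actual code: filter_migrations() is an invented name, and a migration is kept only if none of its tags is enforced, or if its source plugin's source_module is enabled in the source site.

```php
<?php

/**
 * Hypothetical model of source_module enforcement.
 *
 * @param array $migrations
 *   Migration IDs keyed to ['tags' => [...], 'source_module' => '...'].
 * @param string[] $enforcedTags
 *   Tags listed in enforce_source_module_tags, for example 'Drupal 7'.
 * @param string[] $enabledSourceModules
 *   Modules enabled in the Drupal 7 source site.
 */
function filter_migrations(array $migrations, array $enforcedTags, array $enabledSourceModules): array {
  return array_filter($migrations, function (array $migration) use ($enforcedTags, $enabledSourceModules): bool {
    // Migrations without an enforced tag are never filtered out.
    if (!array_intersect($migration['tags'], $enforcedTags)) {
      return TRUE;
    }
    return in_array($migration['source_module'], $enabledSourceModules, TRUE);
  });
}

$migrations = [
  'upgrade_d7_media_remote_video' => ['tags' => ['Drupal 7', 'Content'], 'source_module' => 'youtube'],
  'upgrade_d7_media_image' => ['tags' => ['Drupal 7', 'Content'], 'source_module' => 'file'],
];

// If the YouTube module were disabled in Drupal 7, only the image migration remains.
$available = filter_migrations($migrations, ['Drupal 7'], ['file', 'image']);
```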

A detailed review of the PHP code for the custom source plugin will be left as an exercise to the curious reader. What we want to highlight is that knowing the table structure of Field API tables is key to accomplishing our goal. First, we need to determine if we want to retrieve data from the current revision only or include data from past revisions as well. In Drupal 7, every field has two tables named after its machine name: field_data_FIELD_NAME and field_revision_FIELD_NAME.

Our source plugin exposes two settings:

  • name: Required. Stores a string indicating the name of the YouTube field in Drupal 7 to fetch data from. In our example that would be: field_video_recording.
  • revisions: Optional. A boolean indicating whether revision data should be retrieved or not. Defaults to FALSE. In our example, we will not migrate revision data.

With this information, the source plugin can figure out which Drupal 7 table to query: field_data_field_video_recording. But what about its structure? We need to find out the name of the column that stores the video ID. From the root of the Drupal 7 project, execute ddev mysql to open the MySQL command-line client. Then run the following statement at the SQL prompt: DESCRIBE field_data_field_video_recording;

You will get an output similar to the following:

+--------------------------------+------------------+------+-----+---------+-------+
| Field                          | Type             | Null | Key | Default | Extra |
+--------------------------------+------------------+------+-----+---------+-------+
| entity_type                    | varchar(128)     | NO   | PRI |         |       |
| bundle                         | varchar(128)     | NO   | MUL |         |       |
| deleted                        | tinyint(4)       | NO   | PRI | 0       |       |
| entity_id                      | int(10) unsigned | NO   | PRI | NULL    |       |
| revision_id                    | int(10) unsigned | YES  | MUL | NULL    |       |
| language                       | varchar(32)      | NO   | PRI |         |       |
| delta                          | int(10) unsigned | NO   | PRI | NULL    |       |
| field_video_recording_input    | varchar(1024)    | YES  |     | NULL    |       |
| field_video_recording_video_id | varchar(15)      | YES  | MUL | NULL    |       |
+--------------------------------+------------------+------+-----+---------+-------+

Drupal 7 field API creates tables with a common structure. The following columns are present for all fields no matter their type: entity_type, bundle, deleted, entity_id, revision_id, language, and delta. Then, there will be one column for each sub-field as defined by the field type. In this case, we have field_video_recording_input and field_video_recording_video_id. Notice that the machine name of the field is prepended to the sub-field name with an underscore in between. The columns are the same for the field_data_FIELD_NAME and field_revision_FIELD_NAME tables. They differ in which columns are nullable and which act as primary keys.
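These naming conventions are exactly what the constructor of our source plugin relies on. The standalone sketch below mirrors that logic so it can be checked for any field; d7_field_storage() is a hypothetical helper, not part of Drupal.

```php
<?php

/**
 * Hypothetical helper mirroring the naming logic in Tag1YouTubeField.
 *
 * Builds the Drupal 7 Field API table name and the column name of a
 * sub-field from the field's machine name.
 */
function d7_field_storage(array $configuration, string $subField): array {
  if (empty($configuration['name'])) {
    throw new InvalidArgumentException("The 'name' configuration is required.");
  }
  $name = $configuration['name'];
  // Current revision data lives in field_data_*, all revisions in field_revision_*.
  $prefix = !empty($configuration['revisions']) ? 'field_revision_' : 'field_data_';
  return [
    'table' => $prefix . $name,
    // Sub-field columns are prefixed with the field name and an underscore.
    'column' => $name . '_' . $subField,
  ];
}

// 'table' => 'field_data_field_video_recording',
// 'column' => 'field_video_recording_video_id'.
$storage = d7_field_storage(['name' => 'field_video_recording'], 'video_id');
```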

In its query method, our custom source plugin builds a query to retrieve unique video ID values from current revision data of the field_video_recording field:

SELECT DISTINCT field_video_recording_video_id FROM field_data_field_video_recording;

Then, in its prepareRow method, the custom source plugin reads the retrieved video_id value and creates a new source property named video_url with a valid YouTube URL. This can later be used in our migration to populate the field_media_oembed_video field for the remote video media entity.
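The combined effect of SELECT DISTINCT and prepareRow can be sketched in plain PHP. The input rows below are a hypothetical stand-in for query results, and youtube_source_rows() is an invented name: the same video referenced from two nodes produces a single source row keyed by video_id, as declared in getIds(). Unlike the plugin's SQL query, this sketch also ignores NULL video IDs.

```php
<?php

const YOUTUBE_URL_PREFIX = 'https://www.youtube.com/watch?v=';

/**
 * Hypothetical model of the tag1_youtube_field source plugin output.
 *
 * Deduplicates video IDs (like SELECT DISTINCT) and adds the video_url
 * property (like prepareRow()). NULL IDs are skipped in this sketch.
 */
function youtube_source_rows(array $fieldRows, string $column): array {
  $rows = [];
  foreach ($fieldRows as $fieldRow) {
    $videoId = $fieldRow[$column] ?? NULL;
    if ($videoId === NULL || isset($rows[$videoId])) {
      continue;
    }
    $rows[$videoId] = [
      'video_id' => $videoId,
      'video_url' => YOUTUBE_URL_PREFIX . $videoId,
    ];
  }
  return array_values($rows);
}

// Two nodes reference the same video.
$fieldRows = [
  ['entity_id' => 1, 'field_video_recording_video_id' => 'HMYpxm-2o4c'],
  ['entity_id' => 2, 'field_video_recording_video_id' => 'HMYpxm-2o4c'],
];
$sourceRows = youtube_source_rows($fieldRows, 'field_video_recording_video_id');
```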

Speaking of which, let's write that migration.

From Drupal 7 YouTube fields to Drupal 10 remote video media entities

Remember to stay hydrated during today's routine. Next, let's pick up those dumbbells for three rounds of triceps kickbacks. When completed, we will learn how to write a migration to create remote media entities from scratch.

Before doing so, it is important to understand what entities will be created, their base field definitions, and the fields attached to each bundle. We covered this in great detail in article 26. Additionally, this article has a reference of base field definitions for Drupal 10 media entities. In short, we will create entities of type media of the remote_video bundle and populate its field_media_oembed_video plain text field with the video_url source property provided by our custom source plugin.

Create an upgrade_d7_media_remote_video.yml file in the web/modules/custom/tag1_migration/migrations folder of our Drupal 10 project. We will use that same file name, without the file extension, as the migration ID. You can come up with any name as long as it is unique. Below is the content of the file:

id: upgrade_d7_media_remote_video
class: Drupal\migrate\Plugin\Migration
migration_tags:
 - 'Drupal 7'
 - Content
 - media
 - tag1_content
label: 'Media (Remote video)'
source:
 key: migrate
 plugin: tag1_youtube_field
 name: field_video_recording
 revisions: FALSE
process:
 field_media_oembed_video: video_url
 status:
   plugin: default_value
   default_value: 1
 uid:
   plugin: default_value
   default_value: 1
 langcode:
   plugin: default_value
   default_value: en
destination:
 plugin: 'entity:media'
 default_bundle: remote_video
migration_dependencies:
 required: {  }
 optional: {  }

Now, rebuild caches for our new migration to be detected and trigger an import operation.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_media_remote_video
ddev drush migrate:import upgrade_d7_media_remote_video

Note: The first time you execute the upgrade_d7_media_remote_video migration, you might notice that it runs slower than other migrations we have executed recently. Out of the box, when Drupal creates media entities, it tries to create thumbnails that serve as a preview when displaying media entities in listing pages. Thumbnail generation can be slow. Fortunately, you can configure media types to queue thumbnail generation for later, speeding up the execution of the migration. This is configured on a per-bundle basis. To enable this feature for remote videos, go to https://migration-drupal10.ddev.site/admin/structure/media/manage/remote_video and enable the Queue thumbnail downloads option inside the Publishing options section. Thumbnail generation will then happen via a queue worker when cron runs.

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/content/media?type=remote_video and you should see multiple videos from our Tag1 Team Talks podcast series.

Note that our upgrade_d7_media_remote_video migration uses the tag1_youtube_field source plugin. This must match the id of the MigrateSource PHP annotation used in our custom source plugin. Then, we set the name configuration to field_video_recording, the Drupal 7 field to fetch data from. Finally, we set the revisions configuration to FALSE, meaning we only want to migrate data from current revisions. We could have omitted the revisions configuration, because the plugin assumes a FALSE default, but we decided to include it for completeness.

As it stands, our custom source plugin can read from one Drupal 7 field at a time. It is possible to write more advanced queries that fetch data from multiple fields in one go. The Media Migration module provides many source plugins with advanced queries and logic in their prepareRow implementations. For reference, take a look at the YoutubeFieldSource and VideoEmbedField source plugins, which retrieve data from the YouTube Field and Video Embed Field Drupal 7 modules respectively. These source plugins build dynamic queries that leverage UNION clauses to fetch data from multiple tables at the same time.
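Conceptually, fetching video IDs from several fields in one go means combining the values from multiple tables and removing duplicates, which is what SQL UNION does (as opposed to UNION ALL). The sketch below models only that idea in plain PHP; union_video_ids(), field_promo_video, and example-id are hypothetical, and this does not reproduce Media Migration's queries.

```php
<?php

/**
 * Hypothetical model of UNION semantics across several field tables.
 *
 * Each entry lists the video IDs found in one table. Like SQL UNION,
 * duplicates across tables are removed.
 */
function union_video_ids(array $videoIdsByTable): array {
  // array_values() avoids passing string keys as named arguments.
  $merged = array_merge(...array_values($videoIdsByTable));
  return array_values(array_unique($merged));
}

$videoIdsByTable = [
  'field_data_field_video_recording' => ['HMYpxm-2o4c', 'example-id'],
  // Hypothetical second YouTube field on another content type.
  'field_data_field_promo_video' => ['HMYpxm-2o4c'],
];
$ids = union_video_ids($videoIdsByTable);
```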

The rest of our migration is relatively straightforward. We use the entity:media destination plugin to create media entities of type remote_video. In the process section, we assign the video_url fetched by our source plugin to the field_media_oembed_video plain text field. The remaining assignments in the process section are providing default values for the status, uid, and langcode base field definitions.

Extending existing source plugins

Catch your breath, and let's go down to the mat again. Use your dumbbells for three rounds of chest presses. When finished, I will teach you how to extend existing source plugins.

Remember that per our migration plan, we also want to replace some of the image fields attached to content types with media reference fields. To accomplish this, we need to migrate file entities from Drupal 7 into media entities in Drupal 10. As covered in article 24, image fields in Drupal 7 are an extension of file fields. The data for both is stored in the file_managed table. In a stock Drupal 7 installation, an example record for this table would be:

*************************** 1. row ***************************
      fid: 1
      uid: 1
 filename: druplicon.jpg
      uri: public://article/druplicon.jpg
 filemime: image/jpeg
 filesize: 2890
   status: 1
timestamp: 280299600

Technical note: If you are using the 7.x-2.x branch of the File Entity module in Drupal 7, it alters the file_managed table to add an extra column: type. For the record above, the value for the type column would be image, which is one of the default file types provided by the module. The module also uses a queue triggered on cron to update the value of the new column based on the MIME type of the file.

We already used the d7_file plugin to migrate public and private files in article 24. Today, we are going to extend this core plugin and alter its query to only retrieve images. Namely, we will use the filemime column in the file_managed table to limit the records to retrieve. If our project used the File Entity module, we could use the extra type column for filtering purposes.

Create a PHP file named Tag1MediaImage.php inside the /src/Plugin/migrate/source folder of our tag1_migration custom module. The path relative to the Drupal 10 project root is web/modules/custom/tag1_migration/src/Plugin/migrate/source/Tag1MediaImage.php. The content of the file should be:

<?php

namespace Drupal\tag1_migration\Plugin\migrate\source;

use Drupal\file\Plugin\migrate\source\d7\File;

/**
 * Retrieves permanent image files.
 *
 * @see \Drupal\file\Plugin\migrate\source\d7\File
 *
 * @MigrateSource(
 *   id = "tag1_media_image",
 *   source_module = "file"
 * )
 */
class Tag1MediaImage extends File {

  /**
   * {@inheritdoc}
   */
  public function query() {
    $query = parent::query();
    $query->condition('f.filemime', 'image/%', 'LIKE');
    $query->condition('f.status', 1);
    return $query;
  }

}


Note: Source plugins that only alter the parent's query might no longer be necessary once this core issue lands.

A detailed review of the PHP code for the custom source plugin will be left as an exercise to the curious reader. You do not need to do much to create a custom source plugin that extends an existing one. In this case, we are only overriding the query method to add two conditions:

  • f.filemime LIKE 'image/%': only retrieve files whose MIME type starts with image/.
  • f.status = 1: only retrieve permanent files.

Please note that we are adding conditions to the parent query provided by d7_file, so you need to review that source plugin's query implementation to see how the initial query is built. At the risk of stating the obvious, by extending the Drupal\file\Plugin\migrate\source\d7\File class, we also inherit all the other methods defined in that class and up its parent class hierarchy. The importance of this will become evident when writing the migration for creating image media entities.
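The two conditions behave roughly like the following PHP filter over file_managed rows. This is only a model for reasoning about which records are retrieved; is_permanent_image() is a hypothetical name, and the real filtering happens in SQL (where, depending on collation, LIKE may also be case-insensitive, unlike str_starts_with()).

```php
<?php

/**
 * Hypothetical model of the conditions added by Tag1MediaImage::query().
 */
function is_permanent_image(array $file): bool {
  // Approximates: f.filemime LIKE 'image/%'.
  $isImage = str_starts_with($file['filemime'], 'image/');
  // Approximates: f.status = 1 (permanent file).
  $isPermanent = (int) $file['status'] === 1;
  return $isImage && $isPermanent;
}

$files = [
  ['fid' => 1, 'filemime' => 'image/jpeg', 'status' => 1],
  ['fid' => 2, 'filemime' => 'application/pdf', 'status' => 1],
  ['fid' => 3, 'filemime' => 'image/png', 'status' => 0],
];

// Only the permanent JPEG survives both conditions.
$images = array_values(array_filter($files, 'is_permanent_image'));
```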

Before switching focus to writing a media migration, consider how a custom source plugin that extends d7_file would look if the Drupal 7 site used the File Entity module:

<?php

namespace Drupal\tag1_migration\Plugin\migrate\source;

use Drupal\file\Plugin\migrate\source\d7\File;

/**
 * Retrieve permanent files optionally filtered by file type bundles.
 *
 * Available configuration keys:
 * - type: Only retrieve files matching the specified file type bundles. Can be
 *   set to a string or an array. If not declared then files of all file types
 *   bundles will be retrieved.
 *
 * @see \Drupal\file\Plugin\migrate\source\d7\File
 * @see file_entity_file_default_types() in Drupal 7's file_entity.module file.
 *
 * @MigrateSource(
 *   id = "tag1_d7_file",
 *   source_module = "file_entity"
 * )
 */
class Tag1File extends File {

  /**
   * {@inheritdoc}
   */
  public function query() {
    $query = parent::query();

    // Only migrate permanent files.
    $query->condition('f.status', 1);

    // Filter by file type bundle, if configured.
    if (isset($this->configuration['type'])) {
      $query->condition('f.type', (array) $this->configuration['type'], 'IN');
    }

    return $query;
  }

}


The tag1_d7_file source plugin above accepts an optional type configuration to only retrieve files matching the specified file types. You can then use the plugin as follows:

source:
 key: migrate
 plugin: tag1_d7_file
 type: image
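The (array) cast in Tag1File::query() is what lets the type configuration be either a single string or a list. A quick standalone check of that PHP behavior (the variable names are illustrative, and document is assumed to be one of the File Entity default types):

```php
<?php

// A single value in YAML (type: image) arrives as a string and becomes a one-item array.
$single = (array) 'image';

// A list in YAML (type: [image, document]) arrives as an array and is left unchanged.
$list = (array) ['image', 'document'];

// Either way, the result is suitable for an IN condition.
```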

Note to my future self: Thanks for providing the code for a source plugin I can copy/paste in the next migration project!

From Drupal 7 image fields to Drupal 10 image media entities

Don’t give up just yet. We have one more exercise to go. Take your dumbbells for three rounds of overhead presses. Afterward we’ll write another migration by hand.

Time to import Drupal 7 images as media entities in Drupal 10. Before doing so, you need to understand what entities will be created, their base field definitions, and the fields attached to each bundle. We covered this in great detail in article 26. Additionally, this article has a reference of base field definitions for Drupal 10 media entities. In short, we will create entities of type media of the image bundle and populate its field_media_image image field with a reference from previously migrated files.

Create an upgrade_d7_media_image.yml file in the web/modules/custom/tag1_migration/migrations folder of our Drupal 10 project. We will use that same file name, without the file extension, as the migration ID. You can come up with any name as long as it is unique. Below is the content of the file:

id: upgrade_d7_media_image
class: Drupal\migrate\Plugin\Migration
migration_tags:
 - 'Drupal 7'
 - Content
 - media
 - tag1_content
label: 'Media (Image)'
source:
 key: migrate
 plugin: tag1_media_image
 constants:
   source_base_path: NULL
 high_water_property:
   name: fid
   alias: f
process:
 name: filename
 status: status
 created: timestamp
 changed: timestamp
 field_media_image/alt: filename
 field_media_image/target_id:
   - plugin: migration_lookup
     source: fid
     migration: upgrade_d7_file
     no_stub: true
   - plugin: skip_on_empty
     method: row
     message: 'The file was not found.'
 uid:
   - plugin: migration_lookup
     source: uid
     migration: upgrade_d7_user
     no_stub: true
   - plugin: default_value
     default_value: 1
 langcode:
   plugin: default_value
   default_value: en
destination:
 plugin: 'entity:media'
 default_bundle: image
migration_dependencies:
 required:
   - upgrade_d7_file
   - upgrade_d7_user
 optional: {  }

Now, rebuild caches for our new migration to be detected and trigger an import operation.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_media_image
ddev drush migrate:import upgrade_d7_media_image

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/content/media?type=image and you will be presented with beautiful images created by Drupal 7's Devel Generate module.

Note that our upgrade_d7_media_image migration uses the tag1_media_image source plugin. This must match the id of the MigrateSource PHP annotation used in our custom source plugin. We reused the high_water_property configuration from the upgrade_d7_file migration.

What is that strange source_base_path constant with a value of NULL? If you do not include it, you will get the following warning for each record retrieved by our custom source plugin: Undefined array key "constants" File.php:105. Remember what we said earlier? By extending a source plugin, you inherit all its methods. Well, that warning comes from the implementation of the prepareRow method in the d7_file source plugin. That plugin assumes a source constant named source_base_path exists for setting the filepath of files to import. Our media migration makes no use of such a source property, but the expectation that the constant exists is very much present. So, to suppress the warning, we provide the source constant and set it to NULL.
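The warning itself is plain PHP behavior: reading a nested key from an array that lacks the parent key emits Undefined array key, while a key that exists with a NULL value can be read silently. The sketch below reproduces this outside Drupal by turning warnings into exceptions; read_base_path() is hypothetical and only imitates the assumption made by the inherited code.

```php
<?php

// Convert PHP warnings into exceptions so the difference is easy to observe.
set_error_handler(function (int $severity, string $message): bool {
  throw new ErrorException($message, 0, $severity);
});

/**
 * Hypothetical reader that, like the inherited code, assumes the constant exists.
 */
function read_base_path(array $configuration): ?string {
  return $configuration['constants']['source_base_path'];
}

// Without constants, PHP raises: Undefined array key "constants".
try {
  read_base_path([]);
  $warning = NULL;
}
catch (ErrorException $e) {
  $warning = $e->getMessage();
}

// With the constant declared as NULL, the read succeeds and returns NULL.
$value = read_base_path(['constants' => ['source_base_path' => NULL]]);

restore_error_handler();
```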

We use the entity:media destination plugin to create media entities of type image. In the migration_dependencies section, we add upgrade_d7_file and upgrade_d7_user as required dependencies, because we will perform migration lookups against them. The process section contains mapping for multiple base field definitions, which we hope will be straightforward to understand now. The key part of this migration is populating the field_media_image field.

In Drupal 10, image fields are entity reference fields. As noted in this article, this field type has five sub-fields:

  1. target_id: An integer representing the ID of the target entity.
  2. alt: Alternative image text, for the image's alt attribute.
  3. title: Image title text, for the image's title attribute.
  4. width: The width of the image in pixels.
  5. height: The height of the image in pixels.

Our source data does not contain information for width and height. When not specified, the ImageItem class responsible for providing image fields loads the file to determine its dimensions. If available, setting the value for these sub-fields will yield a performance boost, because there will be no need to calculate the image's dimensions.
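To illustrate the kind of work involved, reading an image's dimensions means opening its data and parsing the header. The sketch below uses PHP's getimagesizefromstring() on an embedded 1x1 PNG so it needs no external file. The image_dimensions() helper is hypothetical, and this is not how Drupal's ImageItem computes dimensions internally.

```php
<?php

/**
 * Hypothetical helper returning values for the width and height sub-fields.
 *
 * Returns NULL when the data is not a recognizable image.
 */
function image_dimensions(string $imageData): ?array {
  $info = getimagesizefromstring($imageData);
  if ($info === FALSE) {
    return NULL;
  }
  // Index 0 is the width and index 1 the height, in pixels.
  return ['width' => $info[0], 'height' => $info[1]];
}

// A 1x1 pixel PNG, embedded so the example is self-contained.
$png = base64_decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg==');
$dimensions = image_dimensions($png);
```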

While not ideal, we are using the filename as the alt attribute. For title, we are not even going to pretend to have a sensible value to use. Maybe a good compromise would be generating image descriptions and alt text with AI, as Drupal project founder Dries Buytaert considered doing for his website.

Finally, the most important sub-field to set for any entity reference field: target_id. Let's revisit the process pipeline to set it:

process:
 field_media_image/target_id:
   - plugin: migration_lookup
     source: fid
     migration: upgrade_d7_file
     no_stub: true
   - plugin: skip_on_empty
     method: row
     message: 'The file was not found.'

For image fields, target_id should be set to the referenced file ID. Our source plugin provides the fid source property that can be used for this purpose. It might be tempting to copy fid directly into field_media_image/target_id. But that neglects the fact that records can exist in Drupal 7's file_managed table even though the corresponding files are not present on disk. We will not speculate about what causes such a situation, but I have seen it many times in real-life projects. So, to safeguard against creating invalid media entities that point to non-existent files, we perform a migration_lookup against the upgrade_d7_file migration. If that migration was not able to retrieve the file from Drupal 7 for any reason, we leverage the skip_on_empty process plugin to bail out of creating the media entity.
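The lookup-then-skip pipeline can be modeled in plain PHP. Here $fileMap is a hypothetical stand-in for the upgrade_d7_file migration's ID map (source fid to destination fid), and build_media_image_value() is an invented name: a missing entry yields no value, and the row is skipped instead of producing a media entity that points to a non-existent file.

```php
<?php

/**
 * Hypothetical model of the field_media_image process pipeline.
 *
 * Returns the field value for a source row, or NULL when the row should
 * be skipped (the equivalent of skip_on_empty with method: row).
 */
function build_media_image_value(array $row, array $fileMap): ?array {
  // Like migration_lookup with no_stub, a file that was never migrated yields no ID.
  $targetId = $fileMap[$row['fid']] ?? NULL;
  if (empty($targetId)) {
    return NULL;
  }
  return [
    'target_id' => $targetId,
    // Not ideal, but the filename is the only text available for alt.
    'alt' => $row['filename'],
  ];
}

// In this example, only fid 1 was successfully migrated by upgrade_d7_file.
$fileMap = [1 => 1];

$migrated = build_media_image_value(['fid' => 1, 'filename' => 'druplicon.jpg'], $fileMap);
$skipped = build_media_image_value(['fid' => 2, 'filename' => 'missing.jpg'], $fileMap);
```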

If the above sounds familiar, it is because a similar technique is used to connect paragraphs to their host entities as explained in the previous article. The difference is that paragraph fields are of type entity reference revisions and require setting the target_revision_id sub-field in addition to target_id. Fields of type entity reference — like files, images, taxonomy term references, user references, etc. — only require setting the target_id sub-field.

Time to cool down after an intense upper body exercise routine. Those migration muscles might be aching, but hey... no pain, no gain!

Now you know how to create custom source plugins, extend existing ones, and use them to create media entities. In the next article, we will work our lower body... I mean, we will work on migrating yet another entity: nodes. We might or might not create one more source plugin. Only time will tell.


Image by PayPal.me/FelixMittermeier from Pixabay