In the previous article, we learned to create custom source plugins and leveraged them to migrate media entities. Today, we’ll migrate nodes and create another custom source plugin to prevent unpublished nodes from being imported. After applying the content model changes stipulated in our migration plan, our example Drupal 10 project ended up having five content types. In this article, we’ll cover two: basic pages and articles. We’ll also revisit a key topic for Drupal migration projects: relationships among entities.
Entity ID and high water mark considerations
Nodes are revisionable entities. When the migration project involves content model changes, you need to consider IDs from all entities in the source site that will be used for creating nodes in the destination site. Our example project includes conversions from nodes to other entities like users and taxonomy terms. But we’re not creating Drupal 10 nodes from multiple Drupal 7 entity types. Therefore, reviewing the Drupal 7 node tables is enough to determine a suitable AUTO_INCREMENT value for Drupal 10's node entity. From the drupal7 folder, execute ddev mysql to open an interactive SQL shell. Then execute the following queries:
-- Get the highest node ID value.
SELECT nid FROM node ORDER BY nid DESC LIMIT 1;
-- Get the highest node revision ID value.
SELECT vid FROM node_revision ORDER BY vid DESC LIMIT 1;
For brevity, we’re showing you how to configure the auto_increment_alter_content_entities setting of the AUTO_INCREMENT Alter module to apply the new AUTO_INCREMENT value. Refer to article 23 for more information on how this module works or how to perform the operation in custom code.
$settings['auto_increment_alter_content_entities'] = [
  'node' => [450, 1000], // Alter the tables for the node content entity.
];
Now execute the command provided by the AUTO_INCREMENT Alter module to trigger the alter operation in the Drupal 10 project.
ddev drush auto-increment-alter:content-entities
As for the high water property, we need to identify one field returned by the source plugin whose value always increases. Drupal includes three source plugins, explained later, to fetch Drupal 7 node data. All of them retrieve the node revision ID (vid) column from Drupal 7's node_revision table. Today, we’ll create a custom tag1_d7_node source plugin that extends one of the core ones. Unless we manually remove elements from the query, anything from the parent class will be available in our custom source plugin.
Therefore, we can configure the high_water_property as follows:
source:
  plugin: tag1_d7_node
  high_water_property:
    name: vid
    alias: nr
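To make the role of the high water property more concrete, here is a minimal sketch in plain PHP. It is not Migrate API code. It assumes source rows are simple arrays and that the stored high water value is an integer. The function name rows_above_high_water and the sample data are hypothetical. The idea is that only rows whose vid is greater than the stored value are considered on the next run.

```php
<?php

/**
 * Returns the rows whose vid is greater than the stored high water value.
 *
 * Hypothetical helper for illustration only. The Migrate API applies an
 * equivalent condition to the source query on your behalf.
 */
function rows_above_high_water(array $rows, int $high_water): array {
  $pending = array_filter($rows, fn(array $row): bool => $row['vid'] > $high_water);
  // Sort by ascending vid so the stored value only ever grows.
  usort($pending, fn(array $a, array $b): int => $a['vid'] <=> $b['vid']);
  return $pending;
}

// Made-up sample data: two nodes, each with its current revision ID.
$rows = [
  ['nid' => 1, 'vid' => 10],
  ['nid' => 2, 'vid' => 15],
];

// First run: nothing has been imported yet, so every row is pending.
$first_run = rows_above_high_water($rows, 0);
$high_water = max(array_column($first_run, 'vid'));

// A new revision of node 1 is saved in Drupal 7 and receives a higher vid.
$rows[0]['vid'] = 16;

// Second run: only the node with the new revision is picked up again.
$second_run = rows_above_high_water($rows, $high_water);
```

In this sketch the second run returns a single row, the one for node 1, because node 2 still has the vid that was already recorded as the high water value.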
Filtering out records from being imported
Our migration plan says there is no need to migrate unpublished nodes, no matter their content type. In article 25, we discussed how to filter out records using process plugins and we explained how to do it using source plugins in article 28. Now that we have seen two options, we would like to revisit the topic to provide a bit more context.
From the perspective of the migrate API, there are four primary ways to filter out records to import.
- The query method in the source plugin. This is the most performant option since the records are filtered out before they reach Drupal for processing.
- The prepareRow method in the source plugin. At this point, you have the record as retrieved by the source query. You can use data from retrieved fields to fetch more data from the source site to determine if a particular record should be migrated or not. For example, when using the organic groups module in Drupal 7, you can check which group a node belongs to in order to determine if it should be migrated or not. Returning FALSE from this method tells the Migrate API that the record should not be migrated.
- As part of the process pipeline. At this point, you have the record as returned by the prepareRow method, which can add/change/remove data retrieved by the source query. We saw an example of this in the previous article when creating media entities from files. The media entity was created only if the corresponding file was imported in the file migration. Some process plugins that provide logic to skip records from being imported are skip_on_empty, skip_on_value, static_map, and gate.
- As part of the destination plugin. At this point, you have the transformed data as returned by the process pipeline. While it’s not common to filter out records this late in the import process, you can use it as a last line of defense to prevent migrating invalid entries. For example, the EntityContentBase destination plugin, base for multiple destination plugins that create content entities, exposes a validate configuration option. When set to TRUE, the Migrate API will call the validate method of the entity API before importing a record. In practice, this means all validation constraints are enforced at the entity and field level.
Technical note: With the exception of the first option, the identifiers of the records that were filtered out are recorded in the migration map tables. It’s also possible to leave a message indicating why a record was skipped. Looking at the map and message tables for a migration can be useful to understand why some content was not migrated when the expectation was for it to be imported.
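The four stages can be pictured with a small simulation in plain PHP. This is a deliberately simplified model, not Migrate API code: the function simulate_import, the 'filtered' status label, the group and title rules, and the sample rows are all hypothetical. It only illustrates where each filter sits and that rows dropped by the query leave no trace, while rows dropped later are tracked along with a message.

```php
<?php

/**
 * Simplified model of the four filtering stages. Not Migrate API code.
 *
 * Every name, status label, and rule below is hypothetical.
 */
function simulate_import(array $source_rows): array {
  $map = [];
  $messages = [];

  // Stage 1: source query. Rows excluded here never reach later stages,
  // so nothing is recorded for them.
  $rows = array_filter($source_rows, fn(array $row): bool => $row['status'] === 1);

  foreach ($rows as $row) {
    $nid = $row['nid'];

    // Stage 2: prepareRow. Decide using data retrieved from the source site.
    if ($row['group'] !== 'public') {
      $map[$nid] = 'filtered';
      $messages[$nid] = 'prepareRow: node belongs to a non-public group.';
      continue;
    }

    // Stage 3: process pipeline. Similar in spirit to skip_on_empty.
    if ($row['title'] === '') {
      $map[$nid] = 'filtered';
      $messages[$nid] = 'process: title is empty.';
      continue;
    }

    // Stage 4: destination. A last line of defense through validation.
    if (strlen($row['title']) > 255) {
      $map[$nid] = 'filtered';
      $messages[$nid] = 'destination: title is longer than 255 characters.';
      continue;
    }

    $map[$nid] = 'imported';
  }

  return ['map' => $map, 'messages' => $messages];
}

$result = simulate_import([
  ['nid' => 1, 'status' => 1, 'group' => 'public', 'title' => 'Home'],
  ['nid' => 2, 'status' => 0, 'group' => 'public', 'title' => 'Draft'],
  ['nid' => 3, 'status' => 1, 'group' => 'staff', 'title' => 'Internal'],
  ['nid' => 4, 'status' => 1, 'group' => 'public', 'title' => ''],
  ['nid' => 5, 'status' => 1, 'group' => 'public', 'title' => str_repeat('a', 300)],
]);
```

In this model, node 2 is absent from the map because the query stage removed it, whereas nodes 3, 4, and 5 each have a map entry and a message explaining which stage filtered them out.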
When deciding on which option to implement, remember to abide by ETL principles. Only use data pertinent to each phase when building the logic to filter out data. For example, a source plugin can inspect Drupal 7 data, but it should not check Drupal 10 data to determine if a record should be imported or not. This is similar to the advice of not creating entities on the fly as part of the process pipeline.
Extending the node source plugin to filter out unpublished nodes
Create a PHP file named Tag1Node.php inside our tag1_migration custom module's /src/Plugin/migrate/source folder. The path relative to the Drupal 10 project root is web/modules/custom/tag1_migration/src/Plugin/migrate/source/Tag1Node.php. The content of the file should be:
<?php

namespace Drupal\tag1_migration\Plugin\migrate\source;

use Drupal\node\Plugin\migrate\source\d7\Node;

/**
 * Customizations for node migrations.
 *
 * @MigrateSource(
 *   id = "tag1_d7_node",
 *   source_module = "node",
 * )
 */
class Tag1Node extends Node {

  /**
   * {@inheritdoc}
   */
  public function query() {
    $query = parent::query();
    // Exclude unpublished content.
    $query->condition('n.status', 1);
    return $query;
  }

}
As a reminder, the query method allows you to alter the SQL query used to retrieve data from the source database. You can add conditions, join tables, expand the list of fields to fetch, and much more. Note that custom source plugins that only alter the parent's query might no longer be necessary once this core issue is fixed.
Technical note: The drush migrate:status command shows a report for each migration indicating the total number of records, how many have been imported, and how many are still unprocessed. These numbers are accurate when the migrations have not been executed and you import all records at once. When running incremental migrations, it’s possible for things to change in the source site in between runs, like nodes being deleted or added. That can lead to some strange calculations, like a negative number for unprocessed records. In the report:
- Total is the count of records as returned by the source query at that point in time.
- Imported is the number of records already processed as recorded in the migration map tables.
- Unprocessed is the difference between total and imported at that point in time.
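The arithmetic behind the report can be sketched in a few lines of plain PHP, taking unprocessed as total minus imported. The helper migrate_status_counts and the numbers are hypothetical; they only show how a negative value can appear.

```php
<?php

/**
 * Mirrors the arithmetic of the status report.
 *
 * Hypothetical helper for illustration. It is not part of Drush.
 */
function migrate_status_counts(int $total, int $imported): array {
  return [
    'total' => $total,
    'imported' => $imported,
    'unprocessed' => $total - $imported,
  ];
}

// Before the first import: 100 rows in the source, nothing in the map table.
$before = migrate_status_counts(100, 0);

// After importing everything, 5 nodes are deleted in Drupal 7. The source
// query now counts 95 rows while the map table still tracks 100.
$after_deletions = migrate_status_counts(95, 100);
```

Here $before reports 100 unprocessed records, while $after_deletions reports -5 because the map table remembers records that the source query no longer returns.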
Before writing a migration that makes use of our custom source plugin, I would like to point out that Drupal core offers three source plugins to retrieve node data:
- d7_node: Retrieves only the current revision.
- d7_node_revision: Retrieves only past revisions.
- d7_node_complete: Retrieves all current and past revisions.
For more information on using these plugins, read the change record on classic vs complete node migrations. In all cases, it’s possible to fetch node translations, if available. The first two options require setting the translations configuration option to TRUE. The last option fetches translations without having to specify extra configuration options. To account for translations when creating Drupal 10 content entities, set the translations configuration option in the destination plugin, assuming it extends EntityContentBase. Below is how a node migration would be configured to import translations:
source:
  key: migrate
  plugin: d7_node
  node_type: page
  translations: TRUE
process: ...
destination:
  plugin: 'entity:node'
  default_bundle: page
  translations: TRUE
Migrating basic pages
We use upgrade_d7_node_page to migrate basic pages. Copy it from the reference folder into our tag1_migration custom module and rebuild caches for the migration to be detected.
cd drupal10
cp ref_migrations/migrate_plus.migration.upgrade_d7_node_page.yml web/modules/custom/tag1_migration/migrations/upgrade_d7_node_page.yml
ddev drush cache:rebuild
If you do not have a migrate_plus.migration.upgrade_d7_node_page.yml file in the ref_migrations folder, it’s likely that you have a file named migrate_plus.migration.upgrade_d7_node_complete_page.yml instead. That would mean that you used the node complete approach, instead of the classic one, when performing the automated migration. Feel free to proceed with our instructions using that file. Ultimately, we are going to update the source plugin to use our custom one. The file name or plugin ID has no effect on what data is retrieved. That will depend on which source plugin is used and how it is configured.
Note that while copying the file, we also changed its name and placed it in a migrations folder inside our tag1_migration custom module. After copying the file, make the following changes:
- Remove the following keys: uuid, langcode, status, dependencies, field_plugin_method, cck_plugin_method, and migration_group.
- Add two migration tags: node and tag1_content.
- Add key: migrate under the source section.
- Change the source plugin configuration to use our custom tag1_d7_node plugin.
- Add the high_water_property property as demonstrated above.
- Update the migration dependencies so that only a required dependency on upgrade_d7_user remains.
We also need to account for changes in text formats. In article 22 we decided not to migrate Drupal 7 text formats, but to leverage those that Drupal 10 provides out of the box. In practice this means that the filtered_html format used in Drupal 7 no longer exists in Drupal 10. The migration of any rich text field will have to be updated to map filtered_html to a valid text format in the new site. Using the snippet below we replace it with the restricted_html text format in the body field:
process:
  body:
    -
      plugin: sub_process
      source: body
      process:
        value: value
        summary: summary
        format:
          -
            plugin: static_map
            source: format
            map:
              filtered_html: restricted_html
            bypass: TRUE
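To see what the static_map configuration above does to each value, here is a minimal sketch in plain PHP. It models only the case where bypass is enabled and is not the plugin's actual implementation. The function name map_text_format is hypothetical.

```php
<?php

/**
 * Models a static map lookup with bypass enabled.
 *
 * Hypothetical helper: values found in the map are replaced, anything else
 * passes through unchanged.
 */
function map_text_format(string $format, array $map): string {
  return array_key_exists($format, $map) ? $map[$format] : $format;
}

$map = ['filtered_html' => 'restricted_html'];

// A format listed in the map is replaced.
$mapped = map_text_format('filtered_html', $map);

// A format not listed in the map is passed through as is.
$untouched = map_text_format('full_html', $map);
```

Because unmapped values pass through untouched, it is worth checking that every format that falls through actually exists in the Drupal 10 site.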
Note: You can use the snippet process proposed for the migrate plus module to create reusable process pipelines for use in your migrations.
After the modifications, the upgrade_d7_node_page.yml file should look like this:
id: upgrade_d7_node_page
class: Drupal\migrate\Plugin\Migration
migration_tags:
  - 'Drupal 7'
  - Content
  - node
  - tag1_content
label: 'Nodes (Basic page)'
source:
  key: migrate
  plugin: tag1_d7_node
  node_type: page
  high_water_property:
    name: vid
    alias: nr
process:
  nid:
    -
      plugin: get
      source: tnid
  vid:
    -
      plugin: get
      source: vid
  langcode:
    -
      plugin: default_value
      source: language
      default_value: und
  title:
    -
      plugin: get
      source: title
  uid:
    -
      plugin: get
      source: node_uid
  status:
    -
      plugin: get
      source: status
  created:
    -
      plugin: get
      source: created
  changed:
    -
      plugin: get
      source: changed
  promote:
    -
      plugin: get
      source: promote
  sticky:
    -
      plugin: get
      source: sticky
  revision_uid:
    -
      plugin: get
      source: revision_uid
  revision_log:
    -
      plugin: get
      source: log
  revision_timestamp:
    -
      plugin: get
      source: timestamp
  comment_node_page/0/status:
    -
      plugin: get
      source: comment
  body:
    -
      plugin: sub_process
      source: body
      process:
        value: value
        summary: summary
        format:
          -
            plugin: static_map
            source: format
            map:
              filtered_html: restricted_html
            bypass: TRUE
destination:
  plugin: 'entity:node'
  default_bundle: page
migration_dependencies:
  required:
    - upgrade_d7_user
  optional: { }
Now, rebuild caches for our changes to be detected and execute the migration. Run migrate:status to make sure we can connect to Drupal 7. Then, run migrate:import to perform the import operations.
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_page
ddev drush migrate:import upgrade_d7_node_page
If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/content?type=page and look at the list of migrated basic page nodes. More important, though, is what you do not see. Drupal 7's "Test page" node with nid 16 should not exist in the migrated site. If it appears, make sure you update the source plugin to use tag1_d7_node, rebuild caches, roll back, and import again.
After running a migration, I would normally explain interesting or complex elements. There is not much to highlight for this one other than it’s the 21st migration we have written so far in the series. Can you believe that? Congratulations! You now have a lot more experience than I had when I worked on my first migration project.
Migrating articles
We use upgrade_d7_node_article to migrate articles. Copy it from the reference folder into our tag1_migration custom module and rebuild caches for the migration to be detected.
cd drupal10
cp ref_migrations/migrate_plus.migration.upgrade_d7_node_article.yml web/modules/custom/tag1_migration/migrations/upgrade_d7_node_article.yml
ddev drush cache:rebuild
If you do not have a migrate_plus.migration.upgrade_d7_node_article.yml file in the ref_migrations folder, it’s likely that you have a file named migrate_plus.migration.upgrade_d7_node_complete_article.yml instead. That would mean that you used the node complete approach, instead of the classic one, when performing the automated migration. Feel free to proceed with our instructions using that file. Ultimately, we are going to update the source plugin to use our custom one. The file name or plugin ID has no effect on what data is retrieved. That will depend on which source plugin is used and how it is configured.
Note that while copying the file, we also changed its name and placed it in a migrations folder inside our tag1_migration custom module. After copying the file, make the following changes:
- Remove the following keys: uuid, langcode, status, dependencies, field_plugin_method, cck_plugin_method, and migration_group.
- Add two migration tags: node and tag1_content.
- Add key: migrate under the source section.
- Change the source plugin configuration to use our custom tag1_d7_node plugin.
- Add the high_water_property property as demonstrated above.
- Update the migration dependencies so that upgrade_d7_taxonomy_term and upgrade_d7_user are listed as required dependencies.
- Apply the same treatment to account for text format changes in the body field as we did in the page migration.
After the modifications, the upgrade_d7_node_article.yml file should look like this:
id: upgrade_d7_node_article
class: Drupal\migrate\Plugin\Migration
migration_tags:
  - 'Drupal 7'
  - Content
  - node
  - tag1_content
label: 'Nodes (Article)'
source:
  key: migrate
  plugin: tag1_d7_node
  node_type: article
  high_water_property:
    name: vid
    alias: nr
process:
  nid:
    -
      plugin: get
      source: tnid
  vid:
    -
      plugin: get
      source: vid
  langcode:
    -
      plugin: default_value
      source: language
      default_value: und
  title:
    -
      plugin: get
      source: title
  uid:
    -
      plugin: get
      source: node_uid
  status:
    -
      plugin: get
      source: status
  created:
    -
      plugin: get
      source: created
  changed:
    -
      plugin: get
      source: changed
  promote:
    -
      plugin: get
      source: promote
  sticky:
    -
      plugin: get
      source: sticky
  revision_uid:
    -
      plugin: get
      source: revision_uid
  revision_log:
    -
      plugin: get
      source: log
  revision_timestamp:
    -
      plugin: get
      source: timestamp
  comment_node_article/0/status:
    -
      plugin: get
      source: comment
  body:
    -
      plugin: sub_process
      source: body
      process:
        value: value
        summary: summary
        format:
          -
            plugin: static_map
            source: format
            map:
              filtered_html: restricted_html
            bypass: TRUE
  field_tags:
    -
      plugin: sub_process
      source: field_tags
      process:
        target_id: tid
  field_image:
    -
      plugin: sub_process
      source: field_image
      process:
        target_id: fid
        alt: alt
        title: title
        width: width
        height: height
destination:
  plugin: 'entity:node'
  default_bundle: article
migration_dependencies:
  required:
    - upgrade_d7_taxonomy_term
    - upgrade_d7_user
  optional: { }
Now, rebuild caches for our changes to be detected and execute the migration. We run migrate:status to make sure we can connect to Drupal 7. Then, we run migrate:import to perform the import operations.
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_article
ddev drush migrate:import upgrade_d7_node_article
If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/content?type=article and look at the list of migrated article nodes. More important, though, is what you do not see. Drupal 7's "Test article" node with nid 71 should not exist in the migrated site. If it appears, make sure you update the source plugin to use tag1_d7_node, rebuild caches, roll back, and import again.
Understanding migration lookup operations
Review the migrated articles. Do you notice something missing? Correct! None of the nodes have images. The generated migration does not account for content model changes. In our Drupal 10 site, we decided to use media reference fields instead of regular image fields to store the articles' images. Those paying close attention will also notice that the machine name of the field attached to the article content type is different. In Drupal 7, it’s named field_image while in Drupal 10 it’s field_media_image. To account for the change in field name and type, replace the field_image assignment in the upgrade_d7_node_article.yml file with the snippet below and add upgrade_d7_media_image to the list of required migration dependencies.
process:
  field_media_image:
    -
      plugin: sub_process
      source: field_image
      process:
        target_id:
          -
            plugin: migration_lookup
            source: fid
            migration: upgrade_d7_media_image
            no_stub: true
Let's explain how we came up with this process pipeline. The Drupal 7 source plugin will return a field named field_image with a structure similar to the following:
[
  0 => [
    'fid' => 1885,
    'alt' => 'The quick brown fox jumps over the lazy dog',
    'title' => NULL,
    'width' => 1280,
    'height' => 720,
  ],
  1 => [
    'fid' => 344,
    'alt' => 'Vegard and Bård in fox costumes',
    'title' => 'What does the fox say?',
    'width' => 2013,
    'height' => 903,
  ],
]
In Drupal 10, the field_media_image field is of type media reference, which expects a single sub-field to be set: target_id. In the previous article, we created Drupal 10 media entities from Drupal 7 image files. We can perform a lookup operation against the upgrade_d7_media_image migration to obtain the media entity ID that corresponds to Drupal 7's file ID.
But wait. Is there anything special about fid that lets the migration_lookup plugin know how to convert it to a media entity ID? What if we sent a different value like the alt attribute? Would that work? No. There’s nothing special about the name fid, but the value it holds matters. Using alt or any other sub-field of field_image would not work because those values are not file IDs.
The way the migration_lookup plugin works is by mapping the passed source value(s) to the identifier(s) of the migration(s) over which the lookup operation will be performed. In this case, we are checking against the upgrade_d7_media_image migration. Ok, but where do we specify that fid is its identifier? Migrations are like onions. They have layers. Follow along:
- The upgrade_d7_media_image migration uses the tag1_media_image source plugin.
- The tag1_media_image source plugin extends Drupal core's d7_file source plugin.
- The d7_file source plugin implements the getIds method of the MigrateSourceInterface to define the list of source fields that uniquely identify a source row.
- In said getIds method, the file fid is designated as the only source field.
When configuring the lookup operation against upgrade_d7_media_image, the name fid is not special in itself. What is important is that the value that we are using for the source configuration of the lookup operation represents a file ID.
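The point that the value, not the key name, is what gets matched can be illustrated with a minimal sketch in plain PHP. It models the migration map as an in-memory array from Drupal 7 file IDs to Drupal 10 media entity IDs. The real map lives in a database table, and the function lookup_media_id and the IDs 12 and 13 are hypothetical.

```php
<?php

/**
 * Looks up a destination ID in a simplified migration map.
 *
 * Hypothetical model: keys are Drupal 7 file IDs and values are Drupal 10
 * media entity IDs. Returns NULL when the source value is not in the map.
 */
function lookup_media_id(array $map, int|string $source_value): ?int {
  return $map[$source_value] ?? NULL;
}

// Made-up map contents for the upgrade_d7_media_image migration.
$media_image_map = [
  1885 => 12,
  344 => 13,
];

// One item of field_image as returned by the source plugin.
$item = [
  'fid' => 1885,
  'alt' => 'The quick brown fox jumps over the lazy dog',
];

// The value of fid is a file ID, so it matches an entry in the map.
$from_fid = lookup_media_id($media_image_map, $item['fid']);

// The value of alt is not a file ID, so nothing is found.
$from_alt = lookup_media_id($media_image_map, $item['alt']);
```

In this sketch, $from_fid resolves to a media entity ID while $from_alt is NULL, which is the kind of miss you would see in a real lookup when the source value does not correspond to the identifier of the migration being queried.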
Also note that we are looking against upgrade_d7_media_image, not upgrade_d7_file. The former creates media entities. The latter creates file entities. In upgrade_d7_node_article we are populating the field_media_image field, which is a media reference field. So, we need the media entity IDs, not file entity IDs, to properly assign the relationship to the media entity. If this sounds confusing, welcome to the club. Take time to understand it and make a mental model of the relationship among entities.
For reference, below is part of the code of the class that defines the d7_file source plugin:
/**
 * Drupal 7 file source from database.
 *
 * @see \Drupal\file\Plugin\migrate\source\d7\File
 *
 * @MigrateSource(
 *   id = "d7_file",
 *   source_module = "file"
 * )
 */
class File extends DrupalSqlBase {

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    $ids['fid']['type'] = 'integer';
    $ids['fid']['alias'] = 'f';
    return $ids;
  }

}
Technical note: It’s possible that multiple fields might be necessary to uniquely identify a source record. For example, the d7_node_complete source plugin uses the combination of three fields as a unique identifier: nid, vid, language. In such cases, you can pass an array of values to the source configuration of the migration_lookup process plugin.
After modifying the upgrade_d7_node_article.yml file, roll back the migration, rebuild caches, and execute the import operation again. Then, visit any of the articles to view a beautiful image created by Drupal 7's Devel Generate module.
ddev drush migrate:rollback upgrade_d7_node_article
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_article
ddev drush migrate:import upgrade_d7_node_article
Understanding the relationships among Drupal entities
Being able to establish relationships among entities is the foundation of the flexible and powerful content modeling capabilities of Drupal. As noted in article 5, there are two types of relationships among entities: explicit and implicit.
Explicit relationships are entity reference fields to nodes, users, taxonomy terms, files, media entities, paragraphs, groups, commerce products, etc. Implicit relationships are those established by base field definitions, including the content type for a node and the user who created it.
In article 7, we talked about two types of Drupal entities: content and configuration. Both explicit and implicit relationships can point to either content or configuration entities. As an example, consider one of the migrated articles nodes:
- The uid base field definition is an implicit relationship to the user content entity.
- The type base field definition is an implicit relationship to the node_type configuration entity.
- The field_tags field is an explicit relationship to the tags bundle of the taxonomy_term content entity.
- The field_media_image field is an explicit relationship to the image bundle of the media content entity.
Having multiple layers of relationships among entities is common. Expanding the example above, the image bundle of the media content entity has an entity reference field named field_media_image. It establishes an explicit relationship to the file content entity.
I did not mean to write a tongue-twister. Calling things by their names is helpful in keeping track of the multiple levels of relationships among Drupal entities. Also helpful are visual cues. The Entity Relationship Diagrams module lets you visualize the entity structure of your Drupal site. Database tools like MySQL Workbench can reverse engineer your database to generate entity-relationship diagrams. If nothing else, use a pen and paper to sketch the relationships between entities. If your drawing skills are like mine, a digital sketchbook might be a better option.
Consider the diagram below from my Upgrading to Drupal 10 using the Migrate API presentation:
Among other relationships, it shows a direct connection between the node and file entities via an image field. If we wanted to change the content model to use media entities, there would be a connection from node to media entities via a media reference field and then a connection from media to file entities via an image field. The diagram below demonstrates the new connections among node, media, and file entities.
Before wrapping up this section, I would like to highlight something that we have been doing all along throughout the series. Each of the 20+ migrations we created is responsible for importing only one type of entity. Focusing on the ones responsible for creating content, we started with files because they have no dependencies on other migrations. We later continued with users, taxonomy terms, paragraphs, media, and finally nodes. This order is very much intentional.
Migrations executed later perform lookup operations against migrations that came before to establish relationships among entities.
Technical note: The file entity has a uid base field definition that establishes a relationship to the user content entity. It’s used to keep track of who uploaded the file. The upgrade_d7_file migration is the first one we executed. When the file entities are created, the value for uid is written to the database even though the referenced users might not exist in the system yet. Later on, when we execute the upgrade_d7_user migration, those previously established relationships are still there and now point to valid user entities.
Today's custom source plugin and migrations were less complex than those covered in previous articles. That gave us a chance to elaborate on two very important topics: how to filter out records from being imported and to better understand the relationships among Drupal entities. In the next article, we’ll write the migrations for the three content types that are pending to import.