Migrating Your Data from D7 to D10: Migrating nodes - Part 1


In the previous article, we learned to create custom source plugins and leveraged them to migrate media entities. Today, we’ll migrate nodes and create another custom source plugin to prevent unpublished nodes from being imported. After applying the content model changes stipulated in our migration plan, our example Drupal 10 project ended up having five content types. In this article, we’ll cover two: basic pages and articles. We’ll also revisit a key topic for Drupal migration projects: relationships among entities.

Entity ID and high water mark considerations

Nodes are revisionable entities, so both node IDs and revision IDs need to be accounted for. When the migration project involves content model changes, you need to consider IDs from all entities in the source site that will be used for creating nodes in the destination site. Our example project includes conversions from nodes to other entities like users and taxonomy terms, but we’re not creating Drupal 10 nodes from multiple Drupal 7 entity types. Therefore, reviewing the Drupal 7 node tables is enough to determine suitable AUTO_INCREMENT values for Drupal 10's node entity. From the drupal7 folder, execute ddev mysql to open an interactive SQL shell. Then execute the following queries:

-- Get the highest node ID value.
SELECT nid FROM node ORDER BY nid DESC LIMIT 1;

-- Get the highest node revision ID value.
SELECT vid FROM node_revision ORDER BY vid DESC LIMIT 1;

For brevity, we’re showing you how to configure the auto_increment_alter_content_entities setting of the AUTO_INCREMENT Alter module to apply the new AUTO_INCREMENT value. Refer to article 23 for more information on how this module works or how to perform the operation in custom code.

$settings['auto_increment_alter_content_entities'] = [
 'node' => [450, 1000], // Alter the tables for the node content entity.
];

Now execute the command provided by the AUTO_INCREMENT Alter module to trigger the alter operation in the Drupal 10 project.

ddev drush auto-increment-alter:content-entities

As for the high water property, we need to identify one field returned by the source plugin whose value always increases. Drupal includes three source plugins, explained later, to fetch Drupal 7 node data. All of them retrieve the node revision ID (vid) column from Drupal 7's node_revision table. Today, we’ll create a custom tag1_d7_node source plugin that extends one of the core ones. Unless we manually remove elements from the query, anything from the parent class will be available in our custom source plugin.

Therefore, we can configure the high_water_property as follows:

source:
  plugin: tag1_d7_node
  high_water_property:
    name: vid
    alias: nr

Filtering out records from being imported

Our migration plan says there is no need to migrate unpublished nodes, no matter their content type. In article 25, we discussed how to filter out records using process plugins, and in article 28 we explained how to do it using source plugins. Now that we have seen two options, we would like to revisit the topic to provide a bit more context.

From the perspective of the Migrate API, there are four primary ways to filter out records during an import.

The query method in the source plugin. This is the most performant option since the records are filtered out before they reach Drupal for processing.
The prepareRow method in the source plugin. At this point, you have the record as retrieved by the source query. You can use data from retrieved fields to fetch more data from the source site to determine whether a particular record should be migrated. For example, when using the Organic Groups module in Drupal 7, you can check which group a node belongs to in order to determine whether it should be migrated. Returning FALSE from this method tells the Migrate API that the record should be skipped.
As part of the process pipeline. At this point, you have the record as returned by the prepareRow method, which can add/change/remove data retrieved by the source query. We saw an example of this in the previous article when creating media entities from files. The media entity was created only if the corresponding file was imported in the file migration. Some process plugins that provide logic to skip records from being imported are skip_on_empty, skip_on_value, static_map, and gate.
As part of the destination plugin. At this point, you have the transformed data as returned by the process pipeline. While it’s not common to filter out records this late in the import process, you can use it as a last line of defense to prevent migrating invalid entries. For example, the EntityContentBase destination plugin, the base for multiple destination plugins that create content entities, exposes a validate configuration option. When set to TRUE, the Migrate API calls the entity's validate method before importing a record. In practice, this means all validation constraints are enforced at the entity and field level.
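For comparison, below is a sketch of how our unpublished-node filter could instead be expressed with the third and fourth options. It is not the approach we use in this article. It assumes the source row exposes the Drupal 7 status column as status, as the node source plugins used in this article do. Double-check the skip_on_value and validate option names against your Drupal version.

```yaml
# Sketch only: an alternative to the query-based filter in tag1_d7_node.
process:
  status:
    -
      plugin: skip_on_value
      source: status
      # Skip the entire row unless the Drupal 7 status equals 1 (published).
      not_equals: true
      value: 1
      method: row
destination:
  plugin: 'entity:node'
  default_bundle: page
  # Enforce entity and field validation constraints before saving each record.
  validate: true
```

Unlike the query-based filter, rows skipped this way still get an entry in the migration map table.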

Technical note: With the exception of the first option, the identifiers of the records that were filtered out are recorded in the migration map tables. It’s also possible to leave a message indicating why a record was skipped. Looking at the map and message tables for a migration can be useful to understand why some content was not migrated when the expectation was for it to be imported.

When deciding which option to implement, remember to abide by ETL principles. Only use data pertinent to each phase when building the logic to filter out data. For example, a source plugin can inspect Drupal 7 data, but it should not check Drupal 10 data to determine whether a record should be imported. This is similar to the advice of not creating entities on the fly as part of the process pipeline.

Extending the node source plugin to filter out unpublished nodes

Create a PHP file named Tag1Node.php inside the src/Plugin/migrate/source folder of our tag1_migration custom module. The path relative to the Drupal 10 project's root is web/modules/custom/tag1_migration/src/Plugin/migrate/source/Tag1Node.php. The content of the file should be:

<?php

namespace Drupal\tag1_migration\Plugin\migrate\source;

use Drupal\node\Plugin\migrate\source\d7\Node;

/**
 * Customizations for node migrations.
 *
 * @MigrateSource(
 *   id = "tag1_d7_node",
 *   source_module = "node",
 * )
 */
class Tag1Node extends Node {

  /**
   * {@inheritdoc}
   */
  public function query() {
    $query = parent::query();
    // Exclude unpublished content.
    $query->condition('n.status', 1);
    return $query;
  }

}

As a reminder, the query method allows you to alter the SQL query used to retrieve data from the source database. You can add conditions, join tables, expand the list of fields to fetch, and much more. Note that custom source plugins that only alter the parent's query might no longer be necessary once this core issue is fixed.

Technical note: The drush migrate:status command shows a report for each migration indicating the total number of records, how many have been imported, and how many are still unprocessed. These numbers are accurate when you import all records in a single run. When running incremental migrations, things can change in the source site between runs, like nodes being deleted or added. That can lead to some strange calculations, like a negative number of unprocessed records. In the report:

Total is the count of records as returned by the source query at that point in time.
Imported is the number of records already processed as recorded in the migration map tables.
Unprocessed is the difference between total and imported at that point in time.

Before writing a migration that makes use of our custom source plugin, I would like to point out that Drupal core offers three source plugins to retrieve node data:

d7_node: Retrieves only the current revision.
d7_node_revision: Retrieves only past revisions.
d7_node_complete: Retrieves all current and past revisions.

For more information on using these plugins, read the change record on classic vs complete node migrations. In all cases, it’s possible to fetch node translations, if available. The first two options require setting the translations configuration option to TRUE. The last option fetches translations without having to specify extra configuration options. To account for translations when creating Drupal 10 content entities, set the translations configuration option in the destination plugin, assuming it extends EntityContentBase. Below is how a node migration would be configured to import translations:

source:
  key: migrate
  plugin: d7_node
  node_type: page
  translations: TRUE
process: ...
destination:
  plugin: 'entity:node'
  default_bundle: page
  translations: TRUE

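For comparison, here is a sketch of a similar configuration using the complete approach. As noted above, the d7_node_complete source plugin does not need a translations option. The entity_complete:node destination reflects, to the best of our knowledge, what core's generated complete node migrations use; confirm it against the files generated in your ref_migrations folder.

```yaml
# Sketch: complete-node variant (process section omitted, as above).
source:
  key: migrate
  plugin: d7_node_complete
  node_type: page
  # No translations option: d7_node_complete already returns translations.
process: ...
destination:
  # Assumed to match core's complete node migrations; verify in your generated files.
  plugin: 'entity_complete:node'
  default_bundle: page
  translations: TRUE
```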
Migrating basic pages

We use upgrade_d7_node_page to migrate basic pages. Copy it from the reference folder into our tag1_migration custom module and rebuild caches for the migration to be detected.

cd drupal10
cp ref_migrations/migrate_plus.migration.upgrade_d7_node_page.yml web/modules/custom/tag1_migration/migrations/upgrade_d7_node_page.yml
ddev drush cache:rebuild

If you do not have a migrate_plus.migration.upgrade_d7_node_page.yml file in the ref_migrations folder, it’s likely that you have a file named migrate_plus.migration.upgrade_d7_node_complete_page.yml instead. That would mean that you used the node complete approach, instead of the classic one, when performing the automated migration. Feel free to proceed with our instructions using that file. Ultimately, we are going to update the source plugin to use our custom one. The file name or plugin ID has no effect on what data is retrieved. That depends on which source plugin is used and how it is configured.

Note that while copying the file, we also changed its name and placed it in a migrations folder inside our tag1_migration custom module. After copying the file, make the following changes:

Remove the following keys: uuid, langcode, status, dependencies, field_plugin_method, cck_plugin_method, and migration_group.
Add two migration tags: node and tag1_content.
Add key: migrate under the source section.
Change the source plugin configuration to use our custom tag1_d7_node plugin.
Add the high_water_property configuration as demonstrated above.
Update the migration dependencies so that only a required dependency on upgrade_d7_user remains.

We also need to account for changes in text formats. In article 22, we decided not to migrate Drupal 7 text formats and to leverage those that Drupal 10 provides out of the box instead. In practice, this means that the filtered_html format used in Drupal 7 no longer exists. In Drupal 10, the migration of any rich text field has to be updated to map filtered_html to a valid text format in the new site. Using the snippet below, we replace it with the restricted_html text format in the body field, while bypass: TRUE lets any other format value pass through unchanged:

process:
  body:
    -
      plugin: sub_process
      source: body
      process:
        value: value
        summary: summary
        format:
          -
            plugin: static_map
            source: format
            map:
              filtered_html: restricted_html
            bypass: TRUE

Note: You can use the snippet process plugin proposed for the Migrate Plus module to create reusable process pipelines for your migrations.

After the modifications, the upgrade_d7_node_page.yml file should look like this:

id: upgrade_d7_node_page
class: Drupal\migrate\Plugin\Migration
migration_tags:
  - 'Drupal 7'
  - Content
  - node
  - tag1_content
label: 'Nodes (Basic page)'
source:
  key: migrate
  plugin: tag1_d7_node
  node_type: page
  high_water_property:
    name: vid
    alias: nr
process:
  nid:
    -
      plugin: get
      source: tnid
  vid:
    -
      plugin: get
      source: vid
  langcode:
    -
      plugin: default_value
      source: language
      default_value: und
  title:
    -
      plugin: get
      source: title
  uid:
    -
      plugin: get
      source: node_uid
  status:
    -
      plugin: get
      source: status
  created:
    -
      plugin: get
      source: created
  changed:
    -
      plugin: get
      source: changed
  promote:
    -
      plugin: get
      source: promote
  sticky:
    -
      plugin: get
      source: sticky
  revision_uid:
    -
      plugin: get
      source: revision_uid
  revision_log:
    -
      plugin: get
      source: log
  revision_timestamp:
    -
      plugin: get
      source: timestamp
  comment_node_page/0/status:
    -
      plugin: get
      source: comment
  body:
    -
      plugin: sub_process
      source: body
      process:
        value: value
        summary: summary
        format:
          -
            plugin: static_map
            source: format
            map:
              filtered_html: restricted_html
            bypass: TRUE
destination:
  plugin: 'entity:node'
  default_bundle: page
migration_dependencies:
  required:
    - upgrade_d7_user
  optional: {  }

Now, rebuild caches for our changes to be detected and execute the migration. Run migrate:status to make sure we can connect to Drupal 7. Then, run migrate:import to perform the import operations.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_page
ddev drush migrate:import upgrade_d7_node_page

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/content?type=page and look at the list of migrated basic page nodes. More important, though, is what you do not see. Drupal 7's "Test page" node with nid 16 should not exist in the migrated site. If it appears, make sure you update the source plugin to use tag1_d7_node, rebuild caches, roll back, and import again.

After running a migration, I would normally explain interesting or complex elements. There is not much to highlight for this one other than it’s the 21st migration we have written so far in the series. Can you believe that? Congratulations! You now have a lot more experience than I had when I worked on my first migration project.

Migrating articles

We use upgrade_d7_node_article to migrate articles. Copy it from the reference folder into our tag1_migration custom module and rebuild caches for the migration to be detected.

cd drupal10
cp ref_migrations/migrate_plus.migration.upgrade_d7_node_article.yml web/modules/custom/tag1_migration/migrations/upgrade_d7_node_article.yml
ddev drush cache:rebuild

If you do not have a migrate_plus.migration.upgrade_d7_node_article.yml file in the ref_migrations folder, it’s likely that you have a file named migrate_plus.migration.upgrade_d7_node_complete_article.yml instead. That would mean that you used the node complete approach, instead of the classic one, when performing the automated migration. Feel free to proceed with our instructions using that file. Ultimately, we are going to update the source plugin to use our custom one. The file name or plugin ID has no effect on what data is retrieved. That depends on which source plugin is used and how it is configured.

Note that while copying the file, we also changed its name and placed it in a migrations folder inside our tag1_migration custom module. After copying the file, make the following changes:

Remove the following keys: uuid, langcode, status, dependencies, field_plugin_method, cck_plugin_method, and migration_group.
Add two migration tags: node and tag1_content.
Add key: migrate under the source section.
Change the source plugin configuration to use our custom tag1_d7_node plugin.
Add the high_water_property configuration as demonstrated above.
Update the migration dependencies so that upgrade_d7_taxonomy_term and upgrade_d7_user are listed as required dependencies.
Apply the same treatment to account for text format changes in the body as we did in the page migration.

After the modifications, the upgrade_d7_node_article.yml file should look like this:

id: upgrade_d7_node_article
class: Drupal\migrate\Plugin\Migration
migration_tags:
  - 'Drupal 7'
  - Content
  - node
  - tag1_content
label: 'Nodes (Article)'
source:
  key: migrate
  plugin: tag1_d7_node
  node_type: article
  high_water_property:
    name: vid
    alias: nr
process:
  nid:
    -
      plugin: get
      source: tnid
  vid:
    -
      plugin: get
      source: vid
  langcode:
    -
      plugin: default_value
      source: language
      default_value: und
  title:
    -
      plugin: get
      source: title
  uid:
    -
      plugin: get
      source: node_uid
  status:
    -
      plugin: get
      source: status
  created:
    -
      plugin: get
      source: created
  changed:
    -
      plugin: get
      source: changed
  promote:
    -
      plugin: get
      source: promote
  sticky:
    -
      plugin: get
      source: sticky
  revision_uid:
    -
      plugin: get
      source: revision_uid
  revision_log:
    -
      plugin: get
      source: log
  revision_timestamp:
    -
      plugin: get
      source: timestamp
  comment_node_article/0/status:
    -
      plugin: get
      source: comment
  body:
    -
      plugin: sub_process
      source: body
      process:
        value: value
        summary: summary
        format:
          -
            plugin: static_map
            source: format
            map:
              filtered_html: restricted_html
            bypass: TRUE
  field_tags:
    -
      plugin: sub_process
      source: field_tags
      process:
        target_id: tid
  field_image:
    -
      plugin: sub_process
      source: field_image
      process:
        target_id: fid
        alt: alt
        title: title
        width: width
        height: height
destination:
  plugin: 'entity:node'
  default_bundle: article
migration_dependencies:
  required:
    - upgrade_d7_taxonomy_term
    - upgrade_d7_user
  optional: {  }

Now, rebuild caches for our changes to be detected and execute the migration. We run migrate:status to make sure we can connect to Drupal 7. Then, we run migrate:import to perform the import operations.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_article
ddev drush migrate:import upgrade_d7_node_article

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/content?type=article and look at the list of migrated article nodes. More important, though, is what you do not see. Drupal 7's "Test article" node with nid 71 should not exist in the migrated site. If it appears, make sure you update the source plugin to use tag1_d7_node, rebuild caches, roll back, and import again.

Understanding migration lookup operations

Review the migrated articles. Do you notice something missing? Correct! None of the nodes have images. The generated migration does not account for content model changes. In our Drupal 10 site, we decided to use media reference fields instead of regular image fields to store the articles' images. Those paying close attention will also notice that the machine name of the field attached to the article content type is different. In Drupal 7, it’s named field_image, while in Drupal 10 it’s field_media_image. To account for the change in field name and type, replace the field_image assignment in the upgrade_d7_node_article.yml file with the snippet below and add upgrade_d7_media_image to the list of required migration dependencies.

process:
  field_media_image:
    -
      plugin: sub_process
      source: field_image
      process:
        target_id:
          -
            plugin: migration_lookup
            source: fid
            migration: upgrade_d7_media_image
            no_stub: true
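With upgrade_d7_media_image added, the migration_dependencies section at the end of upgrade_d7_node_article.yml would look like this:

```yaml
migration_dependencies:
  required:
    - upgrade_d7_taxonomy_term
    - upgrade_d7_user
    # Media entities must exist before articles can reference them.
    - upgrade_d7_media_image
  optional: {  }
```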

Let's explain how we came up with this process pipeline. The Drupal 7 source plugin will return a field named field_image with a structure similar to the following:

[
  0 => [
    'fid' => 1885,
    'alt' => 'The quick brown fox jumps over the lazy dog',
    'title' => NULL,
    'width' => 1280,
    'height' => 720,
  ],
  1 => [
    'fid' => 344,
    'alt' => 'Vegard and Bård in fox costumes',
    'title' => 'What does the fox say?',
    'width' => 2013,
    'height' => 903,
  ],
]

In Drupal 10, the field_media_image is of type media reference which expects a single sub-field to be set: target_id. In the previous article, we created Drupal 10 media entities from Drupal 7 image files. We can perform a lookup operation against the upgrade_d7_media_image migration to obtain the media entity ID that corresponds to Drupal 7's file ID.

But wait: is there anything special about fid that lets the migration_lookup plugin convert it to a media entity ID? What if we sent a different value, like the alt attribute? Would that work? There is nothing special about the name fid. However, using alt or any other sub-field of field_image would not work, because those values are not file IDs.

The migration_lookup plugin works by mapping the passed source value(s) to the source identifier(s) of the migration(s) against which the lookup operation is performed. In this case, we are checking against the upgrade_d7_media_image migration. OK, but where do we specify that fid is its identifier? Migrations are like onions. They have layers. Follow along:

The upgrade_d7_media_image migration uses the tag1_media_image source plugin.
The tag1_media_image source plugin extends Drupal core's d7_file source plugin.
The d7_file source plugin implements the getIds method of the MigrateSourceInterface to define the list of source fields that uniquely identify a source row.
In said getIds method, the file fid is designated as the only source field.

When configuring the lookup operation against upgrade_d7_media_image, the name fid is not special in itself. What is important is that the value that we are using for the source configuration of the lookup operation represents a file ID.

Also note that we are performing the lookup against upgrade_d7_media_image, not upgrade_d7_file. The former creates media entities. The latter creates file entities. In upgrade_d7_node_article, we are populating the field_media_image field, which is a media reference field. So, we need the media entity IDs, not file entity IDs, to properly assign the relationship to the media entity. If this sounds confusing, welcome to the club. Take time to understand it and make a mental model of the relationship among entities.

For reference, below is part of the code of the class that defines the d7_file source plugin:

/**
 * Drupal 7 file source from database.
 * 
 * @see \Drupal\file\Plugin\migrate\source\d7\File
 *
 * @MigrateSource(
 *   id = "d7_file",
 *   source_module = "file"
 * )
 */
class File extends DrupalSqlBase {

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    $ids['fid']['type'] = 'integer';
    $ids['fid']['alias'] = 'f';
    return $ids;
  }

}

Technical note: It’s possible that multiple fields might be necessary to uniquely identify a source record. For example, the d7_node_complete source plugin uses the combination of three fields as a unique identifier: nid, vid, language. In such cases, you can pass an array of values to the source configuration of the migration_lookup process plugin.
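A minimal sketch of such a lookup, assuming a hypothetical migration named hypothetical_node_complete_migration that uses d7_node_complete, and hypothetical source fields related_nid and related_vid. The values in the source list are matched, in order, against nid, vid, and language. When the target migration has more than one destination ID, the lookup returns an array of IDs instead of a single value, so the result is stored in a pseudo-field here.

```yaml
process:
  # Hypothetical pseudo-field; later process steps could read it as '@pseudo_related_node'.
  pseudo_related_node:
    -
      plugin: migration_lookup
      # Hypothetical migration whose source IDs are nid, vid, and language.
      migration: hypothetical_node_complete_migration
      source:
        - related_nid
        - related_vid
        - language
      no_stub: true
```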

After modifying the upgrade_d7_node_article.yml file, roll back the migration, rebuild caches, and execute the import operation again. Then, visit any of the articles to view a beautiful image created by Drupal 7's Devel Generate module.

ddev drush migrate:rollback upgrade_d7_node_article
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_article
ddev drush migrate:import upgrade_d7_node_article

Understanding the relationships among Drupal entities

Being able to establish relationships among entities is the foundation of the flexible and powerful content modeling capabilities of Drupal. As noted in article 5, there are two types of relationships among entities: explicit and implicit.

Explicit relationships are entity reference fields to nodes, users, taxonomy terms, files, media entities, paragraphs, groups, commerce products, etc. Implicit relationships are those established by base field definitions, including the content type for a node and the user who created it.

In article 7, we talked about two types of Drupal entities: content and configuration. Both explicit and implicit relationships can point to either content or configuration entities. As an example, consider one of the migrated article nodes:

The uid base field definition is an implicit relationship to the user content entity.
The type base field definition is an implicit relationship to the node_type configuration entity.
The field_tags field is an explicit relationship to the tags bundle of the taxonomy_term content entity.
The field_media_image field is an explicit relationship to the image bundle of the media content entity.

Having multiple layers of relationships among entities is common. Expanding the example above, the image bundle of the media content entity has an entity reference field named field_media_image. It establishes an explicit relationship to the file content entity.

I did not mean to write a tongue-twister. Calling things by their names is helpful in keeping track of the multiple levels of relationships among Drupal entities. Also helpful are visual cues. The Entity Relationship Diagrams module lets you visualize the entity structure of your Drupal site. Database tools like MySQL Workbench can reverse engineer your database to generate entity-relationship diagrams. If nothing else, use a pen and paper to sketch the relationships between entities. If your drawing skills are like mine, a digital sketchbook might be a better option.

Consider the diagram below from my Upgrading to Drupal 10 using the Migrate API presentation:


Among other relationships, it shows a direct connection between the node and file entities via an image field. If we wanted to change the content model to use media entities, there would be a connection from node to media entities via a media reference field and then a connection from media to file entities via an image field. The diagram below demonstrates the new connections among node, media, and file entities.


Before wrapping up this section, I would like to highlight something that we have been doing all along throughout the series. Each of the 20+ migrations we created is responsible for importing only one type of entity. Focusing on the ones responsible for creating content, we started with files because they have no dependencies on other migrations. We later continued with users, taxonomy terms, paragraphs, media, and finally nodes. This order is very much intentional.

Migrations executed later perform lookup operations against migrations that came before to establish relationships among entities.

Technical note: The file entity has a uid base field definition that establishes a relationship to the user content entity. It’s used to keep track of who uploaded the file. The upgrade_d7_file migration is the first one we executed. When the file entities are created, the value for uid is written to the database even though the referenced users might not exist in the system yet. Later on when we execute the upgrade_d7_user migration, those previously established relationships are still there and now point to valid user entities.

Today's custom source plugin and migrations were less complex than those covered in previous articles. That gave us a chance to elaborate on two very important topics: how to filter out records from being imported and how the relationships among Drupal entities work. In the next article, we’ll write the migrations for the remaining three content types.

Image by JL G from Pixabay

Migrations How-To: #29

Mauricio Dinarte

Senior Software Engineer | Drupal Migrations Expert
