Migrating Your Data from D7 to D10: Paragraph migration. Creating custom process plugins.

In the previous article, we started working on a node to user migration. Today, we expand on that example to accommodate more content model changes. First, we will learn how to migrate Drupal 7 field collections into Drupal 10 paragraphs. Then, we’ll populate an entity reference revision field attached to the user entity to add relationships to the newly migrated paragraph entities. Finally, we will learn how to create a custom process plugin to combine three separate Drupal 7 fields into a single multi-value Drupal 10 field.

Entity ID and high water mark considerations

Even though today's migration will create paragraph entities in Drupal 10, the source data is Drupal 7 field collections. Note that while the Paragraphs module existed in Drupal 7, it was not common for a site to use field collections and paragraphs in the same installation. If your project uses both and you need to migrate data from the two modules, you will have to review more Drupal 7 tables to determine a suitable AUTO_INCREMENT value for Drupal 10's paragraph entity.

Our example only uses field collections. From the drupal7 folder, execute ddev mysql to open an interactive SQL shell. Then execute the following queries to help you determine which values to use for the AUTO_INCREMENT value:

-- Get the highest field collection id value.
SELECT item_id FROM field_collection_item ORDER BY item_id DESC LIMIT 1;

-- Get the highest field collection revision value.
SELECT revision_id FROM field_collection_item_revision ORDER BY revision_id DESC LIMIT 1;
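
The highest IDs returned by these queries are only a lower bound: the AUTO_INCREMENT value has to be strictly greater, and rounding up leaves a visible gap between old and new content. As a rough sketch, the calculation could look like this in plain PHP. The suggest_auto_increment function, its parameters, and the sample numbers are hypothetical and are not part of the AUTO_INCREMENT Alter module.

```php
<?php

declare(strict_types=1);

/**
 * Suggests an AUTO_INCREMENT value above the highest ID found in Drupal 7.
 *
 * Hypothetical helper for illustration; not provided by any module.
 *
 * @param int $highest_id
 *   Highest ID returned by one of the queries above.
 * @param int $round_to
 *   Boundary to round up to (assumption: 100).
 * @param int $buffer
 *   Extra room for content created in Drupal 7 before the final migration.
 */
function suggest_auto_increment(int $highest_id, int $round_to = 100, int $buffer = 0): int {
  // The next free ID must be strictly greater than the highest existing one.
  $minimum = $highest_id + $buffer + 1;
  // Round up to the next multiple of $round_to.
  return (int) (ceil($minimum / $round_to) * $round_to);
}

// Made-up query results, for illustration only.
echo suggest_auto_increment(437), PHP_EOL;
echo suggest_auto_increment(1210, 100, 250), PHP_EOL;
```

How much buffer to leave depends on how much content editors are expected to create in Drupal 7 before the final migration run.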

For brevity, we are going to show how to configure the auto_increment_alter_content_entities setting of the AUTO_INCREMENT Alter module to apply the new AUTO_INCREMENT value. Refer to article 23 for more information on how this module works or how to perform the operation in custom code.

$settings['auto_increment_alter_content_entities'] = [
 'paragraph' => [500, 1500], // Alter the tables for the paragraph content entity.
];

Now, execute the command provided by the AUTO_INCREMENT Alter module to trigger the alter operation in the Drupal 10 project.

ddev drush auto-increment-alter:content-entities

As for the high water property, we need to identify one field returned by the source plugin whose value always increases. Paragraphs are revisionable entities. The automated upgrade path for paragraphs generates two content migrations for each field collection in the source site. One migrates the current revision and another migrates all past revisions. They use the d7_field_collection_item and d7_field_collection_item_revision source plugins respectively. Both source plugins retrieve data from the field_collection_item and field_collection_item_revision tables in Drupal 7.

Both migrations can use the revision_id column in the revision table as the high water mark as follows:

source:
 plugin: d7_field_collection_item
 high_water_property:
   name: revision_id
   alias: fr
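
To make the idea of a high water mark more concrete, here is a simplified sketch in plain PHP of the filtering it implies: only rows whose revision_id is greater than the value stored by the previous run are picked up. The rows_above_high_water function and the sample rows are hypothetical. A real source plugin works against the database rather than a PHP array.

```php
<?php

declare(strict_types=1);

/**
 * Returns the rows a high water mark would let through, in ascending order.
 *
 * Simplified stand-in for what a source plugin does; the function name and
 * row structure are hypothetical.
 */
function rows_above_high_water(array $rows, string $property, ?int $high_water): array {
  // On the first run there is no stored value, so every row qualifies.
  $pending = array_filter(
    $rows,
    static fn (array $row): bool => $high_water === NULL || $row[$property] > $high_water
  );
  // Process rows in ascending order so the stored mark only moves forward.
  usort($pending, static fn (array $a, array $b): int => $a[$property] <=> $b[$property]);
  return $pending;
}

// Made-up rows: item 1 was edited, so its current revision is the newest.
$rows = [
  ['item_id' => 1, 'revision_id' => 5],
  ['item_id' => 2, 'revision_id' => 2],
  ['item_id' => 3, 'revision_id' => 4],
];

// Pretend a previous run stopped after revision 2.
$pending = rows_above_high_water($rows, 'revision_id', 2);
print_r(array_column($pending, 'revision_id'));
```

In this made-up data set, item 1 has the lowest item_id but the highest revision_id, so a mark based on revision_id picks it up again after it was edited.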

Migrating field collections as paragraphs

The Paragraphs module offers an upgrade path for Drupal 7's Field collection module. Back in article 15, we explained how to migrate paragraph types out of field collections. As the name suggests, Drupal 7 field collections have fields attached to them. Adding those fields to the paragraph entities was accounted for in the multiple field-related migrations: storage, instance, widget, and formatters. With the migration of configuration taken care of, we can now focus on migrating content.

Paragraphs are attached to entity bundles (like content types) via entity reference revision fields. Because of this, migrating paragraph content is a two-step process. First, you need to migrate the paragraph entity data. Second, you need to migrate the entity reference revision data that connects the paragraph with its host entity. Before we start writing migrations, let's review our Drupal 7 configuration and devise an approach to follow based on our migration plan.

In Drupal 7, we have the field_favorite_quote field collection which has two fields attached to it: field_quote_name and field_quote_message. The field_favorite_quote field collection is used in the speaker content type. In Drupal 10, the field_favorite_quote field collection was migrated as the favorite_quote paragraph type with the field_quote_name and field_quote_message fields still attached to it. Our migration plan says that speaker nodes should be migrated as Drupal 10 user entities. Back in article 22, we attached a field_favorite_quote entity reference revisions field to users, which allows referencing favorite_quote paragraph entities.

There are two tasks we still need to complete: migrate the paragraph entities and update the user migration to populate the field_favorite_quote reference field.

Paragraphs are revisionable entities. As part of the upgrade path from field collections, the Paragraphs module provides two content migrations:

d7_field_collection.yml migrates the current paragraph revision.
d7_field_collection_revisions.yml migrates all past paragraph revisions.

Each of these migrations uses a deriver that generates a migration file for each Drupal 7 field collection. This is similar to how node migrations work: a separate migration is generated for each content type. In our example, after running the automated migration, we ended up with the upgrade_d7_field_collection_favorite_quote.yml and upgrade_d7_field_collection_revisions_favorite_quote.yml migrations in the ref_migrations folder. Our migration plan says we do not need to migrate revisions for any content entity. Therefore, we will only migrate the current paragraph revisions.

We use upgrade_d7_field_collection_favorite_quote to migrate the current revision for the one field collection that existed in Drupal 7. Copy it from the reference folder into our tag1_migration custom module and rebuild caches for the migration to be detected.

cd drupal10
cp ref_migrations/migrate_plus.migration.upgrade_d7_field_collection_favorite_quote.yml web/modules/custom/tag1_migration/migrations/upgrade_d7_field_collection_favorite_quote.yml
ddev drush cache:rebuild

Note that while copying the file, we also changed its name and placed it in a migrations folder inside our tag1_migration custom module. After copying the file, make the following changes:

Remove the following keys: uuid, langcode, status, dependencies, field_plugin_method, cck_plugin_method, and migration_group.
Add two migration tags: paragraph and tag1_content.
Add key: migrate under the source section.
Add the high_water_property property as demonstrated above.
Remove the migration dependencies. They currently list configuration migrations. Early on we decided that our content migrations will not depend on configuration migrations.

After the modifications, the upgrade_d7_field_collection_favorite_quote.yml file should look like this:

id: upgrade_d7_field_collection_favorite_quote
class: Drupal\migrate\Plugin\Migration
migration_tags:
 - 'Drupal 7'
 - Content
 - 'Field Collection Content'
 - paragraph
 - tag1_content
label: 'Field Collections (Favorite quote)'
source:
 key: migrate
 plugin: d7_field_collection_item
 field_name: field_favorite_quote
 high_water_property:
   name: revision_id
   alias: fr
process:
 type:
   -
     plugin: get
     source: bundle
 parent_id:
   -
     plugin: get
     source: parent_id
 parent_type:
   -
     plugin: get
     source: parent_type
 parent_field_name:
   -
     plugin: get
     source: field_name
 field_quote_name:
   -
     plugin: get
     source: field_quote_name
 field_quote_message:
   -
     plugin: get
     source: field_quote_message
destination:
 plugin: 'entity_reference_revisions:paragraph'
 default_bundle: favorite_quote
migration_dependencies:
 required: {  }
 optional: {  }

The generated migration has the Field Collection Content tag. This is important and will be explained in the next section. For now, make sure to preserve it.

Now, rebuild caches for our changes to be detected and execute the migration. Run migrate:status to make sure we can connect to Drupal 7. Then, run migrate:import to perform the import operations.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_field_collection_favorite_quote
ddev drush migrate:import upgrade_d7_field_collection_favorite_quote

If things are properly configured, you should not get any errors. But where can you see the migrated paragraph entities? Out of the box, the Paragraphs module does not provide a way to review its content entities from the user interface. Paragraphs are meant to be attached to a host entity via a reference field. It is by viewing the host entity that you can see the content of the paragraphs. The upgrade_d7_field_collection_favorite_quote migration generated content entities of the favorite_quote paragraph type. In our example, we will update the user migration to be able to see the migrated favorite_quote paragraphs referenced by the field_favorite_quote field.

In the meantime, there are other ways to make sure our paragraph migration created Drupal 10 content entities. One is to create a view that lists paragraph entities. This will be left as an exercise for the curious reader. Another is to run SQL queries against the Drupal 10 tables that store paragraph data. This is what we will do using a MySQL client.

From the drupal10 folder, execute ddev mysql to open an interactive SQL shell. Then execute the following queries:

-- Query paragraph entity data.
SELECT * FROM paragraphs_item;
SELECT * FROM paragraphs_item_field_data;

-- Query field data attached to paragraph entities.
SELECT * FROM paragraph__field_quote_name;
SELECT * FROM paragraph__field_quote_message;

Technical note: Even though we are not migrating past revisions, the entity API still generates data in the revisions table for the paragraph entity and the fields attached to it. This is how Drupal works out of the box. That is why it is important to provide AUTO_INCREMENT values for revisions when the content entity supports it, even if you have no plans to migrate revisions.

Connecting paragraphs to their host entities

After confirming the field collection to paragraph migration worked, we need to update the migration of the host entity. In particular, we need to populate the references in the user entity to the newly migrated paragraph entities. To accomplish this, we need to update the upgrade_d7_node_speaker_to_user migration created in the previous article.

Before showing how to migrate the relationship to the paragraph entities, I would like to acknowledge that our example is a rather simple one. Paragraph migrations can get quite complex when you need to migrate revisions, translations, and nested paragraph relationships. When coming from field collections in Drupal 7, the automated upgrade path provided by the Paragraphs module is quite flexible and offers tools to account for many different scenarios.

That said, it is valid to extend migration plugins provided by the paragraphs module or write custom ones altogether. It is our hope that throughout the series you have gained a deeper understanding of the Migrate API to plan and execute custom migrations.

The one strategy I would advise against is to create paragraph entities on the fly when migrating their host entities. That violates the principles of ETL and the created entities might end up in the site even after clean up operations, like migration rollbacks. Instead, create separate migrations for each entity/bundle combination and add the relationships among them following an approach similar to what we are about to describe.

Back to our example, the generated migration already contains a process pipeline we can use to establish the relationship between paragraphs and their host entities. But where? In Drupal 7, the speaker content type uses the field_favorite_quote field collection. So, take a look at the upgrade_d7_node_speaker migration in the ref_migrations folder. The relevant part is the mapping of the field_favorite_quote field in the process section. Copy the snippet below from the generated upgrade_d7_node_speaker migration into the upgrade_d7_node_speaker_to_user migration we created in the previous article:

process:
 field_favorite_quote:
   -
     plugin: sub_process
     source: field_favorite_quote
     process:
       target_id:
         -
           plugin: paragraphs_lookup
           tags: 'Field Collection Content'
           source: value
         -
           plugin: extract
           index:
             - id
       target_revision_id:
         -
           plugin: paragraphs_lookup
           tags:
             - 'Field Collection Revisions Content'
             - 'Field Collection Content'
           tag_ids:
             'Field Collection Revisions Content':
               - revision_id
             'Field Collection Content':
               - value
         -
           plugin: extract
           index:
             - revision_id

Now, rebuild caches for our changes to be detected. Then roll back the upgrade_d7_node_speaker_to_user migration and import it again.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_speaker_to_user
ddev drush migrate:rollback upgrade_d7_node_speaker_to_user
ddev drush migrate:import upgrade_d7_node_speaker_to_user

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/people?role=speaker and view or edit any of the migrated users with the Speaker role assigned to them. Viewing the user account for Frank will reveal a quote by Albert Einstein.

Wait, what just happened? Paragraph relationships use entity reference revision fields. To properly connect a paragraph field from the host entity to the paragraph entity, you need to make sure to set values for the target_id and target_revision_id sub-fields.

In the snippet above, we are letting the automated upgrade path provided by the Paragraphs module do its job. It offers the paragraphs_lookup process plugin, which extends Drupal core's migration_lookup with extra functionality tailored to migrating data for entity reference revision fields. When paragraph entities are found, the process pipeline extracts their id and revision_id to populate the target_id and target_revision_id sub-fields of the field_favorite_quote field in the user entity. That is how the relationship between users and paragraphs is established.

Note the use of extra migration tags in the derived d7_field_collection and d7_field_collection_revisions migrations: Field Collection Content and Field Collection Revisions Content respectively. This limits the migrations that will be used in the lookup operation to those containing the listed tags. Feel free to review the ParagraphsLookup process plugin's code to better understand how the process pipeline is set up.
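
The effect of the sub_process, paragraphs_lookup, and extract combination can be illustrated outside of Drupal. The plain PHP sketch below is not the ParagraphsLookup implementation. The build_paragraph_references function, the shape of the $lookup array standing in for the migrate map, and all IDs are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Builds entity reference revisions values from Drupal 7 field items.
 *
 * Standalone illustration. $lookup stands in for the migrate map consulted
 * by paragraphs_lookup; its shape here is an assumption.
 */
function build_paragraph_references(array $source_items, array $lookup): array {
  $references = [];
  // Like sub_process, handle each delta of the source field on its own.
  foreach ($source_items as $item) {
    // The "value" sub-field holds the Drupal 7 field collection item ID.
    $found = $lookup[$item['value']] ?? NULL;
    if ($found === NULL) {
      // Simplification: skip items whose paragraph was not migrated.
      continue;
    }
    // Equivalent of the two "extract" steps: pick one key from the result.
    $references[] = [
      'target_id' => $found['id'],
      'target_revision_id' => $found['revision_id'],
    ];
  }
  return $references;
}

// Hypothetical data: Drupal 7 item 12 became paragraph 512, revision 1530.
$lookup = [12 => ['id' => 512, 'revision_id' => 1530]];
$source_items = [['value' => 12, 'revision_id' => 30]];

print_r(build_paragraph_references($source_items, $lookup));
```

Both sub-fields come from the same lookup result, which is why the pipeline in the migration runs the lookup twice and extracts a different key each time.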

Migrate process plugin

Migrate process plugins are responsible for transforming source data into the format expected by the destination system. In our case, we are converting Drupal 7 data into a Drupal 10 suitable format. From a technical point of view, they leverage Drupal's Plugins API.

Below are some highlights regarding their implementation:

Migrate process plugins are classes in the Drupal\[module]\Plugin\migrate\process namespace and implement the MigrateProcessInterface interface.
The ProcessPluginBase base class is available for convenience. It implements common methods invoked for process plugins.
Most process plugins implement a transform method that takes care of the data manipulation operation. If such a method does not exist in the class, the process pipeline invoking the process plugin should specify a method key indicating which method in the process plugin’s class should be executed. An example of this is the skip_on_empty plugin, which can be called using the string row or process as the value of the method configuration option.
If you need to inject a service, the plugin needs to implement the ContainerFactoryPluginInterface interface and its create method. Examples of this are the migration_lookup and machine_name process plugins.
For discovery, they use PHP attributes. Before version 10.2, Drupal core used annotations for discovery. Conversions to PHP attributes have started and the plan is to eventually deprecate annotations altogether.
A process plugin can signal that it supports handling multiple values by setting handle_multiples in its attribute (or annotation, in older code). When set to TRUE, the plugin will expect an array as input and iterate over it, potentially changing the whole array. Examples of this are the sub_process and flatten process plugins. Conversely, a process plugin can signal that its return value requires multiple handling by returning TRUE from the multiple method of the plugin class. Examples of this are the sub_process and explode process plugins.
The plugin.manager.migrate.process service is the plugin manager responsible for process plugins. You can use it to get a list of available plugins based on enabled modules and obtain more details about their definition.

# List of migrate process plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.process')->getDefinitions()));"

# Details on a specific migrate process plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.process')->getDefinitions()['PROCESS_PLUGIN_ID']);"
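
The interplay between handle_multiples and the multiple method described in the highlights above is easier to follow with a small model. The plain PHP sketch below approximates how a process pipeline decides whether to hand a step the whole array or one element at a time. It is a simplification, not Drupal's code. The run_pipeline function and the step arrays are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Runs a value through a list of steps, mimicking multiple-value handling.
 *
 * Each step is a hypothetical array with three keys:
 * - transform: callable applied to the value.
 * - handle_multiples: whether the step wants the whole array at once.
 * - multiple: whether the step's return value should be treated as a list.
 */
function run_pipeline(mixed $value, array $steps): mixed {
  $multiple = FALSE;
  foreach ($steps as $step) {
    if ($multiple && !$step['handle_multiples']) {
      // The previous step returned a list and this step works on scalars:
      // apply it to each element separately.
      $value = array_map($step['transform'], $value);
    }
    else {
      // Pass the value as-is, then record how to treat the result.
      $value = $step['transform']($value);
      $multiple = $step['multiple'];
    }
  }
  return $value;
}

$steps = [
  // Behaves like "explode": scalar in, list out.
  [
    'transform' => static fn (string $v): array => explode(',', $v),
    'handle_multiples' => FALSE,
    'multiple' => TRUE,
  ],
  // A scalar step such as trimming whitespace.
  [
    'transform' => static fn (string $v): string => trim($v),
    'handle_multiples' => FALSE,
    'multiple' => FALSE,
  ],
  // A step that sets handle_multiples: it receives the whole list.
  [
    'transform' => static fn (array $v): array => array_values(array_unique($v)),
    'handle_multiples' => TRUE,
    'multiple' => TRUE,
  ],
];

print_r(run_pipeline('php, drupal ,php', $steps));
```

The second step never sees an array: because the first step flagged its output as multiple, the trimming is applied once per element. The third step opts in to receiving the full list, which is what our custom plugin will do later in this article.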

Creating a custom process plugin

We are almost done with the speaker to user migration. The missing piece is migrating the field_drupal_org_profile, field_linkedin_profile, and field_x_twitter_profile URL fields in Drupal 7's speaker content type into a single Drupal 10 field_social_media_links social link field that accepts multiple values.

Back in article 16, we had a deep dive into how Drupal fields work — both from the perspective of PHP code and database tables. With regards to our task today, I explain how to combine multiple Drupal 7 URL fields into a single Drupal 10 social links field in this presentation. It covers a lot of ground from the technical side of things. I highly recommend revisiting article 16 and watching the video recording to have a better understanding of what we will do today.

In Drupal 7, the URL field has 3 sub-fields: value, title, attributes. In our example, the URL fields were configured to only store the URL value. As such, only the value sub-field is populated. An example value is: https://www.drupal.org/u/baltowen

In Drupal 10, the social links field has 2 sub-fields: social and link. A single entry pointing to the same Drupal.org profile would store the values as drupal and baltowen, respectively.

We need to come up with a process pipeline that takes multiple URLs as stored in Drupal 7, breaks them into the platform/handle format used in Drupal 10, and returns the data to populate a multi-value social link field. A custom process plugin is ideal for this scenario.

Create a PHP file named Tag1SocialLinks.php inside our tag1_migration custom module's /src/Plugin/migrate/process folder. The path relative to the Drupal 10 project root is web/modules/custom/tag1_migration/src/Plugin/migrate/process/Tag1SocialLinks.php. The content of the file should be:


<?php

namespace Drupal\tag1_migration\Plugin\migrate\process;

use Drupal\migrate\Attribute\MigrateProcess;
use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

#[MigrateProcess(
  id: "tag1_social_links",
  handle_multiples: TRUE,
)]
class Tag1SocialLinks extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    if (!is_array($value)) {
      $value = [$value];
    }

    // Drop empty entries so speakers without a given profile are skipped.
    $url_values = array_filter($value);

    // Map each supported platform to the URL prefix that identifies it.
    $social_link_patterns = [
      'drupal' => '/^(http(s)?\:\/\/)?(www\.)?drupal\.org\/u\//',
      'twitter' => '/^(http(s)?\:\/\/)?(www\.)?(twitter|x)\.com\//',
      'linkedin' => '/^(http(s)?\:\/\/)?(www\.)?linkedin\.com\//',
    ];

    $result = [];
    foreach ($url_values as $url_value) {
      foreach ($social_link_patterns as $social_link_platform => $social_link_regex) {
        if (preg_match($social_link_regex, $url_value)) {
          // Store the platform and whatever follows the matched prefix.
          $result[] = [
            'social' => $social_link_platform,
            'link' => preg_replace($social_link_regex, '', $url_value),
          ];
          // A URL can only belong to one platform.
          break;
        }
      }
    }

    return $result;
  }

}
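
The matching logic in the transform method can be tried outside of Drupal before running a migration. The plain PHP sketch below repeats the same idea for a single URL. The split_social_url function name is hypothetical, the regular expressions are slightly simplified versions of the ones in the plugin, and the example.com URL is made up.

```php
<?php

declare(strict_types=1);

/**
 * Splits a profile URL into platform and handle, or returns NULL.
 *
 * Standalone copy of the matching idea used in the plugin; the function
 * name is hypothetical.
 */
function split_social_url(string $url): ?array {
  $patterns = [
    'drupal' => '/^(https?:\/\/)?(www\.)?drupal\.org\/u\//',
    'twitter' => '/^(https?:\/\/)?(www\.)?(twitter|x)\.com\//',
    'linkedin' => '/^(https?:\/\/)?(www\.)?linkedin\.com\//',
  ];
  foreach ($patterns as $platform => $regex) {
    if (preg_match($regex, $url)) {
      // Whatever follows the matched prefix becomes the stored handle.
      return ['social' => $platform, 'link' => preg_replace($regex, '', $url)];
    }
  }
  // No known platform matched.
  return NULL;
}

var_dump(split_social_url('https://www.drupal.org/u/baltowen'));
var_dump(split_social_url('https://x.com/baltowen/'));
var_dump(split_social_url('https://example.com/baltowen'));
```

Two behaviors are worth checking against your own data: a URL that matches none of the patterns produces no entry at all, and anything after the prefix, including a trailing slash, ends up in the link value.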

Then, update the upgrade_d7_node_speaker_to_user migration with the following snippet in the process section:

process:
 field_social_media_links:
   -
     plugin: tag1_social_links
     source:
       - field_drupal_org_profile/0/value
       - field_linkedin_profile/0/value
       - field_x_twitter_profile/0/value

Note that in our upgrade_d7_node_speaker_to_user migration, we are calling the tag1_social_links plugin. This should match the id of the MigrateProcess PHP attribute used in our custom process plugin.

Now, rebuild caches for our new process plugin and the changes to the migration to take effect. Then roll back the upgrade_d7_node_speaker_to_user migration and import it again.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_speaker_to_user
ddev drush migrate:rollback upgrade_d7_node_speaker_to_user
ddev drush migrate:import upgrade_d7_node_speaker_to_user

A detailed review of the PHP code for the custom process plugin will be left as an exercise to the curious reader. What we want to highlight is the relationship between the data transformation logic and the data that is passed to the plugin itself. From our migration, we pass an array of 3 values: field_drupal_org_profile/0/value, field_linkedin_profile/0/value, and field_x_twitter_profile/0/value. Each of them attempts to retrieve the URL we are interested in processing by retrieving the value sub-field from the first delta. Do we need to extract the URL before sending data to the process plugin? Not necessarily. We could have extracted the URLs within the process plugin itself. This serves as a good reminder that when implementing custom migration logic, we are the ones who decide what each component is responsible for doing.

The above code will send an array with the following structure to our tag1_social_links custom process plugin:


[
 'https://drupal.org/u/baltowen',
 'https://www.linkedin.com/in/wendybaltodano',
 'https://x.com/baltowen',
]

After calling the transform method, our tag1_social_links custom process plugin will return an array with the following structure:


[
 0 => [
   'social' => 'drupal',
   'link' => 'baltowen',
 ],
 1 => [
   'social' => 'linkedin',
   'link' => 'in/wendybaltodano',
 ],
 2 => [
   'social' => 'twitter',
   'link' => 'baltowen',
 ],
]

This array will be stored in the field_social_media_links destination property. When the entity:user destination plugin calls the entity save operation, the field_social_media_links field on the user entity will be populated with profile information for the three social media platforms.
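
Earlier we mentioned that the URLs could have been extracted within the process plugin itself. As a sketch of what that alternative might involve, the plain PHP function below takes whole field arrays and pulls the value sub-field from the first delta of each. The first_values function name and the exact shape of the sample data are assumptions. Verify the structure of your own source rows before relying on it.

```php
<?php

declare(strict_types=1);

/**
 * Collects the "value" sub-field of the first delta of each field.
 *
 * Hypothetical helper showing what a plugin would do if it received whole
 * field arrays; empty or missing fields are ignored.
 */
function first_values(array $fields): array {
  $urls = [];
  foreach ($fields as $items) {
    $url = $items[0]['value'] ?? NULL;
    if (is_string($url) && $url !== '') {
      $urls[] = $url;
    }
  }
  return $urls;
}

// Assumed shape of the source row values when whole fields are passed in.
$fields = [
  [['value' => 'https://drupal.org/u/baltowen', 'title' => NULL, 'attributes' => []]],
  // A speaker without a LinkedIn profile.
  [],
  [['value' => 'https://x.com/baltowen', 'title' => NULL, 'attributes' => []]],
];

print_r(first_values($fields));
```

With this approach the migration would list the three field names as sources without the /0/value suffix, and the plugin would take on the responsibility of knowing the Drupal 7 field structure.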

Writing custom process plugins can make migrations easier to read and maintain over time. That said, make sure you are familiar with plugins that are available to avoid reinventing the wheel. This documentation page includes a list of process plugins in Drupal core and some contributed modules that are commonly used in custom migrations. The list is not exhaustive as it would be impractical to list all plugins across all contributed modules. When in doubt about what is available, ask the Drupal API with the snippet we shared in the Migrate process plugin section above.

Next time, we’ll learn to write custom source plugins. We will also walk you through populating regular entity reference fields; that is, fields that unlike paragraphs do not need to point to a specific revision of the referenced entity. All of this will be explained in the context of migrating a new entity type: media. Let's go!

Image by Manuel de la Fuente from Pixabay

Migrations How-to: #27

Mauricio Dinarte

Senior Software Engineer | Drupal Migrations Expert
